pkuliyi2015 / multidiffusion-upscaler-for-automatic1111

Tiled Diffusion and VAE optimization, licensed under CC BY-NC-SA 4.0

DemoFusion doesn't work (RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 126 but got size 36 for tensor number 1 in the list.) #362

Closed. sebaxakerhtc closed this issue 8 months ago.

sebaxakerhtc commented 8 months ago

Tried different resolutions/checkpoints/samplers/VAEs, but it doesn't work at all.
Windows 11 • version: v1.8.0 • python: 3.10.11 • torch: 2.1.2+cu121 • gradio: 3.41.2

Looks like the problem is somewhere in torch.Size.

[Demo Fusion] ControlNet found, support is enabled.
BBOX: 6
DemoFusion hooked into 'DPM++ SDE' sampler, Tile size: 128, Tile count: 6, Batch size: 3, Tile batches: 2 (ext: ContrlNet)
### Phase 1 Denoising ###
  0%|                                                                                                                                                                                     | 0/30 [00:00<?, ?it/s]
*** Error completing request
*** Arguments: ('task(fywqrzrp7ojher7)', <gradio.routes.Request object at 0x000002631118DC30>, 'photo of a woman wearing long red dress', '', ['sai-enhance'], 30, 'DPM++ SDE', 1, 4, 6, 1024, 768, False, 0.4, 1.7, 'DAT x2', 20, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, {'ad_model': 'face_yolov8n.pt', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 4, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, True, 'DemoFusion', True, 128, 64, 4, 2, False, 10, 1, 1, 64, False, True, 3, 1, 1, False, 3072, 192, True, True, True, False, False, 'x264', 'blend', 10, 0, 0, False, True, True, True, 'intermediate', 'animation', UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, 
hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), None, False, '0', '0', 'inswapper_128.onnx', 'CodeFormer', 1, True, 'None', 1, 1, False, True, 1, 0, 0, False, 0.5, True, False, 'CUDA', False, 0, 'None', '', None, False, False, 0.5, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, 50, 'linear (weight sum)', '10', 'D:\\stable-diffusion-webui\\extensions\\stable-diffusion-webui-prompt-travel\\img\\ref_ctrlnet', 'mp4', 10.0, 0, '', True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, 'linear', 'lerp', 'token', 'random', '30', 'fixed', 1, '8', None, 'mp4', 10.0, 0, '', True, False, 0, 0, 0.0001, 75, 0) {}
    Traceback (most recent call last):
      File "D:\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "D:\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "D:\stable-diffusion-webui\modules\txt2img.py", line 110, in txt2img
        processed = processing.process_images(p)
      File "D:\stable-diffusion-webui\modules\processing.py", line 785, in process_images
        res = process_images_inner(p)
      File "D:\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "D:\stable-diffusion-webui\modules\processing.py", line 921, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-upscaler-for-automatic1111\scripts\tileglobal.py", line 199, in <lambda>
        p.sample = lambda conditioning, unconditional_conditioning,seeds, subseeds, subseed_strength, prompts: self.sample_hijack(
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-upscaler-for-automatic1111\scripts\tileglobal.py", line 306, in sample_hijack
        p.latents = p.sampler.sample_img2img(p,p.latents, noise , conditioning, unconditional_conditioning, image_conditioning=p.image_conditioning)
      File "D:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 188, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "D:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 188, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 553, in sample_dpmpp_sde
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-upscaler-for-automatic1111\tile_utils\utils.py", line 252, in wrapper
        return fn(*args, **kwargs)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-upscaler-for-automatic1111\tile_methods\demofusion.py", line 220, in forward_one_step
        x_local = self.sampler.model_wrap_cfg.forward_ori(self.x_in_tmp_,sigma, **kwarg)
      File "D:\stable-diffusion-webui\modules\sd_samplers_cfg_denoiser.py", line 237, in forward
        x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-upscaler-for-automatic1111\tile_utils\utils.py", line 252, in wrapper
        return fn(*args, **kwargs)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-upscaler-for-automatic1111\tile_methods\demofusion.py", line 297, in sample_one_step_local
        x_tile = torch.cat([x_in[bbox.slicer] for bbox in bboxes], dim=0)
    RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 128 but got size 39 for tensor number 1 in the list.
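For reference, this failure mode can be reproduced standalone: torch.cat along dim 0 requires every other dimension to match, so a single clipped tile breaks the whole batch (hypothetical shapes below, not the extension's actual tensors).

    import torch

    # Two latent tiles from the same batch: one full-size, one clipped at the
    # image border (or shifted out of bounds by jitter), so its width differs.
    full_tile = torch.randn(1, 4, 128, 128)
    clipped = torch.randn(1, 4, 128, 39)

    torch.cat([full_tile, clipped], dim=0)
    # RuntimeError: Sizes of tensors must match except in dimension 0.
    # Expected size 128 but got size 39 for tensor number 1 in the list.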

---
Jaylen-Lee commented 8 months ago

Do you use any other upscale algorithm? For images smaller than 1024 with tile size 128, Phase 1 denoising should give BBOX: 1 rather than BBOX: 6. BBOX: 6 should only appear in Phase 2 denoising when no other upscale algorithm is used.
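A rough sketch of the arithmetic behind those bbox counts (assumed sliding-window logic and parameter names, not the extension's actual code): a 1024x768 image is a 128x96 latent and fits inside a single 128 tile, while its x2 version needs six.

    import math

    def bbox_count(latent_w, latent_h, tile=128, overlap=64):
        # One bbox per sliding-window position along each axis.
        stride = tile - overlap
        cols = 1 if latent_w <= tile else math.ceil((latent_w - tile) / stride) + 1
        rows = 1 if latent_h <= tile else math.ceil((latent_h - tile) / stride) + 1
        return cols * rows

    print(bbox_count(128, 96))    # 1 -> Phase 1 of a 1024x768 image
    print(bbox_count(256, 192))   # 6 -> Phase 2 after the x2 upscale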

sebaxakerhtc commented 8 months ago

> Do you use any other upscale algorithm?

No

For example, DemoFusion upscale x2:

1) 768 x 1024: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 128 but got size 35 for tensor number 1 in the list.
2) 832 x 1216: RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 128 but got size 108 for tensor number 1 in the list.

If I click Generate again, the reported "got size" changes randomly. When I uncheck "Random jitter windows", the "got size" is always the same.
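That behavior is consistent with border clipping (a sketch with hypothetical numbers, assuming the jitter shifts each window before it is sliced): a jittered window that crosses the latent edge comes back narrower, and the width depends on the random offset.

    import random

    latent_w, tile = 152, 128              # e.g. a 1216-px-wide image -> 152 latent
    x0 = 32 + random.randint(-16, 16)      # jittered left edge of the last window
    width = min(x0 + tile, latent_w) - x0  # clipped slice width
    print(width)                           # varies per click with jitter on (104..128
                                           # here); constant once jitter is unchecked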

DavideAlidosi commented 8 months ago

Same here: NVIDIA 3090, Windows 10, WebUI v1.8.0, python 3.10.6, torch 2.1.2+cu121, xformers 0.0.23.post1, gradio 3.41.2

BBOX: 2
DemoFusion hooked into 'DPM++ SDE Karras' sampler, Tile size: 128, Tile count: 2, Batch size: 2, Tile batches: 1 (ext: ContrlNet)
[Tiled VAE]: the input size is tiny and unnecessary to tile.
### Encoding Real Image ###
### Phase 1 Denoising ###
  0%|                                                                                            | 0/2 [00:00<?, ?it/s]
*** Error completing request
*** Arguments: ('task(ovmyr0b2ko5x461)', 0, '<lora:nucleo:1>', '$(Negative Prompt)', [], <PIL.Image.Image image mode=RGBA size=960x1280 at 0x1A4B37ED840>, None, None, None, None, None, None, 5, 'DPM++ SDE Karras', 4, 0, 1, 1, 1, 6, 1.5, 0.3, 0.0, 512, 512, 1, 0, 0, 32, 0, '', '', '', [], False, [], '', <gradio.routes.Request object at 0x000001A4B37EEBF0>, 0, False, 1, 0.5, 4, 0, 0.5, 2, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, {'ad_model': 'face_yolov8n.pt', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, False, 'MultiDiffusion', False, True, 1024, 1024, 192, 192, 96, 4, '8x_NMKD-Superscale_150000_G', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, True, 'DemoFusion', True, 128, 64, 4, 2, False, 10, 1, 1, 64, False, True, 3, 1, 1, True, 3072, 192, True, True, True, False, True, False, 1, False, False, False, 1.1, 1.5, 100, 0.7, False, False, True, False, False, 0, 'Gustavosta/MagicPrompt-Stable-Diffusion', '', UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', 
low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), True, False, False, 'ultima.safetensors [c818ee1217]', 'None', 3, '', {'save_settings': ['fp16', 'prune', 'safetensors'], 'calc_settings': ['GPU', 'fastrebasin']}, True, False, False, 'None', 'None', 'None', 'Sum', 'Sum', 'Sum', 0.5, 0.5, 0.5, True, True, True, [], [], [], [], [], [], '0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5', '0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5', '0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5', False, False, False, '', '', '', 'Normal', 'Normal', 'Normal', False, False, 'Matrix', 'Columns', 'Mask', 'Prompt', '1,1', '0.2', False, False, False, 'Attention', [False], '0', '0', '0.4', None, '0', '0', False, '* `CFG Scale` should be 2 or lower.', True, True, '', '', True, 50, True, 1, 0, False, 4, 0.5, 'Linear', 'None', '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, 'start', '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False, 5, 'all', 'all', 'all', '', '', '', '1', 'none', False, '', '', 'comma', '', True, '', '20', 'all', 'all', 'all', 'all', 0, '', True, '0', False, 'SDXL', 'Standard', None, None, False, None, None, False, None, None, False, 50, [], 30, '', 4, [], 1, '', '', '', '') {}
    Traceback (most recent call last):
      File "L:\WebUI\webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "L:\WebUI\webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "L:\WebUI\webui\modules\img2img.py", line 235, in img2img
        processed = process_images(p)
      File "L:\WebUI\webui\modules\processing.py", line 785, in process_images
        res = process_images_inner(p)
      File "L:\WebUI\webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "L:\WebUI\webui\modules\processing.py", line 921, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "L:\WebUI\webui\extensions\multidiffusion-upscaler-for-automatic1111\scripts\tileglobal.py", line 199, in <lambda>
        p.sample = lambda conditioning, unconditional_conditioning,seeds, subseeds, subseed_strength, prompts: self.sample_hijack(
      File "L:\WebUI\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "L:\WebUI\webui\extensions\multidiffusion-upscaler-for-automatic1111\scripts\tileglobal.py", line 306, in sample_hijack
        p.latents = p.sampler.sample_img2img(p,p.latents, noise , conditioning, unconditional_conditioning, image_conditioning=p.image_conditioning)
      File "L:\WebUI\webui\modules\sd_samplers_kdiffusion.py", line 188, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "L:\WebUI\webui\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "L:\WebUI\webui\modules\sd_samplers_kdiffusion.py", line 188, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "L:\WebUI\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "L:\WebUI\webui\repositories\k-diffusion\k_diffusion\sampling.py", line 553, in sample_dpmpp_sde
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "L:\WebUI\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "L:\WebUI\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "L:\WebUI\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "L:\WebUI\webui\extensions\multidiffusion-upscaler-for-automatic1111\tile_utils\utils.py", line 252, in wrapper
        return fn(*args, **kwargs)
      File "L:\WebUI\webui\extensions\multidiffusion-upscaler-for-automatic1111\tile_methods\demofusion.py", line 220, in forward_one_step
        x_local = self.sampler.model_wrap_cfg.forward_ori(self.x_in_tmp_,sigma, **kwarg)
      File "L:\WebUI\webui\modules\sd_samplers_cfg_denoiser.py", line 237, in forward
        x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
      File "L:\WebUI\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "L:\WebUI\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "L:\WebUI\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "L:\WebUI\webui\extensions\multidiffusion-upscaler-for-automatic1111\tile_utils\utils.py", line 252, in wrapper
        return fn(*args, **kwargs)
      File "L:\WebUI\webui\extensions\multidiffusion-upscaler-for-automatic1111\tile_methods\demofusion.py", line 297, in sample_one_step_local
        x_tile = torch.cat([x_in[bbox.slicer] for bbox in bboxes], dim=0)
    RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 87 but got size 38 for tensor number 1 in the list.

rushuna86 commented 8 months ago

So I found out that if you run DemoFusion with any other feature enabled, it will cause this error. DemoFusion on its own works. But if you tick the noise inversion option, or apply tile resample as you would in the regular upscaling workflow, it will cause this error. And on its own, DemoFusion is vastly inferior to the Tiled Diffusion + resample workflow.

Jaylen-Lee commented 8 months ago

I will try to fix this issue later and provide you with feedback. Thank you very much for your findings.

rushuna86 commented 8 months ago

Okay, narrowed it down: pretty sure DemoFusion is an SDXL thing. All these errors are coming up because I'm trying to use it with SD 1.5. I also noticed that if the "Keep input image size" checkbox in DemoFusion is checked, the whole thing does nothing.

Jaylen-Lee commented 8 months ago

I found that it's a bug that occurs when using txt2img. A new version will be merged soon, including UI optimization, some optional settings, and some guidelines. If necessary, you can visit my repo to get it before it is merged into this main repo. Thank you again for your attention.

sebaxakerhtc commented 8 months ago

> I found that it's a bug that occurs when using txt2img. A new version will be merged soon, including UI optimization, some optional settings, and some guidelines. If necessary, you can visit my repo to get it before it is merged into this main repo. Thank you again for your attention.

Hi. Just tried your fork. At least it starts to work, but at about 50% I get this:

[Demo Fusion] ControlNet found, support is enabled.
100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:15<00:00,  2.01it/s]
100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:10<00:00,  2.78it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:10<00:00,  2.74it/s]
BBOX: 1
DemoFusion hooked into 'DPM++ SDE' sampler, Tile size: 128, Tile count: 1, Batch size: 1, Tile batches: 1 (ext: ContrlNet)
### Phase 2 Denoising ###
BBOX: 6
Tile size: 128, Tile count: 6, Batch size: 3, Tile batches: 2
 50%|██████████████████████████████████████████████████████████████████████████████████████                                                                                      | 15/30 [01:10<01:10,  4.70s/it]
*** Error completing request
*** Arguments: ('task(v4su36j0h59ikkg)', <gradio.routes.Request object at 0x00000224151B14E0>, 'a beautiful woman', '', [], 30, 'DPM++ SDE', 1, 1, 7, 1024, 768, False, 0.4, 1.7, 'DAT x2', 20, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, {'ad_model': 'face_yolov8n.pt', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 4, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, True, 'DemoFusion', True, 128, 64, 4, 3, False, 10, 1, 1, 64, False, True, 3, 1, 1, False, False, 3072, 192, True, True, True, False, False, 'x264', 'blend', 10, 0, 0, False, True, True, True, 'intermediate', 'animation', UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', 
save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), None, False, '0', '0', 'inswapper_128.onnx', 'CodeFormer', 1, True, 'None', 1, 1, False, True, 1, 0, 0, False, 0.5, True, False, 'CUDA', False, 0, 'None', '', None, False, False, 0.5, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, 50, 'linear (weight sum)', '10', 'D:\\stable-diffusion-webui\\extensions\\stable-diffusion-webui-prompt-travel\\img\\ref_ctrlnet', 'mp4', 10.0, 0, '', True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, 'linear', 'lerp', 'token', 'random', '30', 'fixed', 1, '8', None, 'mp4', 10.0, 0, '', True, False, 0, 0, 0.0001, 75, 0) {}
    Traceback (most recent call last):
      File "D:\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "D:\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "D:\stable-diffusion-webui\modules\txt2img.py", line 110, in txt2img
        processed = processing.process_images(p)
      File "D:\stable-diffusion-webui\modules\processing.py", line 785, in process_images
        res = process_images_inner(p)
      File "D:\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "D:\stable-diffusion-webui\modules\processing.py", line 921, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-demofusion-for-automatic1111\scripts\tileglobal.py", line 192, in <lambda>
        p.sample = lambda conditioning, unconditional_conditioning,seeds, subseeds, subseed_strength, prompts: self.sample_hijack(
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-demofusion-for-automatic1111\scripts\tileglobal.py", line 290, in sample_hijack
        p.latents = p.sampler.sample_img2img(p,p.latents, noise , conditioning, unconditional_conditioning, image_conditioning=p.image_conditioning)
      File "D:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 188, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "D:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 188, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 573, in sample_dpmpp_sde
        denoised_2 = model(x_2, sigma_fn(s) * s_in, **extra_args)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-demofusion-for-automatic1111\tile_utils\utils.py", line 252, in wrapper
        return fn(*args, **kwargs)
      File "D:\stable-diffusion-webui\extensions\multidiffusion-demofusion-for-automatic1111\tile_methods\demofusion.py", line 188, in forward_one_step
        self.xi = self.p.x + self.p.noise * self.p.sigmas[self.p.current_step]
    IndexError: index 31 is out of bounds for dimension 0 with size 31
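For reference, the index pattern here fits a step counter running past the sigma table (a sketch under that assumption, not the fork's actual code): 30 steps yield 31 sigmas (indices 0..30), and a second-order sampler like DPM++ SDE evaluates the model twice per step, so a naive per-call counter overruns it.

    import torch

    steps = 30
    sigmas = torch.linspace(1.0, 0.0, steps + 1)  # 31 entries, indices 0..30
    current_step = steps + 1                      # one increment too many
    sigmas[current_step]
    # IndexError: index 31 is out of bounds for dimension 0 with size 31
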
rushuna86 commented 8 months ago

What checkpoint/model are you trying to use it with? I tried the fork; it works fine with SDXL stuff, but everything else is pretty much an error.

sebaxakerhtc commented 8 months ago

> What checkpoint/model are you trying to use it with? I tried the fork; it works fine with SDXL stuff, but everything else is pretty much an error.

Are you talking to me? If so, ping me. I tried many models, both SDXL and SD 1.5; the result is the same. I found that it sometimes works with "Euler a" at 20-30 steps, but "DPM++ SDE" at 30 steps doesn't work.

Those that work give very, very bad results. The picture is different from the original, but from what I've read and seen of DemoFusion, it should be exactly the same picture! For now it works as just another upscaler with high denoise: it changes the picture completely and destroys quality.

[Image: original vs. DemoFusion comparison]

sebaxakerhtc commented 8 months ago

And if I set 1024 x 1024, I get an OOM upscaling to 2048 x 2048...

For the test, I installed the ComfyUI custom node for DemoFusion and generated a 2112 x 3072 image at 30 steps. There we have:

### Phase 1 Denoising ###
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:05<00:00,  5.13it/s]
### Phase 2 Denoising ###
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [01:10<00:00,  2.45s/it]
### Phase 2 Decoding ###
100%|██████████████████████████████████████████████████████████████████████████████████| 48/48 [00:16<00:00,  2.90it/s]
### Phase 3 Denoising ###
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [03:49<00:00,  7.77s/it]
### Phase 3 Decoding ###
100%|████████████████████████████████████████████████████████████████████████████████| 108/108 [00:45<00:00,  2.40it/s]
Prompt executed in 401.17 seconds

and the result is perfect!

Jaylen-Lee commented 8 months ago

Very useful feedback. In fact, due to the differences between the WebUI and the original DemoFusion library, there may be some implementation differences that are difficult to erase. I will further compare and test against ComfyUI. In addition, if it is an img2img task, it is recommended to use the original seed, prompt, and other parameters to maintain image stability.

Jaylen-Lee commented 8 months ago

Also, could you please let me know the graphics memory of your device? In my test, a 2048 * 2048 image can be obtained on a 16 GB 4060 Ti. If an OOM error occurs, you could enable Tiled VAE, which allows you to obtain larger images without exceeding memory. Tiled VAE was also implemented in the original DemoFusion. Thank you again for your feedback. I will investigate these problems as soon as possible and inform you after the new version update.
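For intuition, a minimal sketch of the tiled-decode idea behind Tiled VAE and DemoFusion's multi-decode (no seam blending, with decode_fn standing in for the VAE decoder; an assumption, not either project's actual code): decode one latent tile at a time so the decoder's attention layer never materializes the full-resolution feature map.

    import torch

    def tiled_vae_decode(decode_fn, latents, tile=64):
        # decode_fn: maps a (N, 4, h, w) latent tile to a (N, 3, 8h, 8w) image tile.
        _, _, h, w = latents.shape
        rows = []
        for y in range(0, h, tile):
            cols = [decode_fn(latents[:, :, y:y + tile, x:x + tile])
                    for x in range(0, w, tile)]
            rows.append(torch.cat(cols, dim=3))   # stitch a row of tiles
        return torch.cat(rows, dim=2)             # stack the rows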

sebaxakerhtc commented 8 months ago

@Jaylen-Lee I do that on my RTX 3090 with 24 GB of VRAM. That's why I was surprised! I can normally create 2048 x 2048 without Tiled VAE, but DemoFusion says OOM.

Upd. "colab notebook locally" and "comfyUI" works like magic even with 3072 x 3072 I'm pretty sure it will work with 4096 x 4096, but not tried yet

sebaxakerhtc commented 8 months ago

@Jaylen-Lee

1) It looks like problem no. 1 is using a sampler other than "Euler a". When I set "Euler a", the result is much better. (Maybe it should be hardcoded to use Euler a; in the original and in ComfyUI we can't choose the sampler.) But the picture is still very blurry (maybe you can check the code around the Gaussian part?). You know, in the original we have Phase 1 denoising and decoding, then denoising and decoding again in the next phase. Using your DemoFusion I don't see that.

2) c1, c2 and c3 should have a maximum of 5, not 3 (as in the original DemoFusion). It would also be a good idea to add sigma, plus an option to save the image before upscaling (for comparison), to the settings in the UI. The settings (min, max, step, default) needed in the UI are shown below:

    guidance_scale = gr.Slider(minimum=1, maximum=20, step=0.1, value=7.5, label="Guidance Scale")
    cosine_scale_1 = gr.Slider(minimum=0, maximum=5, step=0.1, value=3, label="Cosine Scale 1")
    cosine_scale_2 = gr.Slider(minimum=0, maximum=5, step=0.1, value=1, label="Cosine Scale 2")
    cosine_scale_3 = gr.Slider(minimum=0, maximum=5, step=0.1, value=1, label="Cosine Scale 3")
    sigma = gr.Slider(minimum=0.1, maximum=1, step=0.1, value=0.8, label="Sigma")
    view_batch_size = gr.Slider(minimum=4, maximum=32, step=4, value=4, label="View Batch Size")
    stride = gr.Slider(minimum=8, maximum=96, step=8, value=64, label="Stride")

3) I still can't generate 1024 x 1024 with x2 upscale. I get an OOM at decoding, I think (why can't we see decoding progress?), after denoising. I hope the log will help you:

[Demo Fusion] ControlNet found, support is enabled.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:09<00:00,  3.26it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:08<00:00,  3.23it/s]
BBOX: 1
DemoFusion hooked into 'Euler a' sampler, Tile size: 256, Tile count: 1, Batch size: 1, Tile batches: 1 (ext: ContrlNet)
### Phase 2 Denoising ###
BBOX: 1
Tile size: 256, Tile count: 1, Batch size: 1, Tile batches: 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:23<00:00,  2.80s/it]
*** Error completing request
*** Arguments: ('task(smlnjdgt4tfityo)', <gradio.routes.Request object at 0x000001458CF5E0E0>, 'Envision a portrait of an elderly woman, her face a canvas of time, framed by a headscarf with muted tones of rust and cream. Her eyes, blue like faded denim. Her attire, simple yet dignified.', 'blurry, ugly, duplicate, poorly drawn, deformed, mosaic', [], 30, 'Euler a', 1, 1, 7.5, 1024, 1024, False, 0.4, 1.7, 'DAT x2', 20, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, {'ad_model': 'face_yolov8n.pt', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 4, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, True, 'DemoFusion', True, 256, 64, 4, 2, False, 10, 1, 1, 64, False, True, 3, 1, 1, False, False, 3072, 192, True, True, True, False, False, 'x264', 'blend', 10, 0, 0, False, True, True, True, 'intermediate', 'animation', UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, 
resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), None, False, '0', '0', 'inswapper_128.onnx', 'CodeFormer', 1, True, 'None', 1, 1, False, True, 1, 0, 0, False, 0.5, True, False, 'CUDA', False, 0, 'None', '', None, False, False, 0.5, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, 50, 'linear (weight sum)', '10', 'D:\\stable-diffusion-webui\\extensions\\stable-diffusion-webui-prompt-travel\\img\\ref_ctrlnet', 'mp4', 10.0, 0, '', True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, 'linear', 'lerp', 'token', 'random', '30', 'fixed', 1, '8', None, 'mp4', 10.0, 0, '', True, False, 0, 0, 0.0001, 75, 0) {}
    Traceback (most recent call last):
      File "D:\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "D:\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "D:\stable-diffusion-webui\modules\txt2img.py", line 110, in txt2img
        processed = processing.process_images(p)
      File "D:\stable-diffusion-webui\modules\processing.py", line 785, in process_images
        res = process_images_inner(p)
      File "D:\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "D:\stable-diffusion-webui\modules\processing.py", line 933, in process_images_inner
        x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
      File "D:\stable-diffusion-webui\modules\processing.py", line 632, in decode_latent_batch
        sample = decode_first_stage(model, batch[i:i + 1])[0]
      File "D:\stable-diffusion-webui\modules\sd_samplers_common.py", line 76, in decode_first_stage
        return samples_to_images_tensor(x, approx_index, model)
      File "D:\stable-diffusion-webui\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
        x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\stable-diffusion-webui\repositories\generative-models\sgm\models\diffusion.py", line 121, in decode_first_stage
        out = self.first_stage_model.decode(z)
      File "D:\stable-diffusion-webui\repositories\generative-models\sgm\models\autoencoder.py", line 315, in decode
        dec = self.decoder(z, **decoder_kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 722, in forward
        h = self.mid.attn_1(h, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 649, in sdp_attnblock_forward
        out = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB. GPU 0 has a total capacty of 24.00 GiB of which 0 bytes is free. Of the allocated memory 23.49 GiB is allocated by PyTorch, and 470.54 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But, yeah, I'm getting better results! I think my main problem was using different samplers.

Upd. And sorry, I forgot the main thing: a BIG THANK YOU!!! for the DemoFusion implementation in sd-webui! For now it's even better than the original and ComfyUI, because we can set custom resolutions! The best results I get are with custom models, not the original SDXL base, which is strange; on the original, images are veeeery blurry.

sebaxakerhtc commented 8 months ago

I think the problem of blurry images is somewhere in the cosine code.

I'm looking at the difference between the original code and your implementation.

    cosine_factor = 0.5 * (1 + torch.cos(torch.pi * (self.scheduler.config.num_train_timesteps - t)
                                         / self.scheduler.config.num_train_timesteps)).cpu()

    c1 = cosine_factor ** cosine_scale_1
    latents = latents * (1 - c1) + noise_latents[i] * c1

and yours

    self.cosine_factor = 0.5 * (1 + torch.cos(torch.pi * torch.tensor((self.p.current_step + 1) / (self.t_enc + 1))))
    c2 = self.cosine_factor ** self.p.cosine_scale_2

    self.c1 = self.cosine_factor ** self.p.cosine_scale_1

I'm not strong in Python, but maybe this will help you.
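Side by side, with hypothetical values just to compare the shapes of the two schedules:

    import torch

    T = 1000                              # scheduler.config.num_train_timesteps
    t = torch.linspace(T, 0, 31)          # original: trained timesteps, high -> low
    original = 0.5 * (1 + torch.cos(torch.pi * (T - t) / T))

    steps = torch.arange(31)              # port: sampler step index, 0 -> 30
    ported = 0.5 * (1 + torch.cos(torch.pi * (steps + 1) / 31))

    # Both curves run 1 -> 0 over the generation, but the original follows the
    # scheduler's timestep spacing while the port is uniform in step index, so
    # whenever the timesteps are not uniformly spaced, c1 = factor ** cosine_scale_1
    # takes different values mid-run.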

Jaylen-Lee commented 8 months ago

@sebaxakerhtc Thank you for providing more information.

I can use SDXL to double a 1024 * 1024 image without using Tiled VAE on a 4060 Ti 16 GB device, without OOM errors. I noticed that you are using a tile size of 256, which means processing a tile of 256 * 8, namely a 4096 * 4096 tile, which may cause OOM. For the SDXL model, I suggest a tile size of 128. A tile size lower than 128 (corresponding to 1024-px images) can also cause image quality problems with the SDXL model.

Note that "Keep input image size" will cause the input image size to overwrite the image size set in img2img; this may sometimes cause OOM problems. I will change the default value here to false. And thank you for discovering the issues with the other parameters (although I usually don't adjust c1, c2, c3, haha).

About the blurry image: I suggest setting the denoising strength to 0.8-1. I think that's actually how it's done in DemoFusion; a low denoising strength will cause a blurry image. So getting an image similar to the original also requires your prompt to be close to the original prompt, and the same random seed may be better.
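(A sketch of the usual img2img behavior, under the assumption that the WebUI derives the running step count as int(strength * steps): a low denoising strength means most steps never run, which is what leaves the upscaled detail soft.)

    # Hypothetical numbers: at strength 0.4 only ~12 of 30 sampler steps
    # actually denoise; at 0.8-1.0 nearly all of them do.
    steps, strength = 30, 0.4
    t_enc = int(strength * steps)
    print(t_enc)  # 12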

About the decoding process: I will consider how to output multiple images of different sizes. It was because I hadn't figured that out yet that I cancelled the decoding process for each stage. But don't worry, this won't affect the final image.

About the issue with the samplers: I have noticed the special structure of the DPM++ series samplers. This bug has been fixed and will soon be updated on my fork. And the Euler sampler does seem to give better results (it is the default config in official DemoFusion, better than Euler a or others; I don't know why...).

In addition, I will keep updating my work on my fork for now and choose an appropriate time to merge it into this repo.

Jaylen-Lee commented 8 months ago

Not only SDXL: my attempts with a 1.5 model can also achieve good results with Euler.

[Images: txt2img original, 20 steps, 512*512; txt2img with x2]

Jaylen-Lee commented 8 months ago

Another example: [Images: original txt2img; with x2]

sebaxakerhtc commented 8 months ago

> I can use SDXL to double a 1024 * 1024 image without using Tiled VAE on a 4060 Ti 16 GB device, without OOM errors

I'm sure you have something like --lowvram or --medvram enabled.

> you are using a tile size of 256

That was just in this example. With 128 the result is the same.

> "Keep input image size" will cause the input image size to overwrite the image size set in img2img

I'm not using img2img, only txt2img. For img2img we have many other, even better solutions.

> I cancelled the decoding process for each stage

@Jaylen-Lee Please provide a way to bring it back, because for me it is the main cause of blurry and different images! The whole idea of DemoFusion is to generate the very same image, just at a bigger resolution. For now we have "just another upscaler", not DemoFusion. Quote: "DemoFusion: Democratising High-Resolution Image Generation With No $$$". Generation, not img2img upscaling.

[Image]

sebaxakerhtc commented 8 months ago

And what do we have now?

[Images: 00058-2013, 00059-2013]

Another example

[Images]

Jaylen-Lee commented 8 months ago

Just try the latest updates on my fork! It now supports obtaining the multi-stage images in the final output. The code has been updated, and the readme will be improved later.

About blurry images: I think this is due to the different implementation methods of k-diffusion (WebUI) and diffusers (original DemoFusion), which makes DemoFusion difficult to replicate exactly and means different parameter settings. Setting a higher c3 and a lower sigma (below 0.8, maybe 0.4) can effectively alleviate this problem. Of course, sigma can be adjusted now. And with further adjustment of the algorithm, I don't think using the same parameters as the original DemoFusion would be a big problem.
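For intuition about the sigma knob, here is a minimal sketch of a Gaussian filter over latents, the kind of smoothing sigma controls on the c3 path (an assumed helper, not the fork's actual code); a lower sigma gives a narrower kernel and therefore weaker blur.

    import torch
    import torch.nn.functional as F

    def gaussian_blur_latents(latents, kernel_size=3, sigma=0.8):
        # Separable Gaussian kernel, applied depthwise to every latent channel.
        ax = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
        g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
        k1d = g / g.sum()
        k2d = torch.outer(k1d, k1d)
        c = latents.shape[1]
        weight = k2d.expand(c, 1, kernel_size, kernel_size).to(latents)
        return F.conv2d(latents, weight, padding=kernel_size // 2, groups=c)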

But unfortunately, I am still not sure why your device gets an OOM error. I launch the WebUI with python launch.py and use the sdxl_base_1.0 model. One possible point to note is that the original DemoFusion uses multi-decode by default, corresponding to Tiled VAE. I think using Tiled VAE can avoid OOM issues in many situations and let you get bigger images.

Thanks, and I welcome more bugs or advice!

Latest instance: original (1024*1024) on the left, x2 on the right. [Image]

sebaxakerhtc commented 8 months ago

> the original DemoFusion uses multi-decode by default

I disabled that for 2048x2048. Anyway, I've already created a tab for the WebUI with the original DemoFusion. I like to use DF, but I need Automatic1111 too, and there's no reason to run two instances. So... this can be closed now, because the main problem of the issue was solved :)

sebaxakerhtc commented 8 months ago

> Just try the latest updates on my fork

Did you try non-square images? All the settings at their defaults. [Image]

sebaxakerhtc commented 8 months ago

@Jaylen-Lee Hi! I really want to help you with your implementation of DemoFusion!

Today I played with different schedulers (samplers) in the original DemoFusion, and the most similar are "DDIM" and "LMS", not "Euler" or even "Euler a". Both Eulers give very blurry images!

I installed your "working" commit and tested with LMS and PLMS - the result was much much better!!!

I don't know whether we have the same sampler as the default in Diffusers (I can't find this info), but play around with DDIM, PLMS and LMS!
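If it helps, a diffusers pipeline will report its default scheduler directly (hypothetical model id below; any SDXL checkpoint behaves the same way):

    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
    print(pipe.scheduler)              # the scheduler the pipeline ships with
    print(pipe.scheduler.compatibles)  # other schedulers you can swap in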

Good luck!

Original on the left, LMS on the right. [Image]

Yes, a little blurrier than the original, but you should see Euler a, or, even worse, Euler... Same settings, same seed, all the same.

Upd. DDIM doesn't work with DemoFusion.

sebaxakerhtc commented 8 months ago

@Jaylen-Lee As I said before, something is wrong with the cosines in your implementation. Setting c2 to 1 (the default in the original project) adds many objects to the image, and c1 gives a very different result at 3 (the default in the original project). About the Gaussian filter: it's the c3 setting, so why do we have a c3 setting if c3 is disabled? Please review your code and make it BETTER for all of us! But after reading the original documentation

[Image]

we can play with the settings and get this result (1024 x 1024 on the left, 2048 x 2048 on the right):

[Image]

Not bad, isn't it? And this is your code, not the original!

Commit: https://github.com/Jaylen-Lee/tilediffusion-demofusion-for-automatic1111-webui/tree/8b0763a228ef0fd6d076d3f2024130fe613c5753

Settings and other info:

[Images]

Best regards!

sebaxakerhtc commented 8 months ago

@Jaylen-Lee
The original project, for comparison:

[Image]

Original project settings:

[Image]

sebaxakerhtc commented 7 months ago

> And the Euler sampler does seem to give better results (it is the default config in official DemoFusion

Yeah, you were right! It's Euler. How is your work on DemoFusion going? The latest fork was at 25 s/it in the 2nd phase, and I terminated it...