ljleb / sd-webui-freeu

a1111 implementation of https://github.com/ChenyangSi/FreeU
MIT License
305 stars, 16 forks

Getting cuFFT error: CUFFT_INTERNAL_ERROR error #45

Closed Dwanvea closed 7 months ago

Dwanvea commented 7 months ago

I recently started using ZLUDA on automatic1111, and this extension prevents me from generating images with the error "cuFFT error: CUFFT_INTERNAL_ERROR". It works fine when I switch back to DirectML. Any idea how I can fix it? I really like this extension, but I don't want to go back to DirectML because it's slower on my system.

ljleb commented 7 months ago

Is there a full stack trace? If you have any info that can help locate the error, I would appreciate it.

My guess is that this is related to the FFT code. It checks whether DirectML is available and does the FFT on the CPU in that case. Maybe we need to add a check for ZLUDA too, for when DirectML is not available?
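A minimal sketch of that CPU fallback (hypothetical names; the extension's actual implementation lives in lib_free_u/unet.py):

```python
import torch

def fft_with_fallback(x: torch.Tensor, gpu_fft_ok: bool) -> torch.Tensor:
    # Hypothetical sketch: when the GPU FFT backend is unreliable
    # (e.g. DirectML, or cuFFT under ZLUDA), compute fftn on the CPU
    # and move the result back to the input's original device.
    fft_device = x.device if gpu_fft_ok else "cpu"
    x_freq = torch.fft.fftn(x.to(fft_device).float(), dim=(-2, -1))
    return x_freq.to(x.device)
```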

Dwanvea commented 7 months ago
*** Error completing request
*** Arguments: ('task(nix4vwslv9r3w4l)', <gradio.routes.Request object at 0x000002129A6077C0>, 'a portrait of a mayan quetzalcoatl goddess with a lazer shining into the top of her head, pieces expanding from impact aquamarine and red, by android jones, by ben ridgeway, by ross draws, by Noah Bradley, by Maciej Kuciara + illustrative + visionary art + low angle + oil painting + Visionary art, DMT, psychedelic, The god particle, utopia profile, artgerm, featured in artstation, elegant, Moebius, Greg rutkowski', '((human ears)), (((visible ears))), bad-artist, bad-hands-5, an11:0.3, (EasyNegativeV2:1.3), (easynegative:1.7)', [], 35, 'Euler a', 1, 1, 7, 768, 512, False, 0.4, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', ['Downcast alphas_cumprod: True', 'Pad conds: True'], 0, False, '', 0.8, 1951411549, False, -1, 0, 0, 0, False, False, {'ad_model': 'face_yolov8n.pt', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_model_classes': '', 'ad_prompt': '', 
'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 4, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 'DemoFusion', False, 128, 64, 4, 2, False, 10, 1, 1, 64, False, True, 3, 1, 1, True, 0.85, 0.6, 4, False, False, 512, 64, True, True, True, False, False, 7, 100, 'Constant', 0, 'Constant', 0, 4, True, 'MEAN', 'AD', 1, <scripts.animatediff_ui.AnimateDiffProcess object at 0x000002129A606E30>, UiControlNetUnit(enabled=False, module='tile_resample', model='control_v11f1e_sd15_tile [a371b31b]', weight=1, 
image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='ControlNet is more important', inpaint_crop_input_image=True, hr_option='HiResFixOption.BOTH', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), True, 0, 1, 0, 'Version 2', 1.2, 0.9, 0, 0.5, 0, 1, 1.4, 0.2, 0, 0.5, 0, 1, 1, 1, 0, 0.5, 0, 1, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, 50) {}
    Traceback (most recent call last):
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\txt2img.py", line 110, in txt2img
        processed = processing.process_images(p)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\processing.py", line 787, in process_images
        res = process_images_inner(p)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\processing.py", line 1015, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\processing.py", line 1351, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 239, in sample
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 239, in <lambda>
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\sd_samplers_cfg_denoiser.py", line 237, in forward
        x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
        eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
        return self.inner_model.apply_model(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 18, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 30, in __call__
        return self.__sub_func(self.__orig_func, *args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\sd_hijack_unet.py", line 48, in apply_model
        return orig_func(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).float()
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
        x_recon = self.model(x_noisy, t, **cond)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1561, in _call_impl
        result = forward_call(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
        out = self.diffusion_model(x, t, context=cc)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
        return forward_call(*args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\modules\sd_unet.py", line 91, in UNetModel_forward
        return original_forward(self, x, timesteps, context, *args, **kwargs)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 801, in forward
        h = th.cat([h, hs.pop()], dim=1)
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\extensions\sd-webui-freeu\lib_free_u\unet.py", line 67, in free_u_cat_hijack
        h_skip = filter_skip(
      File "E:\Stable Diffusion\stable-diffusion-webui-directml\extensions\sd-webui-freeu\lib_free_u\unet.py", line 99, in filter_skip
        x_freq = torch.fft.fftn(x.to(fft_device).float(), dim=(-2, -1))
    RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

Here it is.

> My guess is that this is related to the FFT code. It checks whether DirectML is available and does the FFT on the CPU in that case. Maybe we need to add a check for ZLUDA too, for when DirectML is not available?

I honestly don't know Python or how any of this works. I'm mostly a casual SD user, but being an AMD user has taught me some stuff, lol. I was tinkering around with ChatGPT and it suggested a similar solution. So yes, that could be it.

GPT suggested I use the code below to fix the problem:

import torch

def fft_on_device(x):
    # Check if DirectML is available
    if torch.backends.directml.is_available():
        return torch.fft.fftn(x.to('directml').float(), dim=(-2, -1))
    # Check if ZLUDA is available
    elif torch.backends.zluda.is_available():
        return torch.fft.fftn(x.to('zluda').float(), dim=(-2, -1))
    # Fallback to CPU
    else:
        return torch.fft.fftn(x.to('cpu').float(), dim=(-2, -1))

Edit: I edited some of my text for clarity, and the stack trace was slightly wrong, so I fixed it.

ljleb commented 7 months ago

> *** Error completing request

Thanks, that confirms my suspicion.

> GPT suggested I use the code below to fix the problem

While this could work, I think the API is not correct for AMD, so it's likely wrong for ZLUDA as well. I don't have any way of testing this code, as I do not have an AMD card.

I think we can run a light FFT test at startup, and if it raises an exception, disable the FFT on the GPU. If you change cards without restarting the process, it might stop working, but I'm sure that's not the only thing that breaks in that case.
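The startup probe described above could look something like this (a hedged sketch under assumed names, not the extension's actual code):

```python
import torch

def gpu_fft_available(device: str = "cuda") -> bool:
    # Hypothetical startup check: run one tiny 2D FFT on the GPU.
    # Under ZLUDA this is where cuFFT raises CUFFT_INTERNAL_ERROR;
    # if it does, the extension would fall back to CPU FFTs.
    if not torch.cuda.is_available():
        return False
    try:
        probe = torch.zeros(4, 4, device=device)
        torch.fft.fftn(probe, dim=(-2, -1))
        return True
    except RuntimeError:
        return False
```

The result would be computed once at startup and reused, which is why swapping GPUs mid-process would not be picked up.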

Dwanvea commented 7 months ago

I want to try it. What do I need to do?

ljleb commented 7 months ago

You can update the extension to the latest version (I just pushed a fix to the main branch). Let me know if it still doesn't work; I'll reopen this issue in that case.

Dwanvea commented 7 months ago

It works!! Thank you.