lllyasviel / Fooocus

Focus on prompting and generating

CUDA error: an illegal memory access was encountered on 3060 Ti #1412

Closed. jyuyttl closed this issue 10 months ago.

jyuyttl commented 10 months ago

I have an RTX 3060 Ti with 8 GB VRAM and I am getting the error "an illegal memory access was encountered". Sometimes it works for a while and then I get the error, and sometimes I get it on the first try. It happens regardless of the settings, even with the most basic prompts. Even a prompt that initially worked stops working after a few tries.

```
C:\Users\saniel\Desktop\Fooocus_win64_2-1-791>.\python_embeded\python.exe -s Fooocus\entry_with_update.py --preset realistic
Fast-forward merge
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py', '--preset', 'realistic']
Loaded preset: C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\presets\realistic.json
Failed to load config key: {"path_checkpoints": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\models\checkpoints"} is invalid or does not exist; will use {"path_checkpoints": "../models/checkpoints/"} instead.
Failed to load config key: {"path_loras": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\models\loras"} is invalid or does not exist; will use {"path_loras": "../models/loras/"} instead.
Failed to load config key: {"path_embeddings": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\models\embeddings"} is invalid or does not exist; will use {"path_embeddings": "../models/embeddings/"} instead.
Failed to load config key: {"path_vae_approx": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\models\vae_approx"} is invalid or does not exist; will use {"path_vae_approx": "../models/vae_approx/"} instead.
Failed to load config key: {"path_upscale_models": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\models\upscale_models"} is invalid or does not exist; will use {"path_upscale_models": "../models/upscale_models/"} instead.
Failed to load config key: {"path_inpaint": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\models\inpaint"} is invalid or does not exist; will use {"path_inpaint": "../models/inpaint/"} instead.
Failed to load config key: {"path_controlnet": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\models\controlnet"} is invalid or does not exist; will use {"path_controlnet": "../models/controlnet/"} instead.
Failed to load config key: {"path_clip_vision": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\models\clip_vision"} is invalid or does not exist; will use {"path_clip_vision": "../models/clip_vision/"} instead.
Failed to load config key: {"path_fooocus_expansion": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\models\prompt_expansion\fooocus_expansion"} is invalid or does not exist; will use {"path_fooocus_expansion": "../models/prompt_expansion/fooocus_expansion"} instead.
Failed to load config key: {"path_outputs": "C:\Users\saniel\Downloads\Fooocus_win64_2-1-791\Fooocus\outputs"} is invalid or does not exist; will use {"path_outputs": "../outputs/"} instead.
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.839
Running on local URL: http://127.0.0.1:7865
```

```
To create a public link, set share=True in launch().
Total VRAM 8192 MB, total RAM 16251 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 Ti : native
VAE dtype: torch.bfloat16
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\models\checkpoints\realisticStockPhoto_v10.safetensors
Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\models\checkpoints\realisticStockPhoto_v10.safetensors].
Loaded LoRA [C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for UNet [C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\models\checkpoints\realisticStockPhoto_v10.safetensors] with 788 keys at weight 0.25.
Loaded LoRA [C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for CLIP [C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\models\checkpoints\realisticStockPhoto_v10.safetensors] with 264 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 1.37 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 3.0
[Parameters] Seed = 8670350756642913313
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 60 - 30
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] girl in a forest, full color, highly detailed, cinematic, complex, magical, sharp focus, dramatic, thought, beautiful, innocent, enchanted, magic, complete, pretty, background light, amazing, perfect, elegant, delicate, epic, composition, colorful, illuminated, stunning, symmetry, great, dynamic, expressive, cute, best, new, fantastic, vibrant
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] girl in a forest, light, magic, sharp focus, intricate, cinematic, elegant, highly detailed, extremely new, color, epic, romantic, scenic, artistic,, surreal, beautiful,, deep colors, inspired, rich vivid, ambient, novel, atmosphere, glowing, vibrant, symmetry, focused, perfect, coherent, best, expressive, cute, great composition
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.16 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 4.93 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 9.00 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 60/60 [00:43<00:00, 1.37it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.16 seconds
Image generated with private log at: C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\outputs\2023-12-14\log.html
Generating and saving time: 54.51 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.39 seconds
 78%|████████████████████████████████████████████████████████████████▏                 | 47/60 [00:33<00:09, 1.40it/s]
Traceback (most recent call last):
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 803, in worker
    handler(task)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 735, in handler
    imgs = pipeline.process_diffusion(
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
    sampled_latent = core.ksampler(
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\core.py", line 313, in ksampler
    samples = ldm_patched.modules.sample.sample(model,
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\sample.py", line 100, in sample
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\samplers.py", line 715, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\sample_hijack.py", line 157, in sample_hacked
    samples = sampler.sample(model_wrap, sigmas, extra_args, callback_wrap, noise, latent_image, denoise_mask, disable_pbar)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\samplers.py", line 560, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\k_diffusion\sampling.py", line 701, in sample_dpmpp_2m_sde_gpu
    return sample_dpmpp_2m_sde(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=eta, s_noise=s_noise, noise_sampler=noise_sampler, solver_type=solver_type)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\k_diffusion\sampling.py", line 613, in sample_dpmpp_2m_sde
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 348, in patched_KSamplerX0Inpaint_forward
    out = self.inner_model(x, sigma,
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\samplers.py", line 274, in forward
    return self.apply_model(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\samplers.py", line 271, in apply_model
    out = sampling_function(self.inner_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 222, in patched_sampling_function
    positive_x0, negative_x0 = calc_cond_uncond_batch(model, cond, uncond, x, timestep, model_options)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\samplers.py", line 226, in calc_cond_uncond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\model_base.py", line 85, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 465, in patched_unet_forward
    h = forward_timestep_embed(module, h, emb, context, transformer_options, output_shape, time_context=time_context, num_video_frames=num_video_frames, image_only_indicator=image_only_indicator)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\ldm\modules\diffusionmodules\openaimodel.py", line 46, in forward_timestep_embed
    x = layer(x, context, transformer_options)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\ldm\modules\attention.py", line 606, in forward
    x = block(x, context=context[i], transformer_options=transformer_options)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\ldm\modules\attention.py", line 433, in forward
    return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\ldm\modules\diffusionmodules\util.py", line 189, in checkpoint
    return func(*inputs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\ldm\modules\attention.py", line 493, in _forward
    n = self.attn1(n, context=context_attn1, value=value_attn1)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\ldm\modules\attention.py", line 382, in forward
    v = self.to_v(context)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Exception in thread Thread-2 (worker):
Traceback (most recent call last):
  File "threading.py", line 1016, in _bootstrap_inner
  File "threading.py", line 953, in run
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 809, in worker
    pipeline.prepare_text_encoder(async_call=True)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\default_pipeline.py", line 211, in prepare_text_encoder
    ldm_patched.modules.model_management.load_models_gpu([final_clip.patcher, final_expansion.patcher])
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 475, in patched_load_models_gpu
    y = ldm_patched.modules.model_management.load_models_gpu_origin(*args, **kwargs)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\model_management.py", line 388, in load_models_gpu
    free_memory(total_memory_required[device] * 1.3 + extra_mem, device, models_already_loaded)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\model_management.py", line 340, in free_memory
    m.model_unload()
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\model_management.py", line 311, in model_unload
    self.model.unpatch_model(self.model.offload_device)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\Fooocus\ldm_patched\modules\model_patcher.py", line 349, in unpatch_model
    self.model.to(device_to)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1160, in to
    return self._apply(convert)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 810, in _apply
    module._apply(fn)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 833, in _apply
    param_applied = fn(param)
  File "C:\Users\saniel\Desktop\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
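
The failing call is an fp16 matrix multiply inside F.linear (cublasGemmEx), and PyTorch's own message suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so the error is reported at the kernel that actually faulted rather than at a later API call. The standalone Python sketch below is only a hypothetical diagnostic, not part of Fooocus; the tensor shapes are arbitrary and chosen just to exercise the same kind of half-precision GEMM.

```python
# Hypothetical standalone check; run it outside Fooocus with the same python_embeded interpreter.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch
import torch.nn.functional as F

print("torch:", torch.__version__, "| CUDA runtime:", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))  # expect: NVIDIA GeForce RTX 3060 Ti

# Repeat a small fp16 GEMM of the same kind that failed inside the attention layer.
x = torch.randn(1024, 2048, device="cuda", dtype=torch.float16)
w = torch.randn(640, 2048, device="cuda", dtype=torch.float16)
for _ in range(100):
    y = F.linear(x, w)
torch.cuda.synchronize()  # with CUDA_LAUNCH_BLOCKING=1 any failure surfaces at the faulting call
print("fp16 GEMM OK:", tuple(y.shape))
```

If a loop like this also fails intermittently on the same machine, that points more toward the driver, the CUDA runtime, or the hardware than toward Fooocus itself.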

lllyasviel commented 10 months ago

Hi, we have updated the troubleshooting guide for this problem. The relevant entry is:

CUDA kernel errors might be asynchronously reported at some other API call

A very small number of devices have this problem. The cause can be complicated, but it can usually be resolved by following these steps:

  1. Make sure that you are using the official and latest version installed from here. (Some forks and other versions are more likely to cause this problem.)
  2. Upgrade your Nvidia driver to the latest version. (The Nvidia driver version should usually be 53X, not 3XX or 4XX.)
  3. If things still do not work, then it may be a problem with CUDA 12. You can try CUDA 11 and Xformers to work around it. We have prepared all files for you; please do NOT install any CUDA or other environment on your own. The only official way to do this is: (1) back up and delete your python_embeded folder (near the run.bat); (2) download "previous_old_xformers_env.7z" from the release page, decompress it, and put the newly extracted python_embeded folder near your run.bat; (3) run Fooocus. (A small verification sketch follows this list.)
  4. If it still does not work, please open an issue for us to take a look.
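
If you end up on step 3, a quick way to confirm that the swapped python_embeded environment is actually the one being used is to query the bundled torch and xformers from that interpreter. This is only a hypothetical sanity check (the check_env.py filename is made up); after the swap, the bundled torch should report a CUDA 11.x runtime.

```python
# check_env.py - hypothetical sanity check; run it with the swapped interpreter, e.g.
#   .\python_embeded\python.exe check_env.py
import torch

print("torch:", torch.__version__)           # the CUDA 11 build usually carries a +cu11x tag
print("CUDA runtime:", torch.version.cuda)   # expect 11.x after the swap, not 12.x
print("GPU:", torch.cuda.get_device_name(0))

try:
    import xformers
    print("xformers:", xformers.__version__)  # should import cleanly in the Xformers env
except ImportError:
    print("xformers is not available in this python_embeded")
```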

See also https://github.com/lllyasviel/Fooocus/blob/main/troubleshoot.md#cuda-kernel-errors-might-be-asynchronously-reported-at-some-other-api-call