Closed. AFOLcast closed this issue 10 months ago.
Same error as in https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2144, where one of the solutions was to do exactly what your error has output:
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Can you please add this to your startup command (either directly or in run.bat) and check again?
CUDA_LAUNCH_BLOCKING=1 .\python_embeded\python.exe -s Fooocus\entry_with_update.py
Thank you! I will try right away. It's unfortunate that I'm just technical enough to screw things up... I don't understand installations too well.
May I ask what this command does?
Sure, happy to explain it to you.
As per your console log, you start Fooocus by executing this line (either manually or via run.bat) in D:\Fooocus:
.\python_embeded\python.exe -s Fooocus\entry_with_update.py
My proposal is to just prefix it with CUDA_LAUNCH_BLOCKING=1, as suggested by the transformers package (the origin of the error you've provided), for further debugging and analysis. This may even solve your issue completely, but let's test.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
To do so, you can either directly execute the mentioned command in D:\Fooocus or adjust the existing line in your run.bat file.
Hope this explanation helps you understand what it does: the variable makes CUDA kernel launches synchronous, so the error gets reported at the call that actually failed instead of at some later API call.
Did as you suggested. Maybe too literally. Got this error message.
D:\Fooocus>CUDA_LAUNCH_BLOCKING=1 .\python_embeded\python.exe -s Fooocus\entry_with_update.py
'CUDA_LAUNCH_BLOCKING' is not recognized as an internal or external command, operable program or batch file.
D:\Fooocus>pause
Press any key to continue . . .
Trying it as two statements:
D:\Fooocus>set CUDA_LAUNCH_BLOCKING=1
D:\Fooocus> .\python_embeded\python.exe -s Fooocus\entry_with_update.py
Already up-to-date
Yeah, the option I mentioned is for Linux, sorry. Content of my run.bat file:
set CUDA_LAUNCH_BLOCKING=1
.\python_embeded\python.exe -s Fooocus\entry_with_update.py <args here>
pause
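(To verify the variable actually took effect in your cmd session, you can additionally run the line below right after the set command; it should print 1.)
echo %CUDA_LAUNCH_BLOCKING%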
It's running now. Won't know for a little while whether it will bomb out or not. Even with Afterburner, it's slow. But I do great work with Fooocus, so I'm REALLY trying to make this happen.
Failed. Here's the console:
Microsoft Windows [Version 10.0.22631.2861]
(c) Microsoft Corporation. All rights reserved.
D:\Fooocus>CUDA_LAUNCH_BLOCKING=1
'CUDA_LAUNCH_BLOCKING' is not recognized as an internal or external command, operable program or batch file.
D:\Fooocus>set CUDA_LAUNCH_BLOCKING=1
D:\Fooocus> .\python_embeded\python.exe -s Fooocus\entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.855
Running on local URL: http://127.0.0.1:7865
To create a public link, set share=True in launch().
Total VRAM 6144 MB, total RAM 16200 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 2060 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [D:\Fooocus\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.92 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8000631531285694637
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] cute puppy, fine detail, intricate, elegant, dynamic, vibrant color, highly detailed, symmetry, sharp focus, beautiful, divine, professional, ambient light, cute, magical, vivid, artistic, true magic, pure, full background, dramatic, shining, epic, great composition, cinematic, winning, perfect, rational, scenic, lively, novel, atmosphere, best
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] cute puppy, intricate, elegant, highly detailed, wonderful colors, sweet, sharp focus, symmetry, fine detail, colorful, professional, extremely luxury, stunning, enhanced quality, very inspirational, color, winning, epic, cinematic, amazing, creative, beautiful, pure, attractive, cute, best, light, hopeful, thought, iconic, clear, perfect, luxurious
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.28 seconds
Traceback (most recent call last):
File "D:\Fooocus\Fooocus\modules\async_worker.py", line 806, in worker
handler(task)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\Fooocus\modules\async_worker.py", line 415, in handler
t['c'] = pipeline.clip_encode(texts=t['positive'], pool_top_k=t['positive_top_k'])
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\Fooocus\modules\default_pipeline.py", line 190, in clip_encode
cond, pooled = clip_encode_single(final_clip, text)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\Fooocus\modules\default_pipeline.py", line 148, in clip_encode_single
result = clip.encode_from_tokens(tokens, return_pooled=True)
File "D:\Fooocus\Fooocus\ldm_patched\modules\sd.py", line 131, in encode_from_tokens
cond, pooled = self.cond_stage_model.encode_token_weights(tokens)
File "D:\Fooocus\Fooocus\ldm_patched\modules\sdxl_clip.py", line 54, in encode_token_weights
g_out, g_pooled = self.clip_g.encode_token_weights(token_weight_pairs_g)
File "D:\Fooocus\Fooocus\modules\patch_clip.py", line 57, in patched_encode_token_weights
out, pooled = self.encode(to_encode)
File "D:\Fooocus\Fooocus\ldm_patched\modules\sd1_clip.py", line 191, in encode
return self(tokens)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\Fooocus\modules\patch_clip.py", line 143, in patched_SDClipModel_forward
outputs = self.transformer(input_ids=tokens, attention_mask=attention_mask,
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 822, in forward
return self.text_model(
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 740, in forward
encoder_outputs = self.encoder(
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 654, in forward
layer_outputs = encoder_layer(
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 393, in forward
hidden_states = self.mlp(hidden_states)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 350, in forward
hidden_states = self.fc2(hidden_states)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\Fooocus\ldm_patched\modules\ops.py", line 45, in forward
return torch.nn.functional.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
Total time: 732.56 seconds
This could be a problem with an outdated CUDA version, as you don't seem to be using the one-click-installer files (run.bat etc.). Which CUDA (11.8 / 12.1 / X) and pytorch version are you using?
Yes, I am using the run.bat files. I didn't know to use "set" the first time. How do I check CUDA & pytorch version? I simply did a clean install of the most recent version. I'm using the most recent Nvidia driver as well. Hmmm. Photoshop just crapped out saying my GPU is not current. Could that have happened from the set CUDA_LAUNCH_BLOCKING=1 command? Gonna restart. Things are getting wonky.
For me, this can be checked in the folder Fooocus\python_embeded\Lib\site-packages. There should be a torch folder and, at the same level, another one named torch-2.1.0+cu121.dist-info (torch 2.1.0 & CUDA 12.1). If this does not exist, you might have another version installed and the folder might be named differently.
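Alternatively, a quick sketch using the embedded interpreter (same information, no digging through folders), you can ask pytorch directly:
.\python_embeded\python.exe -c "import torch; print(torch.__version__, torch.version.cuda)"
This should print something like 2.1.0+cu121 12.1.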
Indeed. This is what I found: torch-2.1.0+cu121.dist-info
Sorry, I sadly don't have a direct solution to this; maybe somebody else has additional input.
Do you have any idea if setting low vram might affect this? Or how to accomplish that?
You can certainly try to set --always-low-vram and run it again, but I doubt that this will help. Let's give it a shot!
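For reference, a sketch of how your run.bat could look with the flag simply appended to the launch line:
set CUDA_LAUNCH_BLOCKING=1
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --always-low-vram
pause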
Ok. Now working. I was always bad at the scientific method. I never test one thing at a time.
I reinstalled the Nvidia driver. I checked all my CUDA settings in the Nvidia control panel. Some of them had changed without my being aware of it, perhaps from a recent update. I made sure python.exe was set to use only the Nvidia GPU. I tried first with the "new" 12 xformers. Bombed out. Tried with the "old" 11 xformers. Worked like a champ.
Now, I had done ALL this and more in the last several weeks, and it never worked before. But now it's working with the latest version, .855, and the "old" CUDA 11 xformers.
Couldn't be happier.
Marking closed.
Read Troubleshoot
[x] I admit that I have read the Troubleshoot before making this issue.
Describe the problem
Started a clean re-install. Followed all troubleshooting. Swap memory is at 44000-60000. Tried with and without old xformers. Most recent run with new xformers. Hangs.
Full Console Log
D:\Fooocus>.\python_embeded\python.exe -s Fooocus\entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.855
Running on local URL: http://127.0.0.1:7865
To create a public link, set share=True in launch().
Total VRAM 6144 MB, total RAM 16200 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 2060 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [D:\Fooocus\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.72 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8128164886135262337
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] cute puppy, fine intricate, elegant, highly detailed, symmetry, sharp focus, majestic, amazing bright colors, radiant light, vivid color, coherent, dazzling, brilliant, colorful, very scientific background, professional, winning, open artistic, deep aesthetic, magical, scenic, thought complex, extremely cool, creative, cinematic, singular, best, real, imagined, dramatic
[Fooocus] Preparing Fooocus text #2 ...
Traceback (most recent call last):
File "D:\Fooocus\Fooocus\modules\async_worker.py", line 806, in worker
handler(task)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\Fooocus\modules\async_worker.py", line 408, in handler
expansion = pipeline.final_expansion(t['task_prompt'], t['task_seed'])
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\Fooocus\extras\expansion.py", line 117, in __call__
features = self.model.generate(**tokenized_kwargs,
File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\generation\utils.py", line 1572, in generate
return self.sample(
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\generation\utils.py", line 2619, in sample
outputs = self(
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 1080, in forward
transformer_outputs = self.transformer(
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 903, in forward
outputs = block(
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 391, in forward
attn_outputs = self.attn(
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 332, in forward
attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 202, in _attn
mask_value = torch.full([], mask_value, dtype=attn_weights.dtype).to(attn_weights.device)
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Total time: 1874.34 seconds