lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

RTX 2060 tried everything won't run since 11/28 #1621

Closed: AFOLcast closed this issue 10 months ago

AFOLcast commented 10 months ago

Read Troubleshoot

[x] I admit that I have read the Troubleshoot before making this issue.

Describe the problem
Started a clean re-install. Followed all the troubleshooting steps. Swap memory is at 44000-60000. Tried with and without the old xformers. The most recent run used the new xformers. It hangs.

Full Console Log

D:\Fooocus>.\python_embeded\python.exe -s Fooocus\entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.855
Running on local URL: http://127.0.0.1:7865

To create a public link, set share=True in launch().
Total VRAM 6144 MB, total RAM 16200 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 2060 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [D:\Fooocus\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.72 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8128164886135262337
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] cute puppy, fine intricate, elegant, highly detailed, symmetry, sharp focus, majestic, amazing bright colors, radiant light, vivid color, coherent, dazzling, brilliant, colorful, very scientific background, professional, winning, open artistic, deep aesthetic, magical, scenic, thought complex, extremely cool, creative, cinematic, singular, best, real, imagined, dramatic
[Fooocus] Preparing Fooocus text #2 ...
Traceback (most recent call last):
  File "D:\Fooocus\Fooocus\modules\async_worker.py", line 806, in worker
    handler(task)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\Fooocus\modules\async_worker.py", line 408, in handler
    expansion = pipeline.final_expansion(t['task_prompt'], t['task_seed'])
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\Fooocus\extras\expansion.py", line 117, in __call__
    features = self.model.generate(**tokenized_kwargs,
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\generation\utils.py", line 1572, in generate
    return self.sample(
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\generation\utils.py", line 2619, in sample
    outputs = self(
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 1080, in forward
    transformer_outputs = self.transformer(
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 903, in forward
    outputs = block(
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 391, in forward
    attn_outputs = self.attn(
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 332, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 202, in _attn
    mask_value = torch.full([], mask_value, dtype=attn_weights.dtype).to(attn_weights.device)
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Total time: 1874.34 seconds

mashb1t commented 10 months ago

Same error as in https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2144, where one of the solutions was to do exactly what your error output suggests:

For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Can you please add this to your startup command (either directly or in run.bat) and check again?

CUDA_LAUNCH_BLOCKING=1 .\python_embeded\python.exe -s Fooocus\entry_with_update.py

AFOLcast commented 10 months ago

Thank you! I will try right away. It's unfortunate that I'm just technical enough to screw things up... I don't understand installations too well.

May I ask what this command does?


mashb1t commented 10 months ago

Sure, happy to explain it to you. According to your console log, you start Fooocus by executing this line (either manually or via run.bat) in D:\Fooocus:

.\python_embeded\python.exe -s Fooocus\entry_with_update.py

My proposal is to prefix it with CUDA_LAUNCH_BLOCKING=1, as suggested by the transformers package (the origin of the error you provided), for further debugging and analysis. This may even solve your issue completely, but let's test.

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

To do so, you can either directly execute the mentioned command in D:\Fooocus or adjust the existing line in your run.bat file.

Hope this explanation helps you understand what it does.
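
(A minimal sketch of what the variable changes — plain PyTorch, not Fooocus code, assuming a CUDA build of torch:)

# With CUDA_LAUNCH_BLOCKING=1 every CUDA kernel launch runs synchronously,
# so an error surfaces at the line that caused it instead of at some later call.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA initializes

import torch  # imported after the env var so the setting takes effect

a = torch.randn(1024, device="cuda")
b = a * 2                   # a failing kernel would now raise here, synchronously
torch.cuda.synchronize()    # without the env var, errors often only show up here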

AFOLcast commented 10 months ago

Did as you suggested. Maybe too literally. Got this error message.

D:\Fooocus>CUDA_LAUNCH_BLOCKING=1 .\python_embeded\python.exe -s Fooocus\entry_with_update.py
'CUDA_LAUNCH_BLOCKING' is not recognized as an internal or external command, operable program or batch file.

D:\Fooocus>pause
Press any key to continue . . .

AFOLcast commented 10 months ago

Trying it as two statements:

D:\Fooocus>set CUDA_LAUNCH_BLOCKING=1

D:\Fooocus> .\python_embeded\python.exe -s Fooocus\entry_with_update.py
Already up-to-date

mashb1t commented 10 months ago

Yeah, the option I mentioned is for Linux, sorry. Here's the content of my run.bat file:

set CUDA_LAUNCH_BLOCKING=1
.\python_embeded\python.exe -s Fooocus\entry_with_update.py <args here>
pause
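
(Side note — standard cmd behavior, nothing Fooocus-specific: set CUDA_LAUNCH_BLOCKING=1 only applies to that console session or batch run and its child processes; it changes nothing system-wide.)
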
AFOLcast commented 10 months ago

It's running now. Won't know for a little while whether it will bomb out or not. Even with Afterburner it's slow. But I do great work with Fooocus, so I'm REALLY trying to make this happen.

AFOLcast commented 10 months ago

Failed. Here's the console:

Microsoft Windows [Version 10.0.22631.2861] (c) Microsoft Corporation. All rights reserved.

D:\Fooocus>CUDA_LAUNCH_BLOCKING=1
'CUDA_LAUNCH_BLOCKING' is not recognized as an internal or external command, operable program or batch file.

D:\Fooocus>set CUDA_LAUNCH_BLOCKING=1

D:\Fooocus> .\python_embeded\python.exe -s Fooocus\entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.855
Running on local URL: http://127.0.0.1:7865

To create a public link, set share=True in launch().
Total VRAM 6144 MB, total RAM 16200 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 2060 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [D:\Fooocus\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [D:\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.92 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8000631531285694637
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] cute puppy, fine detail, intricate, elegant, dynamic, vibrant color, highly detailed, symmetry, sharp focus, beautiful, divine, professional, ambient light, cute, magical, vivid, artistic, true magic, pure, full background, dramatic, shining, epic, great composition, cinematic, winning, perfect, rational, scenic, lively, novel, atmosphere, best
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] cute puppy, intricate, elegant, highly detailed, wonderful colors, sweet, sharp focus, symmetry, fine detail, colorful, professional, extremely luxury, stunning, enhanced quality, very inspirational, color, winning, epic, cinematic, amazing, creative, beautiful, pure, attractive, cute, best, light, hopeful, thought, iconic, clear, perfect, luxurious
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.28 seconds
Traceback (most recent call last):
  File "D:\Fooocus\Fooocus\modules\async_worker.py", line 806, in worker
    handler(task)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\Fooocus\modules\async_worker.py", line 415, in handler
    t['c'] = pipeline.clip_encode(texts=t['positive'], pool_top_k=t['positive_top_k'])
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\Fooocus\modules\default_pipeline.py", line 190, in clip_encode
    cond, pooled = clip_encode_single(final_clip, text)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Fooocus\Fooocus\modules\default_pipeline.py", line 148, in clip_encode_single
    result = clip.encode_from_tokens(tokens, return_pooled=True)
  File "D:\Fooocus\Fooocus\ldm_patched\modules\sd.py", line 131, in encode_from_tokens
    cond, pooled = self.cond_stage_model.encode_token_weights(tokens)
  File "D:\Fooocus\Fooocus\ldm_patched\modules\sdxl_clip.py", line 54, in encode_token_weights
    g_out, g_pooled = self.clip_g.encode_token_weights(token_weight_pairs_g)
  File "D:\Fooocus\Fooocus\modules\patch_clip.py", line 57, in patched_encode_token_weights
    out, pooled = self.encode(to_encode)
  File "D:\Fooocus\Fooocus\ldm_patched\modules\sd1_clip.py", line 191, in encode
    return self(tokens)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\Fooocus\modules\patch_clip.py", line 143, in patched_SDClipModel_forward
    outputs = self.transformer(input_ids=tokens, attention_mask=attention_mask,
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 822, in forward
    return self.text_model(
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 740, in forward
    encoder_outputs = self.encoder(
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 654, in forward
    layer_outputs = encoder_layer(
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 393, in forward
    hidden_states = self.mlp(hidden_states)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\transformers\models\clip\modeling_clip.py", line 350, in forward
    hidden_states = self.fc2(hidden_states)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Fooocus\Fooocus\ldm_patched\modules\ops.py", line 45, in forward
    return torch.nn.functional.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

Total time: 732.56 seconds

mashb1t commented 10 months ago

This could be a problem with an outdated CUDA version, as you don't seem to be using the one-click-installer files (run.bat etc.). Which CUDA (11.8 / 12.1 / X) and PyTorch version are you using?

AFOLcast commented 10 months ago

Yes, I am using the run.bat files. I didn't know to use "set" the first time. How do I check the CUDA & PyTorch version? I simply did a clean install of the most recent version. I'm using the most recent Nvidia driver as well. Hmmm. Photoshop just crapped out saying my GPU is not current. Could that have happened from the set CUDA_LAUNCH_BLOCKING=1 command? Gonna restart. Things are getting wonky.

mashb1t commented 10 months ago

For me, this can be checked in the folder Fooocus\python_embeded\Lib\site-packages. There should be a folder named torch and, next to it, another named torch-2.1.0+cu121.dist-info (torch 2.1.0 & CUDA 12.1). If this does not exist, you might have another version installed and the folder might be named differently.
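
(Alternatively — a quick sketch, assuming the default embedded-Python layout — you can ask torch itself from D:\Fooocus:)

.\python_embeded\python.exe -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))"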

AFOLcast commented 10 months ago

Indeed. This is what I found: torch-2.1.0+cu121.dist-info

mashb1t commented 10 months ago

Sorry, I sadly don't have a direct solution for this; maybe somebody else has additional input.

AFOLcast commented 10 months ago

Do you have any idea if setting low VRAM might affect this? Or how to accomplish that?

mashb1t commented 10 months ago

You can certainly try to set --always-low-vram and run it again, but I doubt that this will help. Let's give it a shot!
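
For reference, the run.bat from above with the flag appended would look roughly like this (a sketch — keep whatever other args you already use):

set CUDA_LAUNCH_BLOCKING=1
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --always-low-vram
pause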

AFOLcast commented 10 months ago

Ok, it's working now. I was always bad at the scientific method; I never test one thing at a time.

I reinstalled the Nvidia driver. I checked all my CUDA settings in the Nvidia control panel; some of them had changed without me being aware of it, perhaps from a recent update. I made sure python.exe was set to use only the Nvidia GPU. I tried first with the "new" CUDA 12 xformers. Bombed out. Tried with the "old" CUDA 11 xformers. Worked like a champ.

Now, I had done ALL of this and more over the last several weeks, and it never worked before. But it is now working with the latest version, 2.1.855, and the "old" CUDA 11 xformers.

Couldn't be happier.

Marking closed.