lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

Segfault on linux with AMD GPU #1783

Closed (carnager closed this issue 8 months ago)

carnager commented 10 months ago

Read Troubleshoot

[x] I admit that I have read the Troubleshoot before making this issue.

Describe the problem

I installed Fooocus on Linux using the instructions on the main page. I uninstalled regular torch and installed the AMD version as mentioned on the front page. I created a 40 GB swap space and then ran the app with python launch.py --attention-split. When I try to generate an image, it seems to do something but then segfaults.
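For reference, the AMD setup described above boils down to roughly the following (a sketch reconstructed from the commands quoted later in this thread, not the authoritative README text):

$ pip uninstall torch torchvision torchaudio torchtext functorch xformers
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
$ python launch.py --attention-split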

Some info about my setup:

GPU:
2d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT] (rev c5)

Memory:
               total        used        free      shared  buff/cache   available
Mem:            31Gi       6,4Gi        12Gi       307Mi        13Gi        24Gi
Swap:           39Gi          0B        39Gi

CPU:
model name  : AMD Ryzen 9 5900X 12-Core Processor

Full Console Log

(fooocus_env) carnager@caprica ~/Apps/Fooocus > python launch.py --attention-split
[System ARGV] ['launch.py', '--attention-split']
Python 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801]
Fooocus version: 2.1.860
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 32018 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6700 XT : native
VAE dtype: torch.float32
Using split optimization for cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/carnager/Apps/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/carnager/Apps/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/carnager/Apps/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/carnager/Apps/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.52 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8714579776560103216
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
zsh: segmentation fault (core dumped)  python launch.py --attention-split
mashb1t commented 10 months ago

Can you please check if it works without setting --attention-split (not setting any arguments)? Thanks!

carnager commented 10 months ago

Yeah, I tried that already; same behavior without any arguments.

codeliger commented 10 months ago

I had the same issue before and after creating the 40GB swap partition, so it doesn't seem to be RAM/memory related.
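For anyone reproducing this, a typical way to add a 40 GB swap file on Linux looks like the following (a generic sketch; the size and path are placeholders):

$ sudo fallocate -l 40G /swapfile   # reserve 40 GB
$ sudo chmod 600 /swapfile          # restrict permissions, required by swapon
$ sudo mkswap /swapfile             # format the file as swap
$ sudo swapon /swapfile             # enable it immediately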


Full logs:

$ python launch.py 
[System ARGV] ['launch.py']
Python 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
Fooocus version: 2.1.860
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
amdgpu.ids: No such file or directory
amdgpu.ids: No such file or directory
Total VRAM 8176 MB, total RAM 32018 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon Graphics : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/codeliger/dl/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/codeliger/dl/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/codeliger/dl/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/codeliger/dl/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.52 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 5323403996105043708
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)
darkraisisi commented 10 months ago

I have a similar problem; it seems like the swap is not being used or found. I am using an NVIDIA 3090, but when forcing CPU-only, an error pops up about not finding virtual memory, so I think this problem is related.
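A quick way to check what psutil reports on the affected machine; this is the same psutil.virtual_memory() call that emits the RuntimeWarning in the CPU-only log below:

$ python -c "import psutil; print(psutil.virtual_memory()); print(psutil.swap_memory())"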

Running normally:

[System ARGV] ['launch.py']
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Fooocus version: 2.1.860
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 24257 MB, total RAM 15912 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 : native
VAE dtype: torch.bfloat16
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: /home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/myName/Documents/img-gen/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.34 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 5428285024980375409
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] brown horse on the beach, intricate, elegant, highly detailed, wonderful colors, sweet, extremely delicate, majestic, holy, dramatic, sharp focus, professional composition, fantastic, iconic, fine light, excellent, very inspirational, ambient, artistic, vibrant, imposing, epic, thought, magnificent, stunning, awesome, cinematic, dynamic, complex, amazing, creative, brilliant
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] brown horse on the beach, intricate, elegant, highly detailed, extremely shiny, wonderful colors, ambient light, dynamic background, sharp focus, professional fine detail, best animated, cinematic, singular, rich, vivid, beautiful, unique, cute, attractive, epic, gorgeous, stunning, great, awesome, amazing, breathtaking, dramatic, illuminated, outstanding, very coherent, perfect
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 2.52 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.93 seconds
  0%|                                                                                                                                                                  | 0/30 [00:00<?, ?it/s]
Segmentation fault (core dumped)

Running cpu only:

(fooocus) myName@pop-os:~/Documents/img-gen/Fooocus$ python entry_with_update.py --preview-option fast --always-cpu
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--preview-option', 'fast', '--always-cpu']
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Fooocus version: 2.1.860
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 15912 MB, total RAM 15912 MB
Set vram state to: DISABLED
Always offload VRAM
Device: cpu
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/myName/Documents/img-gen/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8454048247502736915
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] brown horse on the beach, cinematic, epic, dramatic ambient, professional, highly detailed, extremely beautiful, emotional, cute, symmetry, intricate, light, surreal, pretty, inspiring, elegant, crisp sharp focus, artistic, very inspirational,, novel, romantic, new, cheerful, inspired, generous, color, cool, passionate, vibrant, background, colorful, shiny
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] brown horse on the beach, intricate, elegant, highly detailed, extremely beautiful, glowing, sharp focus, refined, complex, colors, cinematic, surreal, artistic, scenic, attractive, thought, singular, iconic, fine detail, clear, ambient light, full color, perfect composition, symmetry, aesthetic, great, pure, pristine, very inspirational, professional, winning, best
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 12.63 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 119.24 seconds
  0%|                                                                                                                                                                  | 0/30 [00:00<?, ?it/s]
/home/myName/anaconda3/envs/fooocus/lib/python3.10/site-packages/psutil/__init__.py:1973: RuntimeWarning: available memory stats couldn't be determined and was set to 0
  ret = _psplatform.virtual_memory()

  7%|██████████                                                                                                                                             | 2/30 [07:04<1:37:07, 208.12s/it]^CKeyboard interruption in main thread... closing server.

nvidia-smi output, driver & CUDA versions (which should be compatible with the current torch version):

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:08:00.0  On |                  N/A |
|  0%   18C    P8              18W / 350W |    752MiB / 24576MiB |     10%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2320      G   /usr/lib/xorg/Xorg                          245MiB |
|    0   N/A  N/A      2430      G   /usr/bin/gnome-shell                        110MiB |
|    0   N/A  N/A      3106      G   ...sion,SpareRendererForSitePerProcess       52MiB |
|    0   N/A  N/A      3334      G   firefox                                     325MiB |
+---------------------------------------------------------------------------------------+

EDIT: After downgrading the drivers to 535.129.03 just to be sure, the results remain the same.

I looked in the docs and in other issues for how to go about debugging this, but it is not clear to me. I'd love to help contribute if there are some resources I can start with.
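As a generic starting point for debugging (not Fooocus-specific, and assuming systemd-coredump is installed), one way to get a native backtrace from a segfaulting Python process is:

$ coredumpctl gdb        # open the most recent core dump in gdb
(gdb) bt                 # print the native backtrace

or run the app under gdb directly:

$ gdb --args python launch.py
(gdb) run                # reproduce the crash, then:
(gdb) bt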

Laurent-VueJS commented 10 months ago

I have the same issue. (Ryzen R9 7900X, 64 GB RAM, of which 16 GB is VRAM for the integrated GPU.) When I run on Linux I have exactly the same issue. On Windows (I dual boot), it runs more or less OK (I sometimes have a crash because of the memory-leak issue, but it works). Could it be linked to the ROCm version, which is different in the Linux/AMD instructions (5.6 instead of 5.7)? I have read that it does not go well with the VAE version (?). I tried a manual upgrade of ROCm, but it caused other problems. I found this interesting on the same subject: https://www.reddit.com/r/comfyui/comments/15b8lxd/comfyui_is_not_detecting_my_gpus_vram/
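One way to check which ROCm build of torch is actually installed: ROCm builds of torch expose the HIP version, while CUDA builds report None there.

$ python -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available())"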

cgerardin commented 9 months ago

Hello, same issue here.

It works with --always-cpu.

Feel free to tell me if I can run some tests or provide more information.

Athoir commented 9 months ago

Hello, I had the same issue with the segfault on the following hardware:

I managed to make it run by doing the following:

I can't test if it works on the RX 6000 series as my previous card is fried.

Hope this helps :smile:

carnager commented 9 months ago

Sadly, this does not work for me...

Memory access fault by GPU node-1 (Agent handle: 0x7f6c79b37c80) on address 0x7f6d60e85000. Reason: Page not present or supervisor privilege.

OK, I could make it run with the low-VRAM option, but it never finishes generating any images.

cgerardin commented 9 months ago

Thank you @Athoir, but it's exactly the same as for @carnager: it runs with the --attention-split and --always-low-vram options, but fails shortly after the beginning of the image generation:

Memory access fault by GPU node-1 (Agent handle: 0x7facdd668c60) on address 0x7fae2da8b000. Reason: Page not present or supervisor privilege. Abandon (core dumped)

Complete steps to reproduce on Fedora / Nobara:

$ sudo dnf install python3.10 rocm-opencl rocm-hip-runtime    # ROCm runtime + OpenCL
$ python3.10 -m venv fooocus_env                              # dedicated virtualenv
$ source fooocus_env/bin/activate
$ pip install -r requirements_versions.txt                    # run from the Fooocus checkout
$ pip uninstall torch torchvision torchaudio torchtext functorch xformers    # drop the CUDA build
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6    # ROCm build
$ HSA_OVERRIDE_GFX_VERSION=11.0.0 python entry_with_update.py --attention-split --always-low-vram

Perhaps related to the torch version? (See https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/8139#issuecomment-1545521725)

OronDF343 commented 9 months ago

For RX 6700 XT, setting HSA_OVERRIDE_GFX_VERSION=10.3.0 helped, as mentioned here
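For reference, this is set the same way as in the repro steps above, i.e. as an environment variable on the launch command; 10.3.0 reportedly makes ROCm treat RDNA2 cards such as the 6700 XT as the officially supported gfx1030 target:

$ HSA_OVERRIDE_GFX_VERSION=10.3.0 python launch.py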

carnager commented 9 months ago

For RX 6700 XT, setting HSA_OVERRIDE_GFX_VERSION=10.3.0 helped, as mentioned here

Not for me...

[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] cosy bear reading a book, warm colors, cinematic, highly detailed, incredible quality, very inspirational, thought, fancy, epic, singular background, elegant, intricate, dynamic light, beautiful, enhanced, bright, colorful, color, illuminated, inspired, deep rich vivid, coherent, glowing, complex, amazing, symmetry, full composed, brilliant, perfect composition, pure
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] cosy bear reading a book, light flowing magic, cool colors, glowing, amazing, highly detailed, intricate, sharp focus, professional animated, vivid, best, contemporary, modern, romantic, inspired, new, creative, beautiful, attractive, advanced, cinematic, artistic color, surreal, emotional, cute, adorable, perfect, focused, positive, exciting, lucid, joyful
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 5.71 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828

and then nothing happens

cgerardin commented 9 months ago

Working FAST with HSA_OVERRIDE_GFX_VERSION=10.3.0 (with and without --attention-split)! Many thanks @OronDF343
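If the override fixes it, one way to avoid typing it on every launch is to export it in the shell profile (a sketch; adjust the profile file to your shell):

$ echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc
$ source ~/.bashrc
$ python entry_with_update.py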

Senshi00 commented 9 months ago

Also RX 6700 XT user, using HSA_OVERRIDE_GFX_VERSION=10.3.0 helped

TheNexter commented 9 months ago

Also RX 6700 XT user, using HSA_OVERRIDE_GFX_VERSION=10.3.0 helped

I confirm: on a 6600 XT, this solves the problem.

hqnicolas commented 8 months ago

I'm running here with no problems using this gist; you need to set HSA_OVERRIDE_GFX_VERSION=10.3.0 for Radeon 6000 series cards: https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd