lllyasviel / Fooocus

AMD Segmentation Fault #1288

Open CobeyH opened 9 months ago

CobeyH commented 9 months ago

Describe the problem

I am running Ubuntu with an AMD GPU. I configured my environment variables and set up rocminfo as suggested by this issue: https://github.com/lllyasviel/Fooocus/issues/1079.

The web page now launches successfully and it no longer shows an error that the GPU isn't detected. However, when I enter a text or image prompt and click the "Generate" button, a segmentation fault occurs.

**System Info**

System: Ubuntu 22.04.3
CPU: AMD Ryzen 5 3600
GPU: AMD RX 6750 XT
Python: 3.10.13
Environment: venv

HCC_AMDGPU_TARGET=gfx1031 HSA_OVERRIDE_GFX_VERSION=10.3.2
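
For anyone reproducing this setup: these variables are typically exported in the shell before launching Fooocus. A minimal sketch, assuming bash; the values shown are the ones from this report and must be matched to your own GPU, not copied blindly:

```
# ROCm targeting for this session, then launch Fooocus.
# gfx1031 / 10.3.2 are this reporter's values for an RX 6750 XT;
# rocminfo reports the gfx target for your card.
export HCC_AMDGPU_TARGET=gfx1031
export HSA_OVERRIDE_GFX_VERSION=10.3.2
python entry_with_update.py
```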

Full Console Log

Update failed.
authentication required but no callback set
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.10.13 (main, Aug 25 2023, 13:20:03) [GCC 9.4.0]
Fooocus version: 2.1.824
Running on local URL: http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 15903 MB
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: cuda:0 AMD Radeon RX 6750 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/cobey/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/cobey/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/cobey/repos/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/cobey/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 1.79 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 4950368496917309143
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[1] 10757 segmentation fault (core dumped) python entry_with_update.py

NL-TCH commented 9 months ago

Got exactly the same on an RX 5700 XT.

python entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.11.6 (main, Oct  3 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)]
Fooocus version: 2.1.824
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 8176 MB, total RAM 31833 MB
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: cuda:0 AMD Radeon RX 5700 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/user/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/user/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/user/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/user/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 1.57 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 7295514245041223923
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)
Khoraji commented 9 months ago

Same fault, core dumped, on a 5700 XT.

L226 commented 9 months ago

Same here, followed https://github.com/lllyasviel/Fooocus/issues/1079 successfully.

However, I'm using the integrated Radeon graphics of my R7 Pro 5850U. Tried with and without --use-split-cross-attention.

Ubuntu 22.04.3 (kernel 6.1.66), AMD Ryzen 7 Pro 5850U, AMD Radeon Graphics, 48 GB RAM

python entry_with_update.py --preset realistic --use-split-cross-attention
Update failed.
authentication required but no callback set
Update succeeded.
[System ARGV] ['entry_with_update.py', '--preset', 'realistic', '--use-split-cross-attention']
Loaded preset: /home/user/genai/Fooocus/presets/realistic.json
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Fooocus version: 2.1.824
Running on local URL:  http://127.0.0.1:7866

To create a public link, set `share=True` in `launch()`.
Total VRAM 4096 MB, total RAM 43960 MB
Trying to enable lowvram mode because your GPU seems to have 4GB or less. If you don't want this use: --normalvram
Set vram state to: LOW_VRAM
Disabling smart memory management
Device: cuda:0 AMD Radeon Graphics : native
VAE dtype: torch.float32
Using split optimization for cross attention
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: /home/user/genai/Fooocus/models/checkpoints/realisticStockPhoto_v10.safetensors
Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/user/genai/Fooocus/models/checkpoints/realisticStockPhoto_v10.safetensors].
Loaded LoRA [/home/user/genai/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for UNet [/home/user/genai/Fooocus/models/checkpoints/realisticStockPhoto_v10.safetensors] with 788 keys at weight 0.25.
Loaded LoRA [/home/user/genai/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for CLIP [/home/user/genai/Fooocus/models/checkpoints/realisticStockPhoto_v10.safetensors] with 264 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 2.21 seconds
App started successful. Use the app with http://127.0.0.1:7866/ or 127.0.0.1:7866
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 3.0
[Parameters] Seed = 6293613909801716834
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] ship on fire, dramatic, intricate, elegant, highly detailed, extremely new, professional, cinematic, artistic, sharp focus, color light, winning, romantic, smart, cute, epic, creative, cool, loving, attractive, pretty, charming, complex, amazing, passionate, charismatic, colorful, coherent, iconic, fine, vibrant, incredible, beautiful, awesome, pure
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] ship on fire, full color, cinematic, stunning, highly detailed, formal, serious, determined, elegant, professional, artistic, emotional, pretty, attractive, smart, charming, best, dramatic, sharp focus, beautiful, cute, modern, futuristic, surreal, iconic, fine detail, colorful, ambient light, dynamic, amazing, symmetry, intricate, elite, magical
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 8.59 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Segmentation fault (core dumped)
Khoraji commented 9 months ago

Pretty much exactly how mine shakes out: after the big string of adjectives, then core dump. I also have the exact same thing on my integrated-graphics R5 laptop and my 5700 XT desktop :/

galvani4987 commented 9 months ago

I'm running Linux Mint, fully updated and all, on an AMD Ryzen 5600G + RX 5600 XT 6 GB + 32 GB DDR4. I get the exact same segmentation fault (core dumped). I have tested a bunch of args and variables but no luck. I installed ROCm 5.7, but every test gives a different error message and ends in failure, so I went back to the start and to this thread. I hope someone figures it out. Thanks a lot everyone; this is great, and we are pretty close to making it work... I hope.

galvani4987 commented 9 months ago

This has been published by lllyasviel: https://github.com/lllyasviel/Fooocus/issues/1327. I enlarged my swapfile to 64G using this tutorial: https://linuxhandbook.com/increase-swap-ubuntu/, reinstalled Fooocus from scratch, and ran it. About a minute or so after I hit Generate, it gets stuck at "[Fooocus] Preparing Fooocus text #1 ..." and then it segfaults.
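
For reference, the swap enlargement in that tutorial comes down to the standard swapfile steps; a hedged sketch, with the 64G size and /swapfile path as illustrative choices:

```
# Create and activate a 64G swapfile (standard util-linux procedure,
# as in the linked tutorial).
sudo swapoff -a                  # deactivate existing swap first
sudo fallocate -l 64G /swapfile  # allocate the file
sudo chmod 600 /swapfile         # mkswap requires restrictive permissions
sudo mkswap /swapfile            # format the file as swap
sudo swapon /swapfile            # activate it
swapon --show                    # confirm the new swap is active
```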

Robin-qwerty commented 9 months ago

I have the same issue. Running Arch, and I have an RX 6750 XT, 32 GB RAM and 40 GB swap.

(fooocus_env) [root@ArchLinuxRobin Fooocus]# python entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.10.10 (main, Mar  5 2023, 22:26:53) [GCC 12.2.1 20230201]
Fooocus version: 2.1.835
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 31955 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6750 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /root/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/root/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/root/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/root/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.24 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 7930202201705363266
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)
(fooocus_env) [root@ArchLinuxRobin Fooocus]#

All I get to see in the browser is 'Waiting for task to start ...'

And my memory is barely used.

wnm210 commented 9 months ago

> This has been published by lllyasviel: #1327. I enlarged my swapfile to 64G using this tutorial: https://linuxhandbook.com/increase-swap-ubuntu/, reinstalled Fooocus from scratch, and ran it. About a minute or so after I hit Generate, it gets stuck at "[Fooocus] Preparing Fooocus text #1 ..." and then it segfaults.

Same here, and it's stuck at the same point.

L226 commented 8 months ago

Tried increasing the swapfile (in my case, disabling the existing 1G swap partition and creating/activating a new 40G swap file, with cache pressure = 100 and swappiness = 60); still segfaults:

...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 10.88 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Segmentation fault (core dumped)

Looking at swap usage, it didn't really use anything; RAM utilization also looked pretty low.

Running strace on the process showed some funny lookups, so I guess the AMD integration still needs work or I need to reinstall some packages. For example:

[pid ****] access("/usr/local/games/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
[pid ****] access("/snap/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
[pid ****] access("/snap/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)

I will try to look more deeply into it after the break.
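
As an aside, the two VM knobs mentioned at the top of this comment are plain sysctl settings; a sketch with the values quoted there:

```
# Kernel VM tuning mentioned above (runtime only; add the same lines to
# /etc/sysctl.conf or a drop-in under /etc/sysctl.d/ to persist).
sudo sysctl vm.vfs_cache_pressure=100
sudo sysctl vm.swappiness=60
```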

eVen-gits commented 8 months ago

Getting a segfault as well. I don't think it's a RAM issue (128 GB).


```
Kernel: 6.6.7-4-MANJARO
Uptime: 1 day, 22 hours, 59 mins
Packages: 1184 (pacman), 11 (flatpak)
Shell: bash 5.2.21
Resolution: 3840x1600
DE: Plasma 5.27.10
WM: kwin
Theme: [Plasma], Breeze [GTK2/3]
Icons: [Plasma], breeze [GTK2/3]
Terminal: konsole
CPU: AMD Ryzen 5 5600X (12) @ 3.700GHz
GPU: AMD ATI Radeon RX 5600 OEM/5600 XT / 5700/5700 XT
Memory: 19460MiB / 128710MiB
```
WYOhellboy commented 8 months ago

Also getting a segmentation fault. CPU: AMD Ryzen 7 2700X, RAM: 48 GB, GPU: AMD Radeon RX 7800 XT, swap: 55 GB. Using Manjaro with GNOME as DE.

klassiker commented 8 months ago

Got segfaults as well, but managed to fix it. Here is what I found:

With whl/rocm5.6, I got a plain segfault with no information. Excerpt from strace right before the segfault:

strace -ff python entry_with_update.py --preset realistic
.....
[pid  XXXX] ioctl(6, AMDKFD_IOC_MAP_MEMORY_TO_GPU, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_QUEUE, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_EVENT, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_EVENT, ...) = 0
[pid  XXXX] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x40} ---

After trying whl/nightly/rocm5.7, I got a little more error information:

[pid  XXXX] ioctl(6, AMDKFD_IOC_MAP_MEMORY_TO_GPU, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_QUEUE, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_EVENT, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_EVENT, ...) = 0
[pid  XXXX] futex(..., FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid  XXXX] write(2, "Exception in thread Thread-2 (wo"..., 39Exception in thread Thread-2 (worker):
...
RuntimeError: HIP error: invalid device function

After finding https://github.com/rocm/rocm/issues/2536 and trying strace -ff python -c 'import torch; torch.rand(3,3).to(torch.device("cuda"))', the same error appeared.

Using `export HSA_OVERRIDE_GFX_VERSION=11.0.0` (for gfx1100, as reported by rocminfo), both the simple test and entry_with_update.py run successfully. The segfault happened for me at the same locations, either on startup using the realistic preset or when clicking Generate without a preset, so I guess it's the same issue as here.

For debugging, the output of `rocminfo | grep Name` might help. Also try all of whl/rocm5.6, whl/nightly/rocm5.6 and whl/nightly/rocm5.7 with the simple PyTorch command in a clean environment (using `env -i bash`), exporting HSA_OVERRIDE_GFX_VERSION to the appropriate value for your GPU. Also verify that you are using the correct GPU if you have an iGPU, and check whether strace shows the error at the same location; the loop is sketched below.
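
Putting that advice together, the checking loop might look roughly like this; a sketch, with 11.0.0 standing in for whatever value matches your GPU family:

```
rocminfo | grep Name                    # identify the gfx target, e.g. gfx1100
env -i bash                             # start a shell with a clean environment
export HSA_OVERRIDE_GFX_VERSION=11.0.0  # adapt to your GPU family
# Minimal PyTorch smoke test; repeat in an environment with each torch build
# (whl/rocm5.6, whl/nightly/rocm5.6, whl/nightly/rocm5.7):
strace -ff python -c 'import torch; torch.rand(3,3).to(torch.device("cuda"))'
```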

I guess https://github.com/lllyasviel/Fooocus/issues/627 is related.

Hope this helps.

merlinblack commented 8 months ago

After reinstalling the dependencies today, I can run this without needing any env vars to override anything: `python -c 'import torch; torch.rand(3,3).to(torch.device("cuda"))'`

However, I still get a segfault after clicking 'Generate':

#> python entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.11.7 (main, Dec 18 2023, 00:00:00) [GCC 13.2.1 20231205 (Red Hat 13.2.1-6)]
Fooocus version: 2.1.862
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 32035 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6700 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: /home/nigel/prog/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/nigel/prog/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/nigel/prog/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/nigel/prog/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.68 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8321946732629474494
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)

It does take a moment to break, but watching my RAM usage, both VRAM and RAM go up a little on startup and not any higher after clicking Generate.

AstroJMo commented 8 months ago

I have a 7950X and a 7900 XTX. I disabled the integrated graphics in my BIOS, and I no longer get the segmentation fault. Running test-rocm.py was showing that I had two ROCm devices, and I read on another forum that this might cause problems. Seems it was true for me, at least.
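
Without the helper script, a quick way to see which devices the ROCm build of PyTorch picks up (ROCm builds expose them through the regular torch.cuda API); an unexpected second entry is typically the iGPU:

```
# Prints the name of every GPU visible to PyTorch, e.g. ['AMD Radeon RX 7900 XTX']
python -c 'import torch; print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])'
```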

PiotrCe commented 8 months ago

I'm using Ubuntu 22.04.3 LTS with an RX 5700 XT.

my rocminfo output:


```
Agent 2
Name: gfx1010
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 5700 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
```


I had the Segmentation fault (core dumped) while using miniconda3. After switching to Anaconda, this error never appeared again. Now when I run `HSA_OVERRIDE_GFX_VERSION=10.3.0 python entry_with_update.py` the app starts, and after clicking "Generate" I'm getting:

[Fooocus Model Management] Moving model(s) has taken 1.49 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 3497165507932006909
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
:0:rocdevice.cpp :2692: 2014655231 us: [pid:6187 tid:0x7fac53fff640] Callback: Queue 0x7fa9bdf00000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted (core dumped)

Laurent-VueJS commented 8 months ago

> I have a 7950X and a 7900 XTX. I disabled the integrated graphics in my BIOS, and I no longer get the segmentation fault. Running test-rocm.py was showing that I had two ROCm devices, and I read on another forum that this might cause problems. Seems it was true for me, at least.

Just my 2 cents: for info, I use the iGPU of the Ryzen R9 7900X, and I always get a segmentation fault (or other errors) while I have only this (i)GPU. So multiple GPUs might not be the problem, but the iGPU might well be (?). I have seen in AMD's specs that iGPUs are not officially supported by ROCm :-( NB: on Windows (with DirectML) I can sometimes generate one picture on the iGPU, but only in "Extreme Speed" mode, which uses about 40 GB of VRAM (my limit). Other settings use more than 40 GB, and the process stops when I reach this limit, probably due to a memory leak (?).

Schweeeeeeeeeeeeeeee commented 7 months ago

Same problem.

ttio2tech commented 7 months ago

My 5700 XT can run Fooocus without issue, although it's slow (2 minutes per image in Extreme Speed mode, 3 minutes per image in Speed mode). I also made a video at https://youtu.be/HgGZyNRA1Ns

mashb1t commented 6 months ago

@CobeyH is this issue still present for you using the latest version of Fooocus or can it be closed?

Schweeeeeeeeeeeeeeee commented 6 months ago

Still present:

$ python entry_with_update.py --preset realistic
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--preset', 'realistic']
Python 3.11.7 (main, Jan 29 2024, 16:03:57) [GCC 13.2.1 20230801]
Fooocus version: 2.1.865
Loaded preset: /home/boobs/Fooocus/presets/realistic.json
Running on local URL: http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 31235 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6700 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: /home/boobs/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors
Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/boobs/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors].
Loaded LoRA [/home/boobs/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for UNet [/home/boobs/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors] with 788 keys at weight 0.25.
Loaded LoRA [/home/boobs/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for CLIP [/home/boobs/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors] with 264 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Segmentation fault (core dumped)

hqnicolas commented 6 months ago

Running here with no problem: https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd. For RX 6000 cards, use: HSA_OVERRIDE_GFX_VERSION=10.3.0

Schweeeeeeeeeeeeeeee commented 5 months ago

> Running here with no problem: https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd. For RX 6000 cards, use: HSA_OVERRIDE_GFX_VERSION=10.3.0

How would I use HSA_OVERRIDE_GFX_VERSION=10.3.0?

Laurent-VueJS commented 5 months ago

HSA_OVERRIDE_GFX_VERSION=xxxx must be placed before the command every time, on a single command line (or you can make it permanent in your environment variables; Google can tell you how :-) ). Pay attention that the number depends on your card model. The most common values are 10.3.0 and 11.0.0; look up your card on the internet to be sure (or just try the two most common settings, and you have a 99% chance that one will work). NB: for me, I tried the correct value and it still fails. Apparently ROCm does not provide support for some older or integrated AMD GPUs like mine (see the list of supported models on the ROCm page). But CPU mode works very well, and my other PC with an Nvidia GPU also works very well. I love Fooocus :-)
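
To make that concrete, the two forms might look like this; a sketch assuming bash and the 10.3.0 value, so substitute whatever matches your card:

```
# One-off: set the variable just for this command
HSA_OVERRIDE_GFX_VERSION=10.3.0 python entry_with_update.py

# Permanent: export it in every future shell session (bash assumed)
echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc
source ~/.bashrc
python entry_with_update.py
```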

Tedris commented 4 months ago

I am getting the same running Ubuntu with an RX 5700 and a 40 GB swap; it gets stuck on "Preparing Fooocus text #1" before coming back with a segfault.

It works fine on Windows but I wanted to see if it would run faster on Linux.

mikwee commented 2 months ago

I'm on Fedora; GPU is a Radeon RX 6600, CPU is an Intel Core i5-4690, RAM is 16 GB. After I click "Generate", it takes a long time and then segfaults. I increased my swap size to 40 GB (a 32 GB file added to an 8 GB partition), restarted, and nothing changed; a quick way to double-check the swap is noted after the log. My console output is pretty much identical, but I'll copy-paste it anyway:

Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.10.14 (main, Jun  3 2024, 17:19:22) [GCC 14.1.1 20240522 (Red Hat 14.1.1-4)]
Fooocus version: 2.4.3
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 8176 MB, total RAM 15917 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6600 : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
IMPORTANT: You are using gradio version 3.41.2, however version 4.29.0 is available, please upgrade.
--------
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: /home/testuser/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
VAE loaded: None
Request to load LoRAs [('sd_xl_offset_example-lora_1.0.safetensors', 0.1)] for model [/home/testuser/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [/home/testuser/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/testuser/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 1.66 seconds
Started worker with PID 4277
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] CLIP Skip = 2
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 7406799653888165672
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)
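
For anyone replicating the swap change above, a quick sanity check that the enlarged swap is actually active (standard util-linux tools):

```
swapon --show   # lists active swap files/partitions and their sizes
free -h         # shows swap total/used alongside RAM
```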

Hope this gets solved soon!