lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

Is Fooocus permanently messing up my AMD Graphics Drivers? #2004

Open Jakobud opened 10 months ago

Jakobud commented 10 months ago

Read Troubleshoot

[x] I admit that I have read the Troubleshoot before making this issue.

Describe the problem

I have a Win 10 PC with an AMD 580 graphics card. I followed the instructions to set up Fooocus for AMD cards. When I run Fooocus to generate an image, it appears to work for a while: the image is being refined, etc. But before it completes, my monitors (I have dual monitors) go black and my graphics card fans go to 100% speed. I let it sit for a while, but it never stops, so I have to hard shut down my PC and reboot.

Once I'm back in Windows, I notice that only my primary monitor comes on and my 2nd monitor is not detected. When I try to open the AMD driver software, I see this:

image

So Fooocus appears to be making changes to my drivers or something. Whatever it does, the AMD software no longer starts, even after restarting Windows. I have to completely reinstall the AMD software from scratch to get everything back to normal.

What is happening here?

Full Console Log

I can't grab the console output because my screen goes black and I can't access anything. Once I restart, I have no idea whether the console output has been dumped to a log file somewhere.

mashb1t commented 10 months ago

Very weird indeed! Fooocus does not modify anything outside of its installation folder with the exception of gradio temp files. There were only reports about lack of power for the GPU causing system freezes, but never issues about driver corruption of any kind (Nvidia/AMD/etc.).

It seems like your GPU driver has a bug that causes the GPU to crash + does not recover from it. I assume you made sure to update to the latest GPU driver available supported by your model, but please double-check and share your driver version.

Is anybody else also having this issue?

patientx commented 10 months ago

That description is an AMD GPU driver crash. It usually happens when the GPU is not getting enough power or the card is overheating. Install the newest drivers and, if you can, clean your GPU and other components (there may be too much dust buildup, etc.). Don't overclock the GPU. If the issue still persists, the GPU might be slowly dying.

I have seen this type of error twice in the past. The first time, my GPU was dying, and the only way to keep using it was to underclock both the GPU clock and the memory. The second time, with another GPU, my power supply was failing; I replaced it and the PC has been running flawlessly ever since.

Jakobud commented 10 months ago

Yeah, I have the latest AMD Adrenalin drivers installed, and my GPU and other components are very clean. Little-to-no visible dust. And I do not overclock anything.

This does appear to be reproducible on my end. Is there a way to capture verbose log information so that it is saved even when my PC crashes in this way? Then I could relay that log info to you. Perhaps that would help?

eddyizm commented 10 months ago

> Yeah I have the latest AMD Adrenalin drivers installed and my GPU and other components are very clean. Little-to-no visible dust. And I do not overclock anything.
>
> This does appear to be reproducible on my end. Is there a way that I can capture log information verbosely so that it is saved even when my PC crashes in this way so that I can relay that log info to you? Perhaps that can help?

I am not sure if this will work on a crash, but it's worth a try. If you are running via the run.bat file, you can open a terminal window, go to your Fooocus folder, and then run ./run.bat > console_log.txt
It should save the console output to a text file (tested locally). I'm just not sure whether it will write out the buffer during a crash, but if it does, that would be an easy way to do it.
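
The buffering concern above can be reduced by flushing every line to disk as it arrives. The following is only a minimal sketch, not part of Fooocus: `run_and_log` is a hypothetical helper name, and setting `PYTHONUNBUFFERED` for the child process is an assumption about how the launched Python interpreter buffers its output when piped. A hard power-off can still lose the last few lines.

```python
import os
import subprocess
import sys


def run_and_log(command, log_path):
    """Run `command`, echo each output line, and force it to disk immediately."""
    # Assumption: the child is a Python process that honors PYTHONUNBUFFERED,
    # so its lines reach the pipe as soon as they are printed.
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    with open(log_path, "a", encoding="utf-8", errors="replace") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so tracebacks are kept
            text=True,
            encoding="utf-8",
            errors="replace",
            bufsize=1,
            env=env,
        )
        for line in process.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to commit it to disk
        return process.wait()
```

From the Fooocus folder, it could be called as `run_and_log(["cmd", "/c", "run.bat"], "console_log.txt")`. Syncing after every line is slow, but it maximizes what survives if the machine locks up mid-generation.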

Koech commented 10 months ago

Oh wow. I actually had this exact same issue. I was running a 6700 XT, dual booting. Fooocus crashed my system, I assumed due to overheating (my junction temps were around 115 °C), and I got literally the exact same error message when I logged into Windows. When I tried to reinstall my Windows drivers afterwards, it would pop up an error, and the only way I fixed things was by wiping my entire Windows install. For a while I thought I'd damaged the card, since it crashed my system when it happened.

I'm not sure if I can reproduce or anything, since I ended up returning that system, but I can verify a similar scenario.

Jakobud commented 10 months ago

Well, I did run.bat > output.log, but the result doesn't look super helpful. This log comes from me trying a prompt with a few image prompts. I didn't like what it was doing, so I stopped it and then started another run without any text prompt and with fewer Styles selected. Sure enough, it eventually crashed my system and fucked my graphics drivers.


D:\Downloads\Fooocus_win64_2-1-831>.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y 
Found existing installation: torch 2.0.0
Uninstalling torch-2.0.0:
  Successfully uninstalled torch-2.0.0
Found existing installation: torchvision 0.15.1
Uninstalling torchvision-0.15.1:
  Successfully uninstalled torchvision-0.15.1

D:\Downloads\Fooocus_win64_2-1-831>.\python_embeded\python.exe -m pip install torch-directml 
Requirement already satisfied: torch-directml in d:\downloads\fooocus_win64_2-1-831\python_embeded\lib\site-packages (0.2.0.dev230426)
Collecting torch==2.0.0 (from torch-directml)
  Using cached torch-2.0.0-cp310-cp310-win_amd64.whl (172.3 MB)
Collecting torchvision==0.15.1 (from torch-directml)
  Using cached torchvision-0.15.1-cp310-cp310-win_amd64.whl (1.2 MB)
Requirement already satisfied: filelock in d:\downloads\fooocus_win64_2-1-831\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (3.12.2)
Requirement already satisfied: typing-extensions in d:\downloads\fooocus_win64_2-1-831\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (4.7.1)
Requirement already satisfied: sympy in d:\downloads\fooocus_win64_2-1-831\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (1.12)
Requirement already satisfied: networkx in d:\downloads\fooocus_win64_2-1-831\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (3.1)
Requirement already satisfied: jinja2 in d:\downloads\fooocus_win64_2-1-831\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (3.1.2)
Requirement already satisfied: numpy in c:\users\jake\appdata\roaming\python\python310\site-packages (from torchvision==0.15.1->torch-directml) (1.24.2)
Requirement already satisfied: requests in c:\users\jake\appdata\roaming\python\python310\site-packages (from torchvision==0.15.1->torch-directml) (2.28.2)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in d:\downloads\fooocus_win64_2-1-831\python_embeded\lib\site-packages (from torchvision==0.15.1->torch-directml) (9.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in d:\downloads\fooocus_win64_2-1-831\python_embeded\lib\site-packages (from jinja2->torch==2.0.0->torch-directml) (2.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\jake\appdata\roaming\python\python310\site-packages (from requests->torchvision==0.15.1->torch-directml) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\users\jake\appdata\roaming\python\python310\site-packages (from requests->torchvision==0.15.1->torch-directml) (3.6)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\jake\appdata\roaming\python\python310\site-packages (from requests->torchvision==0.15.1->torch-directml) (1.26.18)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\jake\appdata\roaming\python\python310\site-packages (from requests->torchvision==0.15.1->torch-directml) (2023.11.17)
Requirement already satisfied: mpmath>=0.19 in d:\downloads\fooocus_win64_2-1-831\python_embeded\lib\site-packages (from sympy->torch==2.0.0->torch-directml) (1.3.0)
Installing collected packages: torch, torchvision
Successfully installed torch-2.0.0 torchvision-0.15.1

D:\Downloads\Fooocus_win64_2-1-831>.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml 
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\\entry_with_update.py', '--directml']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.862
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Using directml with device: 
Total VRAM 1024 MB, total RAM 65450 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: D:\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [D:\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [D:\Downloads\Fooocus_win64_2-1-831\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [D:\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 3277258264247382946
[Fooocus] Downloading control models ...
[Fooocus] Loading control models ...
extra clip vision: ['vision_model.embeddings.position_ids']
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] synthwave retro cyber landscape with a grid-like ground and mountains, a setting sun with a vertical colorful rainbow, background lush glowing light, cinematic color, dramatic, sharp focus, highly detailed, intricate, innocent, inspired, grand futuristic, open aesthetic, deep colors, magical, thought, very inspirational, epic, artistic, clear, positive, romantic, beautiful, attractive
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] synthwave retro cyber landscape with a grid-like ground and mountains, a setting sun with a vertical colorful rainbow, background composed, dramatic color, highly detailed, cinematic, complex, glowing, sharp, focus, great composition, adventurous, new, dynamic light, artistic, thought, ambient, iconic, creative, vibrant, beautiful, epic, stunning, gorgeous, best
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Fooocus] Image processing ...
Requested to load CLIPVisionModelWithProjection
Loading 1 new model
Requested to load Resampler
Loading 1 new model
loading in lowvram mode 64.0
Requested to load To_KV
Loading 1 new model
loading in lowvram mode 64.0
Requested to load Resampler
Loading 1 new model
loading in lowvram mode 64.0
Requested to load To_KV
Loading 1 new model
loading in lowvram mode 64.0
Requested to load Resampler
Loading 1 new model
loading in lowvram mode 64.0
Requested to load To_KV
Loading 1 new model
loading in lowvram mode 64.0
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1408, 704)
Preparation time: 92.20 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 64.0
[Fooocus Model Management] Moving model(s) has taken 8.35 seconds
User stopped
Total time: 299.32 seconds
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 6730664455518928604
[Fooocus] Downloading control models ...
[Fooocus] Loading control models ...
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Fooocus] Image processing ...
Requested to load Resampler
Loading 1 new model
loading in lowvram mode 64.0
[Fooocus Model Management] Moving model(s) has taken 0.80 seconds
Requested to load To_KV
Loading 1 new model
loading in lowvram mode 64.0
Requested to load Resampler
Loading 1 new model
loading in lowvram mode 64.0
Requested to load To_KV
Loading 1 new model
loading in lowvram mode 64.0
Requested to load Resampler
Loading 1 new model
loading in lowvram mode 64.0
Requested to load To_KV
Loading 1 new model
loading in lowvram mode 64.0
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1408, 704)
Preparation time: 18.40 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 64.0
[Fooocus Model Management] Moving model(s) has taken 7.04 seconds

kjit commented 10 months ago

Hi. I have somewhat lower-end AMD hardware: a Radeon RX 480 and 24 GB RAM (Windows 10). At some point I started getting Blue Screen errors constantly while running Fooocus. I also got errors from the AMD drivers saying they had to be reset, or something like that. It turned out that the major problem was the RAM: I had set an overclocked XMP profile for 2666 MHz in the BIOS, and that caused these issues. Once I lowered the speed to the default 2133 MHz, Fooocus worked fine. I guess DirectML uses both RAM and VRAM extensively, so any errors in those parts can break the whole system. You should also monitor GPU usage and temperature in the Adrenalin UI, Performance tab, to check that the temperature doesn't exceed safe values.

mashb1t commented 9 months ago

@Jakobud is this issue still relevant? If not, I'd close it in a few days.

Tectract commented 8 months ago

I think this user is seeing the same or related bug as me, in this bug report a couple months old: https://github.com/lllyasviel/Fooocus/issues/1690

@Jakobud you'll need to potentially reinstall your graphics driver from the windows hardware management settings. This is why I make a lot of Windows "checkpoints" or whatever they are called. Your GPU is probably not damaged but the GPU memory got corrupted when it was sent a bad string of assembly instructions by the graphics driver libraries from DirectML.

My recommendation is that we tie all the related bug reports together and @mashb1t can maybe figure out how to interface with the DirectML team directly, to help figure out what's going on and why it's making AMD GPUs go black.

This bug also seems possibly related, from a webui-directml repo... It suggests undervolting or underclocking your GPU slightly to help alleviate the problem, I might try that.

https://github.com/lshqqytiger/stable-diffusion-webui-directml/issues/73

mashb1t commented 8 months ago

Sadly I don't have an AMD GPU, so I think it would be most beneficial if somebody affected could directly test potential suggestions. I'd therefore propose that we collect the open issues caused by DirectML and funnel them into a few refined, distinct issues in their repository.

Tectract commented 8 months ago

I also have my GPU overclocked by just a couple percent but I'm pretty sure it was the result of BIOS auto-tuning software so it shouldn't really be causing black-screen crashes. I can look to reset back to only 100% speed on my GPU and see if it helps. Will report back but it may take me a while.

Tectract commented 8 months ago

On second thought, I don't really want to do testing to slow down my main Windows machine and maybe bork the graphics drivers repeatedly to troubleshoot DirectML. Their team should do that on a test machine, lol. Tell them to try using Fooocus on an AMD build (Windows) machine :)

cezzarCz commented 8 months ago

I had also made certain changes to my GPU using Radeon's own software, but after the first attempt to generate an image, the computer turned off. When I turned it on again, AMD's software had reset the settings to default due to the crash. I tried running it again with the default settings, but nothing changed: it got to the part where Fooocus moves the model to the GPU and my computer shut down again. I assume that my computer's power supply is not of very good quality.

Aaioros commented 8 months ago

Hi, my PC did fine for a while, but now I'm getting a black screen when I generate, and I need to restart the PC. My GPU is a Vega 56. Nothing overclocked.

mirh commented 3 months ago

Could someone at least post their TDR crash information from the event log? If it isn't temperature- or overclock-related, a computer cannot just crash like that without a trace.
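
One way to pull such entries is the built-in `wevtutil` tool, as in the rough sketch below. The event IDs are assumptions that should be checked against your own Event Viewer: 4101 is commonly logged when a display driver stops responding and recovers, and 41 (Kernel-Power) when the system reboots without a clean shutdown. The function names are hypothetical.

```python
import subprocess


def build_event_query(event_ids=(4101, 41), max_events=10):
    """Return a wevtutil command listing the newest matching System events."""
    # Assumed IDs: 4101 = display driver timeout, 41 = unclean reboot.
    id_filter = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    return [
        "wevtutil", "qe", "System",
        f"/q:*[System[({id_filter})]]",
        f"/c:{max_events}",
        "/rd:true",  # newest events first
        "/f:text",   # human-readable output
    ]


def fetch_events(**kwargs):
    """Run the query (Windows only) and return its text output."""
    result = subprocess.run(
        build_event_query(**kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Running `print(fetch_events(max_events=5))` after a crash and reboot should show the most recent matching entries, if any were recorded, which can then be pasted into the issue.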

hqnicolas commented 3 weeks ago

Try this method: https://github.com/hqnicolas/lllyasvielFooocusROCm/tree/main

Tectract commented 3 weeks ago

Has it been tested for GPU corruption? I don't feel like being the tester, with my expensive system.

hqnicolas commented 3 weeks ago

> Has it been tested for GPU corruption?

Please check your graphics card on Windows: https://www.techpowerup.com/download/furmark/

If it passes FurMark, it will work with that tutorial. Today I was messing with images again and started searching for people who need help.

Tectract commented 3 weeks ago

I started running Fooocus on one of my Windows machines with an Nvidia card and just sharing it to my subnet, instead of using my AMD GPU Windows machine. I guess I can try it, but first I want to see it running for a while without critical issues reported by users.