lllyasviel / Fooocus

Focus on prompting and generating

Please fix AMD GPU mem allocation issue. #1294

Open xjbar opened 10 months ago

xjbar commented 10 months ago

There seems to be a memory loop issue causing the application to crash when trying to render images. This is a major issue, and I would like to know whether it is going to be addressed. Just curious where it is on the kanban board :D

OycheD commented 10 months ago

+1

pythonmaster9000 commented 10 months ago

+1

AlexeyJersey commented 10 months ago

+1

stainz2004 commented 10 months ago

+1

ferencsimon415 commented 10 months ago

+1

TheRexo commented 10 months ago

+1

heltonteixeira commented 10 months ago

+1

grendahl06 commented 10 months ago

politely +1

I appreciate being able to use your software, and I would be happy to provide any logs or exceptions needed to help the devs on this project.

I have 32GB of system memory, an 8GB Radeon 6650, and an AMD 7950. I have tried switches such as --lowvram, which yields an exception stating the build was not compiled for CUDA, and I have tried some of the other suggested fixes, all of which appear to result in the system first allocating 100% of available GPU memory without using it, then crashing when it needs roughly 65MB more =(

Let me know if I can provide any other details.
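
For what it's worth, here is a minimal run.bat sketch for the Windows DirectML path. The paths assume the standard Fooocus Windows package layout, and --always-low-vram is my assumption about the current flag spelling (plain --lowvram is not what Fooocus expects), so treat it as a starting point rather than a known-good fix:

.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --always-low-vram
pause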

cytrixme commented 9 months ago

+1, I highly appreciate your work.

psadac commented 9 months ago

A week ago I installed Fooocus on Manjaro Linux on a laptop (AMD Ryzen 6900HS, 32GB RAM, AMD 6800S 8GB VRAM). Everything ran almost without any HIP errors using any SDXL model I tested, with any option and with up to 4 LoRAs in advanced mode. So far, I have only been able to trigger a memory error with an "Upscale (2x)".

I installed Fooocus by cloning the GitHub repo:

git clone https://github.com/lllyasviel/Fooocus.git

Created a Python environment:

python -m venv venv
source venv/bin/activate

Upgraded pip to the latest version (probably not necessary):

pip install --upgrade pip

Installed PyTorch nightly with ROCm 5.7 (see "Install Pytorch" paragraph on https://pytorch.org/)

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7

Installed the requirements:

pip install -r requirements_versions.txt

Created a file "webui.sh" with the content below:

#!/bin/sh
source venv/bin/activate
HSA_OVERRIDE_GFX_VERSION=10.3.0 python entry_with_update.py --preset realistic 

Made it executable and ran it:

chmod +x webui.sh
./webui.sh

HSA_OVERRIDE_GFX_VERSION seems to be the most important configuration option. If I remember correctly, 10.3.0 should work with RDNA2 cards, while 11.0.0 works with RDNA3 cards.
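
As a sketch (the 11.0.0 value is from memory and untested on my side), the override can simply be switched on the launch line depending on the card generation:

# RDNA2 (RX 6000 series), the value used in webui.sh above
HSA_OVERRIDE_GFX_VERSION=10.3.0 python entry_with_update.py --preset realistic
# RDNA3 (RX 7000 series), if I remember the value correctly
HSA_OVERRIDE_GFX_VERSION=11.0.0 python entry_with_update.py --preset realistic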

However, yesterday, after upgrading Fooocus:

git pull

I had this error:

ERROR: Cannot install -r requirements_versions.txt (line 1), -r requirements_versions.txt (line 12), -r requirements_versions.txt (line 14), -r 
requirements_versions.txt (line 16), -r requirements_versions.txt (line 18), -r requirements_versions.txt (line 3), -r requirements_versions.txt (line 5),
-r requirements_versions.txt (line 8) and numpy==1.23.5 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested numpy==1.23.5
    torchsde 0.2.5 depends on numpy>=1.19.*; python_version >= "3.7"
    transformers 4.30.2 depends on numpy>=1.17
    accelerate 0.21.0 depends on numpy>=1.17
    scipy 1.9.3 depends on numpy<1.26.0 and >=1.18.5
    pytorch-lightning 1.9.4 depends on numpy>=1.17.2
    gradio 3.41.2 depends on numpy~=1.0
    opencv-contrib-python 4.8.0.74 depends on numpy>=1.21.2; python_version >= "3.10"
    opencv-contrib-python 4.8.0.74 depends on numpy>=1.23.5; python_version >= "3.11"
    opencv-contrib-python 4.8.0.74 depends on numpy>=1.17.0; python_version >= "3.7"
    opencv-contrib-python 4.8.0.74 depends on numpy>=1.17.3; python_version >= "3.8"
    opencv-contrib-python 4.8.0.74 depends on numpy>=1.19.3; python_version >= "3.9"
    onnxruntime 1.16.3 depends on numpy>=1.24.2

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Replacing numpy==1.23.5 with numpy==1.24.2 in requirements_versions.txt and reinstalling fixes the problem and everything runs fine again, but I am not sure this is the right way to do it.
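
A minimal sketch of that workaround, editing the pin in place with sed and then reinstalling:

sed -i 's/numpy==1.23.5/numpy==1.24.2/' requirements_versions.txt
pip install -r requirements_versions.txt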

lllyasviel commented 9 months ago

"onnxruntime 1.16.3 depends on numpy>=1.24.2" means you are using Python 3.11; Python 3.10 will not have this problem.

psadac commented 9 months ago

Yes, you're right, I am using Python 3.11. However, when I installed Fooocus a week ago I didn't get any errors; I may just have been lucky.
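
If it breaks again, a minimal sketch of rebuilding the environment on Python 3.10 instead (assuming python3.10 is installed and the ROCm nightly wheels are published for it) would be:

# remove the old environment and recreate it with Python 3.10
rm -rf venv
python3.10 -m venv venv
source venv/bin/activate
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7
pip install -r requirements_versions.txt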

magicAUS commented 9 months ago

I've finally got it to render with ver. 2.1.860 using my 6700 XT (12GB VRAM); however, the VRAM is still being detected as only 1024MB, and thus renders are very, very slow. Task Manager and the AMD overlay show full 12GB GPU utilisation though...

My run.bat looks like this, and includes the --attention-split flag as suggested by the console output when run.bat is running. Any ideas?

.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --preset realistic --attention-split
pause
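
Edit: one thing I have not tried yet (the flag name is an assumption on my part, taken from Fooocus's launch options, and it may not change anything under DirectML) is forcing the VRAM state instead of letting it be guessed from the reported 1024MB, i.e. replacing the last launch line with:

.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --preset realistic --attention-split --always-high-vram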

(Screenshots attached: 2024-01-06 132158, 2024-01-06 132223, 2024-01-06 132658)

oXb3 commented 8 months ago

I've finally got it to render with ver. 2.1.860 using my 6700 XT (12GB VRAM); however, the VRAM is still being detected as only 1024MB, and thus renders are very, very slow. Task Manager and the AMD overlay show full 12GB GPU utilisation though...

My run.bat looks like this, and includes the --attention-split flag as suggested by the console output when run.bat is running. Any ideas?

.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --preset realistic --attention-split
pause

(Quoted screenshots: 2024-01-06 132158, 2024-01-06 132223, 2024-01-06 132658)

I am experiencing the exact same issue. My GPU is an AMD 7800 XT. I have done everything stated in the quoted post. Not sure if I am missing something entirely or simply doing something wrong. Any insight would be greatly appreciated. Granted, this does not stop the program from running; it is just slower than expected.

mashb1t commented 7 months ago

@xjbar currently doing issue cleanup. Is this issue still present for you using the latest version of Fooocus or can it be closed?

magicAUS commented 7 months ago

@xjbar currently doing issue cleanup. Is this issue still present for you using the latest version of Fooocus or can it be closed?

@mashb1t - mine and @oXb3's issue is still present in the latest version (2.1.865), FYI.

hqnicolas commented 7 months ago

Running here without any problem: https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd. For RX 6000 cards, use: HSA_OVERRIDE_GFX_VERSION=10.3.0

magicAUS commented 6 months ago

Running here without any problem: https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd. For RX 6000 cards, use: HSA_OVERRIDE_GFX_VERSION=10.3.0

@hqnicolas does everything on that URL go into run.bat?

hqnicolas commented 6 months ago

@magicAUS you need to do a clean install of Ubuntu 22.04 and copy and paste every step manually into the terminal. First, read the blue titles that say: 1 - Driver install, 2 - Before Run, 3 - Run it.

magicAUS commented 6 months ago

@magicAUS you need to do a clean install of Ubuntu 22.04 and copy and paste every step manually into the terminal. First, read the blue titles that say: 1 - Driver install, 2 - Before Run, 3 - Run it.

@hqnicolas I think the OS is the differentiator for getting it working. @oXb3 and I are on Windows (11 Pro for me).

hqnicolas commented 6 months ago

@magicAUS insert an extra SSD into your machine and build it there.

mathshenry commented 3 months ago

I am facing the exact same issue on Windows. I am running an RX 7800 XT with 32GB of RAM. Fooocus only recognizes 1024MB of VRAM, and when it starts generating it throws the following:

Fooocus\modules\anisotropic.py:132: UserWarning: The operator 'aten::std_mean.correction' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at C:__w\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
  s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
  3%|██▊ | 1/30 [00:07<03:23, 7.01s/it]
[W dml_heap_allocator.cc:120] DML allocator out of memory!

Any solution to it? I've been searching for one but no success so far.

Thank you!

grendahl06 commented 3 months ago

Fooocus really was cool. Please understand I'm not insulting their work.

I have an RX 6650, and I found that SD.Next with the zluda pipeline works best for me.

Most of the memory issues seem to be a Microsoft and AMD bug, but somehow the ZLUDA stuff makes it work pretty well.

I was getting 15 minutes per image on CPU, and now I get 1-2 minutes per image running on the GPU.

I hope that helps a little.

— Reply to this email directly, view it on GitHub https://github.com/lllyasviel/Fooocus/issues/1294#issuecomment-2181753097, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGXUTLLDTCTMJJ5LYPIZVVDZINVEHAVCNFSM6AAAAABANPQIUOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBRG42TGMBZG4 . You are receiving this because you commented.Message ID: @.***>