FuhrerStein opened this issue 2 weeks ago
> DirectML does not return its allocated memory space once it is allocated until the process is terminated.
In other words, the only way to free memory is to restart the whole process. Am I right? Maybe you could suggest where to move my restart call so that it occurs after the script sends the last generated image to the browser? I suppose it would be possible to insert it somewhere at the end of the anyio queue, but I was not able to figure out how that works.
> forge has a similar feature to setting memory limitation. It may help you.
Correct me if I'm wrong, but it seems not to support DirectML as a torch backend, am I right? I have a low-end Ryzen, so ZLUDA/ROCm under Windows seems to be out of the question for me. At least stable-diffusion-webui-amdgpu I was able to run only with --use-directml.
> In other words, the only way to free memory is to restart the whole process. Am I right?
Yes.
> Maybe you could suggest where to move my restart call so that it occurs after the script sends the last generated image to the browser? I suppose it would be possible to insert it somewhere at the end of the anyio queue, but I was not able to figure out how that works.
Strictly speaking, restarting the process from within itself is not possible. You may manually restart the webui via the terminal. Or, there should be another (master) process that saves state and restarts (kills and spawns) python.
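The "master process" pattern described here could be sketched roughly as follows. This is a hypothetical illustration, not webui code: the `RESTART_EXIT_CODE` convention and the function name are my assumptions — the worker exits with a special code when it wants a fresh process, and the master respawns it.

```python
import subprocess
import sys

# Assumed convention (not defined by webui): the worker exits with this
# code when it wants the master to spawn a fresh process for it.
RESTART_EXIT_CODE = 42

def run_with_restarts(worker_cmd, max_restarts=100):
    """Respawn the worker each time it exits with RESTART_EXIT_CODE."""
    for _ in range(max_restarts):
        result = subprocess.run(worker_cmd)
        if result.returncode != RESTART_EXIT_CODE:
            # Normal exit or a crash: stop respawning and report the code.
            return result.returncode
    return RESTART_EXIT_CODE
```

A wrapper like `webui.bat` effectively plays this master role: because each respawn is a brand-new process, the OS reclaims everything DirectML allocated in the old one.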
> Correct me if I'm wrong, but it seems not to support DirectML as a torch backend, am I right? I have a low-end Ryzen, so ZLUDA/ROCm under Windows seems to be out of the question for me. At least stable-diffusion-webui-amdgpu I was able to run only with --use-directml.
Forge supports DirectML: the flag is --directml, while it is --use-directml in stable-diffusion-webui-amdgpu.
There is stable-diffusion-webui-amdgpu-forge too, which is a merge of webui-amdgpu and webui-forge.
> You may manually restart the webui via the terminal. Or, there should be another (master) process that saves state and restarts (kills and spawns) python.
That process is the webui.bat file that starts the python script. That's why my solution works: it allows me to generate hundreds of images on an unattended PC, whereas without this modification I get a BSOD on the second or third image if there is no restart. The modification I propose in my first message does the restart, freeing memory, and it prevents errors caused by memory overuse. But it has some issues, hence my question.
> There is stable-diffusion-webui-amdgpu-forge
Didn't know it existed, thanks. After a lot of trial and error, I was able to run it on my system with every possible tweak to use less memory. Strangely, stable-diffusion-webui-amdgpu-forge does use more than 16 GB of shared memory without a crash, although I still caught crashes a few times. However, the main issue with your fork of Forge was that it uses far too much memory overall for my system. Usage peaked at 40 GB of system memory (half of which was shared) without even generating a 1 Mpx image. At the same time, my modified version of your stable-diffusion-webui-amdgpu allows me to generate a 1.15-megapixel image without a crash or memory overflow.
Checklist
What happened?
Generation on an AMD Ryzen 3 5300G with DirectML leads to a BSOD every time shared video memory usage exceeds the 16 GB limit.
Steps to reproduce the problem
Generate a batch of images with DirectML on a Ryzen iGPU.
What should have happened?
The system should not restart with a Blue Screen Of Death.
What browsers do you use to access the UI?
Other
Sysinfo
sysinfo-2024-11-02-08-32.json
Console logs
Additional information
I use the webui on an AMD Ryzen 3 5300G, which has a Radeon Vega iGPU. Windows 11, latest video driver.
In my case I've been able to run it only with the --use-directml option, no ONNX. I use --lowvram and other optimizations, but even then, depending on image size, memory consumption can exceed 12–14 GB (for image sizes around 1 megapixel).
Also, there seems to be a memory leak: shared video memory (aka dynamic) does not get freed after each generation. So the only way to avoid huge memory consumption is to restart the whole script. On top of all that, I get a BSOD every time memory consumption exceeds the 16 GB limit.
So my only option is to restart the batch file every time there is a chance that the next generation will overflow memory. I suppose it's a problem for every Windows Ryzen user, not just me.
This can be considered both a bug report and a feature request. I'll describe possible solutions, from the ideal one to the simplest to implement. I have an implementation only for the simplest one.
For now, my solution is a rough version of variant 3. It restarts the script if memory usage exceeds a threshold given by the user. However, while doing so, it fails to send the last generated data to the GUI. So the image gets saved and memory gets freed, but the GUI shows confusing data. Also, this method fails to protect memory when the user generates in batches, as it runs only after all generations are finished. Still, with the "Generate forever" GUI function, my method allows many hours of image generation without interruption. And given how slow the Ryzen iGPU is, this is very much needed.
My solution adds 6 lines to call_queue.py: 2 lines at the beginning and 4 in wrap_gradio_call_no_job. Here they are:
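(The author's actual 6-line snippet is not reproduced in this thread. The following is only a hedged reconstruction of the described idea, not the real patch: the threshold value, the use of psutil to read process memory, and the os.execv restart are all my assumptions. Dependencies are injectable so the sketch is testable.)

```python
import os
import sys

# Hypothetical user-configurable threshold; the report suggests staying
# well under the 16 GB shared-memory ceiling.
MEMORY_LIMIT_BYTES = 14 * 1024**3

def default_memory_usage():
    # Assumed helper: psutil is not imported at module level so the
    # sketch stays importable without it.
    import psutil
    return psutil.Process(os.getpid()).memory_info().rss

def restart_if_over_limit(get_memory_bytes=default_memory_usage,
                          limit=MEMORY_LIMIT_BYTES,
                          restart=lambda: os.execv(sys.executable,
                                                   [sys.executable] + sys.argv)):
    """Call after a generation: re-exec the process if memory exceeds the limit.

    Re-exec replaces the current process image, so everything DirectML
    allocated in the old process is returned to the OS.
    Returns True if a restart was triggered, False otherwise.
    """
    if get_memory_bytes() > limit:
        restart()
        return True
    return False
```

Note the GUI problem the author describes: if the restart fires before the response reaches the browser, the image is saved but the GUI shows stale data — which is exactly why the placement of this call within the anyio queue matters.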
What I ask for is a better solution, one that would: