Closed by JarekDerp 7 months ago
Thank you for the feedback and insights. This relates to #970, and my intention was that you post this in https://github.com/lllyasviel/Fooocus/issues/1690 so people with similar problems can connect ^^

I can't see an indication of "loading the models multiple times"; it seems OK so far, but still AMD-typical with high resource usage compared to Nvidia GPUs. The VRAM usage in the 2nd generation is also fine, as the model isn't unloaded between images and, depending on parameters, then stays in VRAM. Maybe DirectML causes the model to stay loaded, but normally it gets unloaded once all images for a batch have been generated.

That part seems a bit off, as afaik there is general, non-vendor-specific handling in the code for unloading / freeing memory. I sadly don't have an AMD GPU available to test, so you might run an additional test and check whether the model is correctly unloaded when switching models (at least briefly), or whether they just stack up in VRAM/RAM, which would indicate a memory leak. Thank you again for the analysis, much appreciated.
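A rough sketch of such a leak check, in case it helps; note that `load_model` here is just a stand-in for whatever loading entry point you use, not an actual Fooocus function:

```python
# Rough leak check: watch process memory across repeated model switches.
import gc
import psutil
import torch

def load_model(name: str):
    """Stand-in for the real Fooocus loading call; replace accordingly."""
    return torch.nn.Linear(4096, 4096)  # dummy allocation for illustration

def rss_mb() -> float:
    """Resident memory of this process in MB."""
    return psutil.Process().memory_info().rss / 1024**2

baseline = rss_mb()
for name in ["model_a.safetensors", "model_b.safetensors",
             "model_a.safetensors", "model_b.safetensors"]:
    model = load_model(name)
    del model
    gc.collect()
    print(f"after switching to {name}: {rss_mb() - baseline:+.0f} MB over baseline")
# If the delta keeps climbing on every switch instead of plateauing,
# old models are not being freed, i.e. a leak.
```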
I think it's working fine as well.
I think it's working fine. There are some memory problems when switching models, but it's not machine/app-breaking, at least on my end. Still, I think people with only 16GB of RAM or less would struggle with a Windows + AMD + DirectML configuration. But it's up to you to decide whether it's worth mentioning.
Thanks for the great work, and many successes in the future!
I just got a random warning in the log when generating one image.
loading in lowvram mode 64.0
[Fooocus Model Management] Moving model(s) has taken 15.42 seconds
0%| | 0/30 [00:00<?, ?it/s]C:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\anisotropic.py:132: UserWarning: The operator 'aten::std_mean.correction' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
7%|▋ | 2/30 [00:26<05:51, 12.54s/it]
But the generation speed seems quite normal, just a heads up.
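For what it's worth, the warning should be reproducible outside Fooocus with something like this (a minimal sketch, assuming the torch-directml package is installed; it just re-runs the same `torch.std_mean` call from modules/anisotropic.py on the DML device):

```python
# Minimal repro of the CPU-fallback warning, outside Fooocus.
import torch
import torch_directml

dml = torch_directml.device()
g = torch.randn(4, 3, 64, 64).to(dml)
# Same call as modules/anisotropic.py line 132; on the DML backend this
# currently falls back to the CPU and prints the UserWarning above.
s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
print(s.device, m.device)
```

Since the fallback only affects this one operator, it would also explain why overall generation speed stays normal.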
I have the same issue with the AMD 7900 XTX. Even when I'm done generating an image and Fooocus isn't doing anything, my 24GB of VRAM stays maxed out.
That's how DirectML works. It doesn't offload the model from VRAM to RAM when idle.
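In plain PyTorch you can at least try to offload manually; here's a rough sketch (not Fooocus-specific, and whether DirectML actually returns the allocation to the OS is exactly the open question):

```python
# Generic PyTorch offloading sketch, not Fooocus-specific.
import gc
import torch
import torch_directml

dml = torch_directml.device()
model = torch.nn.Linear(4096, 4096).to(dml)  # dummy model for illustration

# ... generate ...

model.to("cpu")   # move weights to system RAM
gc.collect()      # on CUDA you'd also call torch.cuda.empty_cache();
                  # afaik torch-directml exposes no equivalent cache-flush API

# before the next generation:
model.to(dml)
```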
The main issue here is that DirectML is a low-effort workaround from Microsoft that can be used to make SD work on AMD cards. Either Microsoft would have to put some work into DirectML, or AMD would have to make generation work on their cards without any workarounds. IMO, if you have an AMD card you have two choices:
I have a 12GB VRAM AMD card, and I don't hope to run anything other than a pruned SD 1.5 model.
@mashb1t as requested yesterday, I'm pasting the contents of the console along with the RAM and VRAM usage. I thought it was normal, but you informed me that something might be wrong.
When the generation starts, CPU usage goes up, RAM gets filled, and then VRAM slowly gets filled as well. When the GPU starts working on the image, CPU usage goes down. So it looks like some tasks are done by the CPU.
After generation is done, VRAM is still full, but RAM usage goes down to about 16-20GB out of 32GB.
Here's the console log; I ran it with the "--debug" parameter in case it makes a difference:
Oh yes, I forgot to mention: relates to https://github.com/lllyasviel/Fooocus/issues/1690, continuation of https://github.com/lllyasviel/Fooocus/issues/970.