[Bug]: HUGE VRAM usage vs. Vanilla SD, Self-Breaking software

srt-jay commented 6 months ago

Checklist

[x] The issue exists after disabling all extensions
[ ] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[ ] The issue exists in the current version of the webui
[ ] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

I have an RX 7800 XT, which has 16GB of VRAM. Anything above 640x360 fails with an error claiming to be out of VRAM. This happens even without the upscaler, which is even worse. I swear this worked last week, and I didn't intentionally update anything. Trying different samplers gave mixed results, but for the most part 16GB wasn't "enough." I'm not sure if this is because the model I'm using is fairly large, but normal A1111 had literally zero problems doing this with the same model on the 8GB card in my secondary PC, which is way slower.

After one thing or another, I started getting an error upon startup

" venv "C:\Users\User\stable-diffusion-webui-directml\venv\Scripts\Python.exe" fatal: No names found, cannot describe anything. "

Upon closing and reopening a few times, it opened again, although still displayed the same error. Aaaand now, it doesn't start at all.

So, I'm not sure what happened, but it appears something in the software has the ability to break itself, and it very much did. When I originally downloaded this, I remember having to edit "webui-user.bat" to include "--use-directml" or something like that, because it literally did not have that in the command line args. I don't know if I just got unlucky, but I think I'm just gonna install the original A1111 on my main PC and use the workarounds for that... I don't have time to chase issues like this anymore, and I'm finding it to be a worsening problem in AI lately.

Steps to reproduce the problem

I tried to generate one (1) standard-definition image, and after running completely out of VRAM (16GB) enough times, eventually the venv wouldn't even start. It completely broke itself.

Also worth noting: it and basically everything else I'm using have been running on Python 3.10.6 without issue. At some point since I last used it, this one changed pyvenv.cfg to use Python 3.11... I don't even have 3.11 installed, what? I changed it back. Cool, it worked! And then, magically, after one successful launch, it broke itself again.
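The pyvenv.cfg check described above can be sketched in a few lines of Python. This is only an illustration, not part of the webui: the function names `read_pyvenv_cfg` and `venv_matches` are hypothetical, and it assumes the file uses the usual `key = value` lines with a `version` entry.

```python
from pathlib import Path


def read_pyvenv_cfg(cfg_path):
    """Parse the 'key = value' lines of a pyvenv.cfg file into a dict."""
    settings = {}
    for line in Path(cfg_path).read_text(encoding="utf-8").splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip().lower()] = value.strip()
    return settings


def venv_matches(cfg_path, expected="3.10"):
    """Return True if the venv's recorded Python version is `expected`
    or a patch release of it (e.g. "3.10.6" matches "3.10")."""
    version = read_pyvenv_cfg(cfg_path).get("version", "")
    return version == expected or version.startswith(expected + ".")
```

Pointing `venv_matches` at the `pyvenv.cfg` inside the venv folder before each launch would show whether the recorded version has drifted away from 3.10 again.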

What should have happened?

Normal behavior would be launching like it normally does. I rebooted... Checked my other things that run on Python, and they're doing fine. This one still halts at the launcher.

What browsers do you use to access the UI ?

Mozilla Firefox

Sysinfo

That didn't work

Console logs

venv "C:\Users\Jay\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: 1.7.0
Commit hash: cfa6e40e6d7e290b52940253bf705f282477b890

Additional information

No response

Gamemaster2022 commented 6 months ago

I would personally delete the venv folder and let it regenerate and reinstall torch. Something might have broken in there. Check whether requirements_versions.txt lists torch-directml instead of just "torch". Then simply run webui-user.bat and the process will start. Good luck.
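A rough sketch of those two steps in Python, for anyone who wants to script them. The helper names `pins_torch_directml` and `reset_venv` are made up for illustration, and the requirements parsing is a simple line-by-line match on the package name, not a full requirements-file parser.

```python
import shutil
from pathlib import Path


def pins_torch_directml(requirements_text):
    """Return True if any non-comment line names the torch-directml package."""
    for line in requirements_text.splitlines():
        # Drop trailing comments, then cut the line at the first
        # version specifier, extras marker, or environment marker.
        name = line.split("#", 1)[0].strip().lower()
        for sep in ("==", "!=", ">=", "<=", "~=", ">", "<", "[", ";", " "):
            name = name.split(sep, 1)[0]
        if name.strip().replace("_", "-") == "torch-directml":
            return True
    return False


def reset_venv(webui_dir):
    """Delete <webui_dir>/venv so the launcher rebuilds it on the next run.

    Returns True if a venv folder was removed, False if none existed.
    """
    venv = Path(webui_dir) / "venv"
    if venv.is_dir():
        shutil.rmtree(venv)
        return True
    return False
```

After `reset_venv` has run, launching webui-user.bat as usual should recreate the environment. Note that this deletes the whole venv folder, so every package gets downloaded again.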

MrFuzzyPants11 commented 5 months ago

For any dev reading this: I notice that when starting the webui, GPU memory usage instantly goes to 15.8/16 GB (on a 6900 XT) with no process running. I believe this is causing the issue: when a task calls for, let's say, 1 GB of memory, there is only 0.2 GB to spare, so it crashes.
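The arithmetic behind that guess, written out as a tiny Python sketch. The function name `fits_in_vram` is hypothetical and the numbers are the ones quoted above; this does not query the GPU or reflect how torch-directml actually allocates memory.

```python
def fits_in_vram(total_gb, used_gb, request_gb):
    """Return True if a request fits in the memory that is still free."""
    free_gb = total_gb - used_gb
    return request_gb <= free_gb


# 16 GB card with 15.8 GB already taken at startup leaves about 0.2 GB,
# so a 1 GB request cannot be satisfied.
startup_ok = fits_in_vram(total_gb=16.0, used_gb=15.8, request_gb=1.0)
```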

Also, clearing venv did work, but only for generating like 4 images. The moment I tried anything bigger than 512 x 512 (560 x 768 in this case), the program hit its "not enough video memory" runtime error, and now it won't generate anything until I delete venv again.

Danrejk commented 5 months ago

I can confirm this is also an issue for me on my 6650 XT, exactly as @MrFuzzyPants11 described it. It's very annoying.

lshqqytiger commented 4 months ago

This is a torch-directml-specific issue. Try ROCm on Linux or ZLUDA for improvements in memory management.

lshqqytiger / stable-diffusion-webui-amdgpu