Open danielclough opened 2 weeks ago
That seems odd. We made a couple of optimizations to memory usage following #2574, and in the end SD 3.5 Large was reported to work well on a GPU with only 20GB of memory. Maybe there are some other processes using the memory? If not, it would be good to run an nsys profile to see when the memory is being used.
There are no other processes running.
How would you like me to run nsys?
Here's some system info:
cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"
rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)
cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)
NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6
...
Have you tried it with the cudnn feature in addition to the cuda feature? I found it used less RAM when cudnn was enabled.
I have not.
@LaurentMazare Should cudnn be required to run it properly?
cudnn shouldn't be necessary but might indeed help reduce GPU memory usage. That said, running the command you mentioned only results in using ~20GB of memory in my case, so my guess is that something else is off there.
The GPU doesn't actually fill up all the memory.
Any suggestions for how to troubleshoot this would be welcome.
Not sure how much I would trust the memory usage reported by some external tool (especially here where it seems to only measure memory usage every 10s), it's probably safer to use nsys to get a proper memory profile.
Are you unable to run it with cudnn? It really did help, and my ADA4000 with 20GB won't run SD35L without it. I would also recommend nsys for monitoring.
Unless it is supposed to require cudnn I am not interested in the workaround.
This isn't something that is important to me, so I don't know if I will make time to troubleshoot it without hand holding.
Feel free to close the issue if cudnn is supposed to be required.
Otherwise, I guess someone else will care enough to troubleshoot.
That chart you showed is for 3 Medium, but you are trying to load 3.5 Large. How much memory is on your video card?
When I run
cargo run --example stable-diffusion-3 --release --features=cuda -- --which 3.5-large --prompt "pretty picture"
I get Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
with Stable Diffusion 3.5 Large and Turbo. According to this chart from stability.ai they should run on an RTX 3090.
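For a rough sense of scale, here is a back-of-the-envelope sketch of the memory needed for the main diffusion model's weights alone. The parameter count (~8.1 billion for SD 3.5 Large) is an approximate figure used as an assumption, as is the choice of f16 versus f32 storage. The helper `weights_gib` is hypothetical and not part of candle, and the estimate ignores the text encoders, the VAE, activations, and the CUDA context, so real usage will be higher.

```rust
// Rough estimate of weight memory only. All inputs are assumptions,
// not measurements taken from the candle example.

/// GiB needed to hold `params` parameters at `bytes_per_param` bytes each.
fn weights_gib(params: f64, bytes_per_param: f64) -> f64 {
    params * bytes_per_param / (1024.0 * 1024.0 * 1024.0)
}

fn main() {
    // Assumed parameter count for the SD 3.5 Large diffusion model (~8.1B).
    let params = 8.1e9;
    // RTX 3090 memory, treated here as 24 GiB.
    let card_gib = 24.0;

    for (name, bytes) in [("f16", 2.0), ("f32", 4.0)] {
        let need = weights_gib(params, bytes);
        println!(
            "{name}: ~{need:.1} GiB of weights, fits in {card_gib} GiB: {}",
            need < card_gib
        );
    }
}
```

Under these assumptions the f16 weights take roughly 15 GiB, which leaves limited headroom on a 24GB card for everything else, so a profile showing when memory peaks would help narrow down where the out-of-memory error comes from.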