huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

Stable Diffusion 3.5 Large CUDA OUT_OF_MEMORY on RTX 3090 #2597

Open danielclough opened 2 weeks ago

danielclough commented 2 weeks ago

When I run cargo run --example stable-diffusion-3 --release --features=cuda -- --which 3.5-large --prompt "pretty picture", I get Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory") with both Stable Diffusion 3.5 Large and Turbo.

According to this chart from stability.ai, they should run on an RTX 3090.

[chart image]

LaurentMazare commented 2 weeks ago

That seems odd. We made a couple of optimizations to memory usage following #2574, and in the end SD 3.5 Large was reported to work well on a GPU with only 20GB of memory. Maybe some other processes are using the memory? If not, it would be good to run an nsys profile to see when the memory is being used.

danielclough commented 2 weeks ago

There are no other processes running.

How would you like me to run nsys?

Here's some system info:


cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"

rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)

cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)

NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6
...
super-fun-surf commented 2 weeks ago

Have you tried it with the cudnn feature in addition to the cuda feature? I found it used less RAM when cudnn was enabled.

danielclough commented 1 week ago

> Have you tried it with the cudnn feature in addition to the cuda feature? I found it used less RAM when cudnn was enabled.

I have not.

@LaurentMazare Should cudnn be required to run it properly?

LaurentMazare commented 1 week ago

cudnn shouldn't be necessary, but it might indeed help reduce GPU memory usage. That said, running the command you mentioned only uses ~20GB of memory in my case, so my guess is that something else is off there.
[image: 20241112-mem]

danielclough commented 1 week ago

The GPU doesn't actually fill up all the memory.

[image]

Any suggestions for how to troubleshoot this would be welcome.

LaurentMazare commented 1 week ago

I'm not sure how much I would trust the memory usage reported by an external tool (especially here, where it seems to sample memory usage only every 10s); it's probably safer to use nsys to get a proper memory profile.

super-fun-surf commented 1 week ago

Are you unable to run it with cudnn? It really did help: my ADA4000 with 20GB won't run SD 3.5 Large without it. I would also recommend nsys for monitoring.

danielclough commented 1 week ago

Unless it is supposed to require cudnn, I am not interested in the workaround.

This isn't something that is important to me, so I don't know if I will make time to troubleshoot it without hand-holding.

Feel free to close the issue if cudnn is supposed to be required.

Otherwise, I guess someone else will care enough to troubleshoot.

super-fun-surf commented 1 week ago

That chart you showed is for SD3 Medium, but you are trying to load 3.5 Large. How much memory is on your video card?
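A back-of-envelope estimate (parameter count × bytes per element) shows why SD3 Medium memory figures don't carry over to 3.5 Large. This is a minimal sketch under stated assumptions: the parameter counts (about 8.1B for the 3.5 Large diffusion transformer, 2B for SD3 Medium) are approximate figures from Stability AI's announcements, and `WeightDtype` and `weight_gib` are hypothetical helpers, not candle's API. The estimate covers only the transformer weights. It ignores activations and the text encoders the pipeline also loads, both of which need additional memory.

```rust
/// Element types a checkpoint might be stored in (hypothetical helper, not candle's `DType`).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum WeightDtype {
    F32,
    F16,
    BF16,
}

impl WeightDtype {
    /// Bytes used by one element of this type.
    fn bytes(self) -> u64 {
        match self {
            WeightDtype::F32 => 4,
            WeightDtype::F16 | WeightDtype::BF16 => 2,
        }
    }
}

/// GiB (2^30 bytes) needed just to hold `num_params` weights; activations are not counted.
fn weight_gib(num_params: u64, dtype: WeightDtype) -> f64 {
    (num_params * dtype.bytes()) as f64 / (1u64 << 30) as f64
}

fn main() {
    // Assumed approximate parameter counts for the diffusion transformer (MMDiT) only.
    let sd35_large: u64 = 8_100_000_000;
    let sd3_medium: u64 = 2_000_000_000;
    println!("SD 3.5 Large MMDiT, bf16: {:.1} GiB", weight_gib(sd35_large, WeightDtype::BF16));
    println!("SD 3 Medium MMDiT, bf16:  {:.1} GiB", weight_gib(sd3_medium, WeightDtype::BF16));
}
```

Under these assumptions, the 3.5 Large transformer weights alone come to roughly 15 GiB in bf16, about four times the SD3 Medium figure. This leaves much less headroom on a 24GB card once the text encoders and activations are added, so only a real profile (e.g. nsys) can show where the peak actually occurs.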