handshape opened this issue 4 months ago
You're right, I think this might be an issue in stable-diffusion.cpp, though. I can recreate the same behaviour in stable-diffusion.cpp itself by adding a sleep at the end of `examples\cli\main.cpp`. Even though everything appears to have been unloaded, not all the VRAM is released until the program finally exits. This assumes the CLI code does everything needed to properly unload the VRAM, but it's also possible I have missed something somewhere.
I will create an issue in the stable-diffusion.cpp repo and see if anyone has any suggestions.
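To see how much VRAM is still held between the free and process exit (the window the added sleep opens up), a minimal Python sketch like the following could poll `nvidia-smi`, whose `--query-gpu=memory.used --format=csv,noheader,nounits` query prints used MiB, one line per GPU. The `load` and `free` callables and all function names here are hypothetical placeholders, not part of the bindings' or stable-diffusion.cpp's API.

```python
import subprocess
import time
from typing import Callable, List


def parse_used_mib(csv_output: str) -> List[int]:
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    output: one integer (MiB in use) per line, one line per GPU."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def query_used_mib(gpu_index: int = 0) -> int:
    """Ask nvidia-smi for the memory in use on one GPU (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_used_mib(out)[gpu_index]


def residual_after_free(
    load: Callable[[], object],
    free: Callable[[object], None],
    query: Callable[[], int] = query_used_mib,
    settle_seconds: float = 2.0,
) -> int:
    """Return the MiB still in use after load + free, relative to the baseline.

    The process stays alive while sampling, like the sleep added at the end
    of the CLI example, so anything freed only at exit still shows up here.
    """
    baseline = query()
    handle = load()  # hypothetical: load the model, return its handle
    free(handle)  # hypothetical: release it (e.g. via free_sd_ctx)
    time.sleep(settle_seconds)  # give the driver a moment before sampling
    return query() - baseline
```

`nvidia-smi` reports device-wide usage, so other processes on the same GPU will skew the result; an otherwise idle GPU gives the cleanest reading.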
Hi there --
I'm not sure whether the issue I'm seeing is in these bindings or in the upstream lib, but when using the high-level API on CUBLAS, I'm observing that after the `__del__` wiring does the fancy stepping to invoke `free_sd_ctx` when the model is either garbage collected or explicitly deleted, not all the VRAM gets released.
After some experimentation, I've noticed that the amount left behind is always almost exactly the amount allocated for the VAE, plus about 100 MB. VAE tiling reduced the size of the leak, and running the VAE phase on the CPU leaves just the 100 MB or so of leftovers.
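Roughly, this kind of `__del__` wiring follows the pattern sketched below in Python. `SdContextHandle` and `free_fn` are hypothetical names, not the bindings' actual code. The pattern only guarantees that the native free runs exactly once, whether the object is closed explicitly, deleted, or garbage collected; if the native free itself leaves memory allocated, no Python-side cleanup can recover it, which is consistent with the leftovers pointing upstream.

```python
from typing import Any, Callable


class SdContextHandle:
    """Hypothetical wrapper that owns a native context and frees it once."""

    def __init__(self, ctx: Any, free_fn: Callable[[Any], None]) -> None:
        self._ctx = ctx
        self._free_fn = free_fn  # assumed: a binding for something like free_sd_ctx

    def close(self) -> None:
        # Idempotent: an explicit close() followed by __del__ must not double-free.
        ctx = getattr(self, "_ctx", None)  # tolerate a partially built object
        if ctx is not None:
            self._ctx = None
            self._free_fn(ctx)

    def __del__(self) -> None:
        # Runs when the last reference goes away (`del`) or on garbage collection.
        self.close()
```

Calling `close()` twice, or `close()` and then dropping the object, still invokes `free_fn` only once.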
If I were to hazard a guess, stable-diffusion.cpp's `load_from_file()` allocates more than the `free_sd_ctx()` call frees.
If there's anything I can do to help sleuth this out, please don't hesitate to ask.
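One way to check that guess is to tally, per component, what gets allocated during loading against what gets released on free. The Python sketch below shows only the bookkeeping; `AllocationLedger`, the labels, and the byte counts are hypothetical and illustrative, and wiring it into the real allocation paths in stable-diffusion.cpp is left as an assumption.

```python
from collections import defaultdict
from typing import Dict


class AllocationLedger:
    """Hypothetical bookkeeping: bytes allocated minus bytes freed, per label
    (e.g. "unet", "vae"), so anything never released stands out."""

    def __init__(self) -> None:
        self._live: Dict[str, int] = defaultdict(int)

    def allocated(self, label: str, nbytes: int) -> None:
        self._live[label] += nbytes

    def freed(self, label: str, nbytes: int) -> None:
        self._live[label] -= nbytes

    def outstanding(self) -> Dict[str, int]:
        """Labels whose allocations exceed their frees, with the difference."""
        return {label: n for label, n in self._live.items() if n > 0}
```

If the ledger were fed from both the load path and the free path, an entry such as `"vae"` remaining in `outstanding()` would point at the buffer that is never released.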