I think so; other SD implementations have code that makes low-memory use possible at the expense of speed, by doing smaller computations on the GPU in exchange for shuffling data around more. I don't see such code in this repo.
Hello! Unfortunately, this repo doesn't do any memory management during computation -- you'll get an OOM with small VRAM.
This repo was, actually, created because of my frustration with the lengthy, spaghetti-ish, dependency-heavy SD codebases (ldm). So I focused on building a reference implementation -- maximizing clarity of the code and keeping it pure Python. I had tried to put some memory management logic in here, but eventually removed it because it was too ugly to look at (it required installing a random package from somewhere, running custom CUDA code (which effectively makes the code hard to read), or applying some hacky attention method). Those tricks are good for production, but not for a reference implementation.
If you need decent memory management, you may want to try Diffusers from HuggingFace; it has some memory-related configuration options.
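For example, a rough sketch assuming a recent diffusers release -- enable_attention_slicing and enable_sequential_cpu_offload are real Diffusers calls, but check the docs for your version, and the model ID here is just illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Compute attention in chunks: slower, but much lower peak VRAM.
pipe.enable_attention_slicing()

# On very small cards, pipe.enable_sequential_cpu_offload() (requires the
# accelerate package; skip the .to("cuda") above when using it) streams
# weights from CPU RAM layer by layer instead.

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```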
You can also try changing this code to reduce memory usage (sliced attention looks promising to me) if you want to learn about these techniques; see the sketch below. If you intend to use SD in an easy way / in production / long-term, you'll probably want a well-maintained implementation rather than this one, maintained by a random freshman...
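For illustration, a minimal sketch of what sliced attention could look like in PyTorch -- the (batch*heads, seq_len, head_dim) layout and all names here are assumptions for the example, not code from this repo:

```python
import torch

def sliced_attention(q, k, v, slice_size=1):
    # q, k, v: (batch*heads, seq_len, head_dim)
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for i in range(0, q.shape[0], slice_size):
        s = slice(i, i + slice_size)
        # Only this slice's (seq_len x seq_len) attention matrix is
        # materialized at once, trading speed for lower peak memory.
        attn = torch.softmax((q[s] @ k[s].transpose(-1, -2)) * scale, dim=-1)
        out[s] = attn @ v[s]
    return out
```

The smaller slice_size is, the lower the peak memory and the slower the loop; a slice_size equal to the full first dimension recovers ordinary attention.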
Got it, thanks for your kind reply!
Thanks for the implementation! When running demo.ipynb, I got

OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 4.00 GiB total capacity; 3.31 GiB already allocated; 0 bytes free; 3.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there a solution to this?
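The error message itself points at PYTORCH_CUDA_ALLOC_CONF; a minimal sketch of setting it, assuming the stock PyTorch CUDA caching allocator (128 is an arbitrary example value, and this only mitigates fragmentation -- it won't make a 4 GiB card fit a model that needs more without changes like attention slicing):

```python
import os

# Must be set before torch initializes CUDA
# (safest: before importing torch at all).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
```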