madebyollin / maple-diffusion

Stable Diffusion inference on iOS / macOS using MPSGraph
https://madebyoll.in/posts/maple_diffusion/
MIT License

Memory Requirements? #1

Closed Lukas1h closed 1 year ago

Lukas1h commented 1 year ago

What are the minimum memory requirements for iOS? For macOS? Would this run on an iPhone SE (2nd Gen) with only 3 GB of memory?

Off-topic, but love the Hobbit references in the source.

madebyollin commented 1 year ago

For iOS it probably requires a fairly recent iPhone or iPad (6GB iPhone 13 Pro is all I've tested; 4GB devices might work but I wouldn't bet on it). Maple Diffusion needs ~3.4GB peak memory even in "slower-but-use-less-memory" mode, and iOS force-quits apps aggressively when they get close to memory limits.


For macOS, I expect Maple Diffusion would run fine on any Apple Silicon Mac (even the 8GB configs) - it only uses around 5.6GB of memory even in "faster-but-use-more-memory" mode. I haven't tried Maple Diffusion on an Intel Mac, so YMMV, but it might work okay if you have a good discrete GPU.


love the Hobbit references in the source.

Thanks!

Lukas1h commented 1 year ago

Thanks for the info! I know you've already made quite a few memory optimizations, but is there anything that could be further optimized to get under 3 GB? Would reducing the image size help?

Also, this PR reduces the VRAM requirements to about 2.86 GB for 512x512 images by halving the attention. Could that be applied here?

I'm also going to try it on an Intel Mac with no GPU today, so 🤞

madebyollin commented 1 year ago

is there anything that could be further optimized to get under 3 GB

Probably (there's always something to optimize), but unfortunately I've exhausted everything I could think of. Getting SD under 3GB may in fact be easy (and just involve some trick or MPSGraph quirk I'm not aware of), or it may require throwing out MPSGraph's abstraction and writing everything in pure Metal. Either way, I'm not aware of any straightforward path to reduce memory further, which is annoying: even on the 6GB devices it could run faster if some of the slower MEM-HACKs could be removed.

Would reducing the image size help?

I just tried it; oddly it seems to take about as much peak memory as the larger images (and just complete faster).

Also, this PR reduces the VRAM requirements to about 2.86 GB for 512x512 images by halving the attention

Hmm, AFAICT the key change in that PR is splitting up parts of the cross-attention subgraph across heads, which this code already does (using the extreme version of one iteration per head). Most of the other stuff in that PR is in-place / early-garbage-collection stuff that unfortunately isn't really applicable to MPSGraph.
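
For readers unfamiliar with the trick, here is a rough sketch of what "one iteration per head" means. It is plain Python on toy sizes, not the MPSGraph code from this repo, and every name in it (`attention_weights`, `attention_all_heads`, `attention_per_head`, the tiny dimensions) is made up for illustration. The point it tries to show: looping over heads gives the same result as computing all heads together, but only one head's attention matrix needs to be alive at a time.

```python
import math
import random


def attention_weights(q, k):
    """Softmax(q @ k^T / sqrt(d)) for ONE head; q is [n_q][d], k is [n_k][d]."""
    scale = 1.0 / math.sqrt(len(q[0]))
    rows = []
    for q_row in q:
        raw = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        peak = max(raw)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in raw]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def apply_weights(weights, v):
    """weights @ v for one head; weights is [n_q][n_k], v is [n_k][d]."""
    dim = len(v[0])
    return [
        [sum(w * v_row[j] for w, v_row in zip(w_row, v)) for j in range(dim)]
        for w_row in weights
    ]


def attention_all_heads(qs, ks, vs):
    """Build every head's weight matrix first, then use them (more memory)."""
    all_weights = [attention_weights(q, k) for q, k in zip(qs, ks)]
    live_entries = sum(len(w) * len(w[0]) for w in all_weights)
    outs = [apply_weights(w, v) for w, v in zip(all_weights, vs)]
    return outs, live_entries


def attention_per_head(qs, ks, vs):
    """One iteration per head: only one weight matrix exists at a time."""
    outs, peak_entries = [], 0
    for q, k, v in zip(qs, ks, vs):
        w = attention_weights(q, k)
        peak_entries = max(peak_entries, len(w) * len(w[0]))
        outs.append(apply_weights(w, v))
    return outs, peak_entries


def random_heads(heads, tokens, dim, rng):
    """Hypothetical toy inputs shaped [heads][tokens][dim]."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(tokens)]
        for _ in range(heads)
    ]


rng = random.Random(0)
HEADS, N_Q, N_K, DIM = 4, 6, 5, 3  # toy sizes, far smaller than a real model
qs = random_heads(HEADS, N_Q, DIM, rng)
ks = random_heads(HEADS, N_K, DIM, rng)
vs = random_heads(HEADS, N_K, DIM, rng)

batched_out, batched_live = attention_all_heads(qs, ks, vs)
looped_out, looped_peak = attention_per_head(qs, ks, vs)
```

In this toy, the per-head loop's peak attention-matrix size is smaller than the all-heads version by a factor of the head count. The loop runs the heads one after another, though, which is presumably part of why the memory-saving paths are the slower ones.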

Lukas1h commented 1 year ago

Ok, thanks for the help. Unfortunately I don't have much experience with MPSGraph, so I'm in a bit over my head. I'll close this issue for now.

I guess I'll start working on a CoreML implementation.

Also, it takes ~60 seconds a step on my Intel Mac.

God Bless!