Flux Inference memory optimisation by pre-computing modulation parameters ahead of time

Vargol commented 1 month ago

DiffusionKit has an excellent memory optimisation for Flux: it calculates the modulation parameters ahead of time and then offloads the adaLN_modulation parameters, which for fp16 saves ~6.5 GB of peak memory usage during inference.

https://github.com/argmaxinc/DiffusionKit/pull/15/
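
To illustrate the idea, here is a minimal NumPy sketch, not the DiffusionKit or Diffusers code. It assumes that the conditioning vector fed to each adaLN modulation layer depends only on things fixed before the denoising loop starts (timestep, guidance, pooled prompt embedding), so every block's shift/scale/gate outputs can be computed once per step and the layer weights dropped afterwards. The `Modulation` class, `precompute_modulation` helper and the toy sizes are all hypothetical names and values chosen for the example.

```python
import numpy as np


def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


class Modulation:
    """Toy adaLN modulation layer: Linear(SiLU(vec)), split into chunks."""

    def __init__(self, dim, n_chunks, rng):
        self.n_chunks = n_chunks
        self.weight = (0.02 * rng.standard_normal((n_chunks * dim, dim))).astype(np.float32)
        self.bias = np.zeros(n_chunks * dim, dtype=np.float32)

    def __call__(self, vec):
        out = silu(vec) @ self.weight.T + self.bias
        # e.g. shift/scale/gate for the attention and MLP branches
        return np.split(out, self.n_chunks, axis=-1)


def precompute_modulation(blocks, vec_per_step):
    """Return cache[step][block] = modulation chunks, then free the weights."""
    cache = [[block(vec) for block in blocks] for vec in vec_per_step]
    for block in blocks:
        # Stand-in for offloading: the weights are no longer needed in the loop.
        block.weight = None
        block.bias = None
    return cache


# Toy sizes (assumptions for the example; real models are far larger).
rng = np.random.default_rng(0)
dim, n_blocks, n_steps = 8, 3, 4
blocks = [Modulation(dim, n_chunks=6, rng=rng) for _ in range(n_blocks)]

# One conditioning vector per denoising step, all known before the loop.
vec_per_step = [rng.standard_normal(dim).astype(np.float32) for _ in range(n_steps)]

# Reference result computed the usual way, kept only to check the cache.
reference = blocks[0](vec_per_step[0])

cache = precompute_modulation(blocks, vec_per_step)

# Inside the denoising loop, a block would read cache[step][block_index]
# instead of calling its modulation layer.
shift, scale, gate, *_ = cache[0][0]
```

The trade-off in this sketch is that the cache holds `n_steps * n_blocks` small vectors, which is tiny compared with the modulation weight matrices it replaces, but it has to be rebuilt whenever the prompt, guidance value or timestep schedule changes.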

It would be nice if a similar change could be made to Diffusers.

It's especially useful for MPS users, who are getting left behind by the current trend of using quantisation rather than optimisation to reduce memory usage, and for whom CPU offloading is not an option.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

yiyixuxu commented 5 days ago

Thanks @Vargol. I think this particular optimization would require a lot of changes to our design, so for now we'd recommend using DiffusionKit on MPS!

huggingface/diffusers #9197