jina-ai / dalle-flow

🌊 A Human-in-the-Loop workflow for creating HD images from text
grpcs://dalle-flow.dev.jina.ai
2.83k stars 209 forks

Add memory efficient cross attention from xformers #133

Closed AmericanPresidentJimmyCarter closed 1 year ago

AmericanPresidentJimmyCarter commented 1 year ago

This installs xformers and uses memory-efficient cross attention, which can speed up Stable Diffusion inference by 25-100% depending on which GPU you have.
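For context, the core trick can be sketched in plain Python. This is only an illustration of the technique, not the fused CUDA kernel that xformers actually ships: instead of materializing the full softmax(QK^T) matrix, the keys and values are streamed in chunks with an online softmax, so peak memory per query stays proportional to the chunk size rather than the sequence length.

```python
import math

def chunked_attention(q, K, V, chunk=2):
    """Attention for a single query vector q over keys K and values V,
    processed `chunk` keys at a time with a numerically stable running
    softmax (the full attention row is never stored)."""
    scale = 1.0 / math.sqrt(len(q))
    m = float("-inf")          # running max of the logits seen so far
    s = 0.0                    # running softmax denominator
    acc = [0.0] * len(V[0])    # running weighted sum of values
    for start in range(0, len(K), chunk):
        for k, v in zip(K[start:start + chunk], V[start:start + chunk]):
            logit = sum(qi * ki for qi, ki in zip(q, k)) * scale
            m_new = max(m, logit)
            c = math.exp(m - m_new)      # rescale earlier partial sums
            w = math.exp(logit - m_new)
            s = s * c + w
            acc = [a * c + w * vi for a, vi in zip(acc, v)]
            m = m_new
    return [a / s for a in acc]
```

The output matches ordinary softmax attention exactly; only the memory footprint changes, which is what lets the batch size grow cheaply.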

It was implemented in my stable-diffusion fork a while ago; it just took me time to get around to building it against the Dockerfiles and CUDA toolkit used by this repo.

To best take advantage of the speedups, the default batch_size in config.yml has been increased to 4. Thanks to memory-efficient cross attention, this costs a smaller memory increase than it would have before, while increasing inference speed by about 33% on my 3090.
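A back-of-envelope calculation shows why standard attention makes larger batches expensive (the shapes here are assumptions for illustration, not measurements from dalle-flow): the attention scores form a (batch * heads, seq, seq) matrix, and for a 512x512 Stable Diffusion image the highest-resolution UNet attention blocks see a 64x64 latent, i.e. seq = 4096 tokens.

```python
# Assumed shapes: 8 heads, seq = 64*64 latent tokens, fp16 (2 bytes).
def attn_matrix_bytes(batch, heads=8, seq=64 * 64, dtype_bytes=2):
    """Bytes needed to materialize one full attention score matrix."""
    return batch * heads * seq * seq * dtype_bytes

print(attn_matrix_bytes(1))  # 268435456 bytes, i.e. 0.25 GiB per layer
print(attn_matrix_bytes(4))  # 1073741824 bytes, i.e. 1 GiB at batch 4
```

Memory-efficient attention never allocates this matrix, which is why bumping batch_size to 4 is affordable here.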