This installs xformers and uses memory-efficient cross attention, which can decrease Stable Diffusion inference time by 25-100% depending on the model of GPU you have.
It was implemented in my stable-diffusion fork a while ago; it just took me time to get around to building it for the Dockerfiles and CUDA toolkit used by this repo.
To best take advantage of the speedups, the default batch_size in config.yml has been increased to 4. Thanks to memory-efficient cross attention, this results in a smaller increase in memory usage than before while increasing inference speed by about 33% on my 3090.
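For reference, the batch size change amounts to a one-line edit in config.yml. The surrounding keys are omitted here since they depend on the rest of this repo's config layout; only the `batch_size` entry is taken from this change:

```yaml
# config.yml (relevant entry only)
batch_size: 4  # raised from the previous default; with memory-efficient
               # cross attention the extra VRAM cost is smaller than before
```

Users on GPUs with less VRAM can lower this value back down if they hit out-of-memory errors.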