This installs xformers and uses memory-efficient cross attention, which can decrease Stable Diffusion inference time by 25-100% depending on the model of GPU you have.
It was implemented in my stable-diffusion fork a while ago; it just took me time to get around to building it for the Dockerfiles and CUDA toolkit used by this repo.
To best take advantage of the speedups, the default batch_size in config.yml has been increased to 4. Thanks to memory-efficient cross attention, this results in a smaller increase in memory usage than before while increasing inference speed by about 33% on my 3090.
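For reference, the batch size change amounts to a one-line edit in config.yml. The surrounding keys are omitted here since they depend on the rest of this repo's config layout; only the `batch_size` entry is taken from this change:

```yaml
# config.yml (relevant entry only)
batch_size: 4  # raised from the previous default; with memory-efficient
               # cross attention the extra VRAM cost is smaller than before
```

Users on GPUs with less VRAM can lower this value back down if they hit out-of-memory errors.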