invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

Redo custom attention processor to support other attention types #6550

Open StAlKeR7779 opened 5 days ago

StAlKeR7779 commented 5 days ago

Summary

The current attention processor implements only the torch-sdp attention type, so when any IP-Adapter or regional prompt is used, we override the model to run torch-sdp attention. The new attention processor combines four attention processors (normal, sliced, xformers, torch-sdp) by moving the parts of attention that differ (mask preparation and the attention computation itself) into a separate function call, where the required implementation is executed.
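The dispatch idea described above can be sketched in plain Python. This is a minimal illustration, not the PR's actual code: the class and function names (UnifiedAttnProcessor, ATTN_IMPLS) are hypothetical, and the "normal" and "sliced" implementations here operate on plain lists so the sketch is self-contained; the real processor would dispatch to torch/xformers kernels.

```python
import math

def _softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_normal(q, k, v):
    """Full-matrix attention: softmax(Q K^T / sqrt(d)) V, on plain lists."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = _softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v))
                    for c in range(len(v[0]))])
    return out

def attention_sliced(q, k, v, slice_size=2):
    """Process query rows in slices to bound peak memory; same result."""
    out = []
    for start in range(0, len(q), slice_size):
        out.extend(attention_normal(q[start:start + slice_size], k, v))
    return out

# The parts that differ per backend live behind one lookup; "xformers" and
# "torch-sdp" would plug in here in the real processor.
ATTN_IMPLS = {
    "normal": attention_normal,
    "sliced": attention_sliced,
}

class UnifiedAttnProcessor:
    """Hypothetical combined processor: one code path, pluggable attention."""
    def __init__(self, attention_type="normal"):
        self.attention_type = attention_type

    def run_attention(self, q, k, v):
        return ATTN_IMPLS[self.attention_type](q, k, v)
```

The point of the structure is that IP-Adapter and regional-prompt logic can wrap run_attention once, instead of being reimplemented per attention type.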

Related Issues / Discussions

None

QA Instructions

Change attention_type in invokeai.yaml, then run a generation with an IP-Adapter or a regional prompt.
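For reference, the QA step might look like this as an invokeai.yaml fragment; the key name comes from the PR description, but its exact placement in the file is an assumption:

```yaml
# invokeai.yaml fragment (placement is an assumption)
attention_type: sliced  # try each of: normal, sliced, xformers, torch-sdp
```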

Merge Plan

None?

Checklist

@dunkeroni @RyanJDick

RyanJDick commented 5 days ago

I haven't looked at the code yet, but do you know if there are still use cases for attention processors other than Torch 2.0 SDP? Based on the benchmarking that diffusers has done, it seems like the all-around best choice. But maybe there are still reasons to use other implementations, e.g. very-low-VRAM systems?

StAlKeR7779 commented 5 days ago

I thought roughly the same:

- normal: generally no need for it
- xformers: if torch-sdp is on par or even faster, then it too can be removed
- sliced: yes, it's suitable for low-memory situations, and I think it's the main attention type for MPS

psychedelicious commented 5 days ago

On CUDA, torch's SDP was faster than xformers for me when I last checked a month or so back. IIRC it was just a couple % faster.