invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

Redo custom attention processor to support other attention types #6550

Open StAlKeR7779 opened 5 days ago

StAlKeR7779 commented 5 days ago

Summary

The current attention processor implements only the torch-sdp attention type, so when any IP-Adapter or regional prompt is used, we override the model to run torch-sdp attention. The new attention processor combines four attention processors (normal, sliced, xformers, torch-sdp) by moving the parts of attention that differ (mask preparation and the attention computation itself) into a separate function call, where the required implementation is executed.
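The dispatch idea described above can be sketched in plain Python. This is a minimal illustration, not the PR's actual code: the class and function names (UnifiedAttnProcessor, ATTN_IMPLS) are hypothetical, and the "normal" and "sliced" implementations here operate on plain lists so the sketch is self-contained; the real processor would dispatch to torch/xformers kernels.

```python
import math

def _softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_normal(q, k, v):
    """Full-matrix attention: softmax(Q K^T / sqrt(d)) V, on plain lists."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = _softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v))
                    for c in range(len(v[0]))])
    return out

def attention_sliced(q, k, v, slice_size=2):
    """Process query rows in slices to bound peak memory; same result."""
    out = []
    for start in range(0, len(q), slice_size):
        out.extend(attention_normal(q[start:start + slice_size], k, v))
    return out

# The parts that differ per backend live behind one lookup; "xformers" and
# "torch-sdp" would plug in here in the real processor.
ATTN_IMPLS = {
    "normal": attention_normal,
    "sliced": attention_sliced,
}

class UnifiedAttnProcessor:
    """Hypothetical combined processor: one code path, pluggable attention."""
    def __init__(self, attention_type="normal"):
        self.attention_type = attention_type

    def run_attention(self, q, k, v):
        return ATTN_IMPLS[self.attention_type](q, k, v)
```

The point of the structure is that IP-Adapter and regional-prompt logic can wrap run_attention once, instead of being reimplemented per attention type.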

Related Issues / Discussions

None

QA Instructions

Change attention_type in invokeai.yaml, then run a generation with an IP-Adapter or a regional prompt.
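For reference, the QA step might look like this as an invokeai.yaml fragment; the key name comes from the PR description, but its exact placement in the file is an assumption:

```yaml
# invokeai.yaml fragment (placement is an assumption)
attention_type: sliced  # try each of: normal, sliced, xformers, torch-sdp
```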

Merge Plan

None?

Checklist

@dunkeroni @RyanJDick

RyanJDick commented 5 days ago

I haven't looked at the code yet, but do you know if there are still use cases for attention processors other than Torch 2.0 SDP? Based on the benchmarking that diffusers has done, it seems like the all-around best choice. But maybe there are still reasons to use other implementations, e.g. very-low-VRAM systems?

StAlKeR7779 commented 5 days ago

I thought roughly the same:

- normal: generally no need for it
- xformers: if torch-sdp is on par or even faster, then it too can be removed
- sliced: yes, it's suitable for low-memory situations, and I think it's the main attention type for MPS

psychedelicious commented 5 days ago

On CUDA, torch's SDP was faster than xformers for me when I last checked a month or so back. IIRC it was just a couple % faster.