StAlKeR7779 opened 5 days ago
I haven't looked at the code yet, but do you know if there are still use cases for attention processors other than Torch 2.0 SDP? Based on the benchmarking that diffusers has done, it seems like the all-around best choice. But maybe there are still reasons to use the other implementations, e.g. on very-low-VRAM systems?
I thought roughly the same:

- `normal` - generally no need for it
- `xformers` - if, as you say, `torch-sdp` is on par or even faster, then it too can be removed
- `sliced` - yes, it's suitable for low-memory situations, and I think it's the main attention implementation for MPS
On CUDA, torch's SDP was faster than xformers for me when I last checked a month or so back. IIRC it was just a couple % faster.
## Summary
The current attention processor implements only the `torch-sdp` attention type, so when any IP-Adapter or regional prompt is used, we override the model to run `torch-sdp` attention. The new attention processor combines the 4 attention processors (`normal`, `sliced`, `xformers`, `torch-sdp`) by moving the parts of attention that differ (mask preparation and the attention computation itself) into a separate function call, where the required implementation is executed.
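To illustrate the idea, here is a minimal sketch of the dispatch structure, assuming hypothetical names (`UnifiedAttnProcessor`, `run_attention`); this is not the PR's actual code, and mask preparation, which also differs per implementation, is omitted for brevity:

```python
import torch
import torch.nn.functional as F


class UnifiedAttnProcessor:
    """One processor; only the attention step itself varies by `attention_type`."""

    def __init__(self, attention_type: str = "torch-sdp", slice_size: int = 2):
        assert attention_type in ("normal", "sliced", "xformers", "torch-sdp")
        self.attention_type = attention_type
        self.slice_size = slice_size  # batch chunk size for sliced attention

    def run_attention(
        self,
        q: torch.Tensor,  # (batch, heads, seq, dim)
        k: torch.Tensor,
        v: torch.Tensor,
        mask: torch.Tensor | None = None,  # additive float mask, broadcastable
    ) -> torch.Tensor:
        if self.attention_type == "torch-sdp":
            return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

        if self.attention_type == "normal":
            # Plain full-matrix attention: peak memory is O(seq^2).
            scores = q @ k.transpose(-2, -1) * q.shape[-1] ** -0.5
            if mask is not None:
                scores = scores + mask
            return scores.softmax(dim=-1) @ v

        if self.attention_type == "sliced":
            # Chunk the batch dimension to cap peak memory on low-VRAM systems.
            out = torch.empty_like(q)
            for i in range(0, q.shape[0], self.slice_size):
                s = slice(i, i + self.slice_size)
                m = mask[s] if (mask is not None and mask.shape[0] > 1) else mask
                out[s] = F.scaled_dot_product_attention(q[s], k[s], v[s], attn_mask=m)
            return out

        # "xformers" would call xformers.ops.memory_efficient_attention here;
        # omitted so the sketch runs without the optional dependency.
        raise NotImplementedError(self.attention_type)


# Example: the call site stays the same regardless of the configured backend.
q = k = v = torch.randn(4, 8, 64, 40)
for attn_type in ("normal", "sliced", "torch-sdp"):
    out = UnifiedAttnProcessor(attn_type).run_attention(q, k, v)
    assert out.shape == q.shape
```

The point of the factoring is that IP-Adapter and regional-prompt logic can live in the shared `__call__` path once, while `run_attention` swaps implementations.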
## Related Issues / Discussions
None
## QA Instructions
Change `attention_type` in `invokeai.yaml` and then run generation with an IP-Adapter or a regional prompt.
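For example, the config change might look like this (a hedged excerpt; the exact key location can vary with your `invokeai.yaml` schema version):

```yaml
# invokeai.yaml (excerpt): pick the implementation under test
attention_type: sliced  # one of: normal, sliced, xformers, torch-sdp
```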
## Merge Plan
None?
## Checklist
@dunkeroni @RyanJDick