Open wac81 opened 1 year ago
PyTorch 2.0 automatically selects the most appropriate attention implementation for your system. All implementations are enabled by default, and scaled dot product attention attempts to pick the optimal one based on the inputs.
scaled_dot_product_attention is what the repo uses. If you have Flash set to true but do not have an A100, it should fall back to memory-efficient attention, math, or the CPU implementation.
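For reference, a minimal sketch (not the repo's own code) of calling the fused attention directly in PyTorch 2.0; the shapes here are just illustrative, and on hardware without flash-attention support PyTorch picks whichever enabled backend handles the inputs:

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# (batch, heads, seq_len, head_dim) -- illustrative sizes
q = torch.randn(2, 8, 128, 64, device=device)
k = torch.randn(2, 8, 128, 64, device=device)
v = torch.randn(2, 8, 128, 64, device=device)

# PyTorch dispatches to flash / mem-efficient / math under the hood
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```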
Thanks, but how do I know which one to use? How do I check it?
Or, if I want to use memory-efficient attention, do I have to call scaled_dot_product_attention directly?
PyTorch 2.0 includes an optimized and memory-efficient attention implementation through the torch.nn.functional.scaled_dot_product_attention function.
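To check which backends are enabled, or to force a specific one, PyTorch 2.0 exposes flags and a context manager under torch.backends.cuda. A rough sketch (assuming a CUDA device; again, not the repo's code):

```python
import torch
import torch.nn.functional as F
from torch.backends.cuda import sdp_kernel

# Report which SDPA backends are currently enabled
print(torch.backends.cuda.flash_sdp_enabled())
print(torch.backends.cuda.mem_efficient_sdp_enabled())
print(torch.backends.cuda.math_sdp_enabled())

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Restrict dispatch to the memory-efficient kernel only; this raises an
# error if no enabled backend can handle the given inputs.
with sdp_kernel(enable_flash=False, enable_math=False, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v)
```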