Open wac81 opened 1 year ago
PyTorch 2.0 automatically selects the most appropriate attention implementation for your system. All implementations are enabled by default, and scaled dot product attention attempts to pick the optimal one based on the inputs.
scaled_dot_product_attention is what the repo uses. If you have Flash set to true but do not have an A100, it should fall back to memory-efficient attention, math, or the CPU implementation.
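For reference, a minimal sketch (not the repo's own code) of calling the fused attention directly in PyTorch 2.0; the shapes here are just illustrative, and on hardware without flash-attention support PyTorch picks whichever enabled backend handles the inputs:

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# (batch, heads, seq_len, head_dim) -- illustrative sizes
q = torch.randn(2, 8, 128, 64, device=device)
k = torch.randn(2, 8, 128, 64, device=device)
v = torch.randn(2, 8, 128, 64, device=device)

# PyTorch dispatches to flash / mem-efficient / math under the hood
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```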
Thanks, but how do I know which one to use? How do I check it?
Or, if I want to use memory-efficient attention, do I have to call scaled_dot_product_attention directly?
PyTorch 2.0 includes an optimized and memory-efficient attention implementation through the torch.nn.functional.scaled_dot_product_attention function.
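To check which backends are enabled, or to force a specific one, PyTorch 2.0 exposes flags and a context manager under torch.backends.cuda. A rough sketch (assuming a CUDA device; again, not the repo's code):

```python
import torch
import torch.nn.functional as F
from torch.backends.cuda import sdp_kernel

# Report which SDPA backends are currently enabled
print(torch.backends.cuda.flash_sdp_enabled())
print(torch.backends.cuda.mem_efficient_sdp_enabled())
print(torch.backends.cuda.math_sdp_enabled())

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Restrict dispatch to the memory-efficient kernel only; this raises an
# error if no enabled backend can handle the given inputs.
with sdp_kernel(enable_flash=False, enable_math=False, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v)
```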