Arcmoon-Hu opened 3 months ago
Proposal to improve performance

My GPU is too old, so I can't install the flash_attn package. I want to use vllm.attention.ops.triton_flash_attention as a replacement for flash_attn.

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

---

The Triton kernel is tailored to AMD GPUs at the moment. I would recommend setting VLLM_ATTENTION_BACKEND=XFORMERS instead.
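
For anyone landing here, below is a minimal sketch of forcing the xFormers backend from Python. The model name `facebook/opt-125m` is only an illustrative placeholder, and the sketch assumes the uppercase backend name is what the selector expects; the variable has to be set before vLLM chooses its attention backend at startup.

```python
# Minimal sketch: force the xFormers attention backend.
# Set the environment variable before importing vLLM so it is
# already visible when vLLM selects its attention backend.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM, SamplingParams

# "facebook/opt-125m" is just an illustrative model choice.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=16),
)
print(outputs[0].outputs[0].text)
```

Equivalently, the variable can be exported in the shell before launching, e.g. `VLLM_ATTENTION_BACKEND=XFORMERS python my_script.py`.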