ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
https://arxiv.org/abs/2409.06666
Apache License 2.0

'_flash_supports_window_size' is not defined #32

Status: Open · boji123 opened this issue 2 months ago

boji123 commented 2 months ago

```
2024-09-27 14:47:10 | ERROR | stderr |   File "anaconda3/envs/llama-omni/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 180, in _flash_attention_forward
2024-09-27 14:47:10 | ERROR | stderr |     _flash_supports_window_size and sliding_window is not None and key_states.shape[1] > sliding_window
2024-09-27 14:47:10 | ERROR | stderr | NameError: name '_flash_supports_window_size' is not defined
```

transformers 4.43.4
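
For context: in transformers 4.43, `_flash_supports_window_size` is only bound when flash_attn imports successfully, so a missing or broken flash_attn install leaves the name undefined and you hit this NameError instead of a clean import error. A paraphrased sketch of the guarded definition in modeling_flash_attention_utils.py (not an exact copy of the source):

```python
# Paraphrased sketch of transformers/modeling_flash_attention_utils.py (v4.43).
# The name is defined only on the branch where flash_attn imports cleanly.
import inspect

from transformers.utils import is_flash_attn_2_available

if is_flash_attn_2_available():
    from flash_attn import flash_attn_func

    # Bound only here; the window_size parameter landed in flash_attn 2.1.0,
    # so older builds never define _flash_supports_window_size at all.
    _flash_supports_window_size = "window_size" in list(
        inspect.signature(flash_attn_func).parameters
    )
```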

maugomez77 commented 1 month ago

Any update on this one?

NghiaaPD commented 1 month ago

same issue :)))

Domanmaker commented 1 month ago

me too

UltraEval commented 1 month ago

Your flash_attn version is too old; it must be >= 2.1.0. If you are using a CUDA image, it is easiest to download a prebuilt wheel from https://github.com/Dao-AILab/flash-attention/releases and install it with pip.
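
A quick way to check whether the installed flash_attn meets that >= 2.1.0 requirement before reinstalling (a minimal diagnostic sketch; the version threshold is the one cited above):

```python
# check_flash_attn.py -- verify the flash_attn install against the >= 2.1.0 requirement
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version  # packaging ships as a transformers dependency

MIN_VERSION = Version("2.1.0")  # threshold cited above (sliding-window support)

try:
    installed = Version(version("flash_attn"))
except PackageNotFoundError:
    print("flash_attn is not installed; grab a prebuilt wheel from "
          "https://github.com/Dao-AILab/flash-attention/releases and pip install it")
else:
    if installed < MIN_VERSION:
        print(f"flash_attn {installed} is too old (need >= {MIN_VERSION}); upgrade it")
    else:
        print(f"flash_attn {installed} is OK")
```

If the check fails, pick the wheel from the releases page that matches your Python, CUDA, and torch versions, then reinstall it with pip.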