ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
https://arxiv.org/abs/2409.06666
Apache License 2.0

'_flash_supports_window_size' is not defined #32

Status: Open · boji123 opened this issue 2 months ago

boji123 commented 2 months ago

```
2024-09-27 14:47:10 | ERROR | stderr |   File "anaconda3/envs/llama-omni/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 180, in _flash_attention_forward
2024-09-27 14:47:10 | ERROR | stderr |     _flash_supports_window_size and sliding_window is not None and key_states.shape[1] > sliding_window
2024-09-27 14:47:10 | ERROR | stderr | NameError: name '_flash_supports_window_size' is not defined
```

transformers 4.43.4
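
For context: in transformers 4.43, `_flash_supports_window_size` is only bound when flash_attn imports successfully, so a missing or broken flash_attn install leaves the name undefined and you hit this NameError instead of a clean import error. A paraphrased sketch of the guarded definition in modeling_flash_attention_utils.py (not an exact copy of the source):

```python
# Paraphrased sketch of transformers/modeling_flash_attention_utils.py (v4.43).
# The name is defined only on the branch where flash_attn imports cleanly.
import inspect

from transformers.utils import is_flash_attn_2_available

if is_flash_attn_2_available():
    from flash_attn import flash_attn_func

    # Bound only here; the window_size parameter landed in flash_attn 2.1.0,
    # so older builds never define _flash_supports_window_size at all.
    _flash_supports_window_size = "window_size" in list(
        inspect.signature(flash_attn_func).parameters
    )
```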

maugomez77 commented 1 month ago

Any update on this one?

NghiaaPD commented 1 month ago

same issue :)))

Domanmaker commented 1 month ago

me too

UltraEval commented 1 month ago

Your flash_attn version is too old; it must be >= 2.1.0. If you are using a CUDA image, it is easiest to download a prebuilt wheel from https://github.com/Dao-AILab/flash-attention/releases and install it with pip.
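
A quick way to check whether the installed flash_attn meets that >= 2.1.0 requirement before reinstalling (a minimal diagnostic sketch; the version threshold is the one cited above):

```python
# check_flash_attn.py -- verify the flash_attn install against the >= 2.1.0 requirement
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version  # packaging ships as a transformers dependency

MIN_VERSION = Version("2.1.0")  # threshold cited above (sliding-window support)

try:
    installed = Version(version("flash_attn"))
except PackageNotFoundError:
    print("flash_attn is not installed; grab a prebuilt wheel from "
          "https://github.com/Dao-AILab/flash-attention/releases and pip install it")
else:
    if installed < MIN_VERSION:
        print(f"flash_attn {installed} is too old (need >= {MIN_VERSION}); upgrade it")
    else:
        print(f"flash_attn {installed} is OK")
```

If the check fails, pick the wheel from the releases page that matches your Python, CUDA, and torch versions, then reinstall it with pip.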