turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Issue with flash attention when upgrading to torch 2.3.1 #522

Closed · remichu-ai closed this issue 3 months ago

remichu-ai commented 3 months ago

I always encounter this error after upgrading flash attention. However, I noted from the wheel that you have bumped the torch version to 2.3.1.

Do you encounter this issue with flash attention on torch 2.3.1?

  File "/home/remichu/miniconda3/envs/mlenv/lib/python3.11/site-packages/exllamav2/__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "/home/remichu/miniconda3/envs/mlenv/lib/python3.11/site-packages/exllamav2/model.py", line 41, in <module>
    from exllamav2.attn import ExLlamaV2Attention, has_flash_attn, has_xformers
  File "/home/remichu/miniconda3/envs/mlenv/lib/python3.11/site-packages/exllamav2/attn.py", line 30, in <module>
    import flash_attn
  File "/home/remichu/miniconda3/envs/mlenv/lib/python3.11/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/home/remichu/miniconda3/envs/mlenv/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /home/remichu/miniconda3/envs/mlenv/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
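For what it's worth, the undefined symbol _ZN3c104cuda9SetDeviceEi demangles to c10::cuda::SetDevice(int), which usually means the prebuilt flash-attn wheel was compiled against a different libtorch than the torch that is currently installed. A quick way to check the installed versions and confirm the mismatch (a minimal diagnostic sketch, not from the original report):

pip show torch flash-attn
python -c "import torch; print(torch.__version__, torch.version.cuda)"
c++filt _ZN3c104cuda9SetDeviceEi   # prints c10::cuda::SetDevice(int)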
remichu-ai commented 3 months ago

I resolved it by running the following commands, though I have no clue how it works:

pip uninstall flash-attn
FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn

I found these instructions in this thread: https://github.com/oobabooga/text-generation-webui/issues/4182
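For reference, forcing a source build makes pip compile flash-attn against the torch that is currently installed instead of reusing a prebuilt wheel, so the extension links against matching libtorch symbols. A sketch of the full sequence, assuming a working CUDA toolchain (the MAX_JOBS value and the --no-build-isolation flag are common additions to keep the build manageable, not part of the commands above):

pip uninstall -y flash-attn
# Build from source against the installed torch; MAX_JOBS limits parallel compile jobs to avoid exhausting RAM.
MAX_JOBS=4 FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn --no-build-isolation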