turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

The Google Colab seems to be broken #366

Closed · Vuizur closed this issue 3 months ago

Vuizur commented 3 months ago

When executing chat_example.ipynb in Google Colab (https://colab.research.google.com/github/turboderp/exllamav2/blob/master/examples/chat_example.ipynb), I get errors. I run the (optional) flash-attention cell and then the cell that installs the exllamav2 requirements. Pip warns about the following:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.1.0+cu121 requires torch==2.1.0, but you have torch 2.2.1 which is incompatible.
torchdata 0.7.0 requires torch==2.1.0, but you have torch 2.2.1 which is incompatible.
torchtext 0.16.0 requires torch==2.1.0, but you have torch 2.2.1 which is incompatible.
torchvision 0.16.0+cu121 requires torch==2.1.0, but you have torch 2.2.1 which is incompatible.

(I think this might be caused by an incompatible torch version installed alongside flash-attn, but I'm not sure.)
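For reference, a quick way to confirm which versions actually ended up in the Colab runtime (a minimal sketch using importlib.metadata; the package list is just the ones named in the warning above):

import importlib.metadata as md

# Print the installed version of each package mentioned in the pip warning.
for pkg in ("torch", "torchaudio", "torchdata", "torchtext", "torchvision", "flash-attn"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "is not installed")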

When executing the last cell, it fails with:

Traceback (most recent call last):
  File "/content/exllamav2/examples/chat.py", line 5, in <module>
    from exllamav2 import(
  File "/content/exllamav2/exllamav2/__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "/content/exllamav2/exllamav2/model.py", line 29, in <module>
    from exllamav2.attn import ExLlamaV2Attention
  File "/content/exllamav2/exllamav2/attn.py", line 21, in <module>
    import flash_attn
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymtEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

This might be caused by the previous errors.
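The failing import can be reproduced on its own, separate from chat.py (a minimal sketch, nothing exllamav2-specific); an "undefined symbol" at this point usually means the flash-attn extension was compiled against a different torch version than the one that is installed:

import torch

print("torch:", torch.__version__)
try:
    # flash_attn loads a compiled extension (flash_attn_2_cuda) that is tied
    # to the torch build it was compiled against.
    import flash_attn
    print("flash_attn imports cleanly")
except ImportError as exc:
    # An "undefined symbol" error here indicates an ABI mismatch between the
    # preinstalled flash-attn wheel and the newly installed torch.
    print("flash_attn import failed:", exc)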

(Google Colab assigned me a T4, not the V100 (?) that is selected by default.)

Thanks a lot for maintaining this project!

turboderp commented 3 months ago

It looks like Colab now ships with flash-attn preinstalled, but it's the build compiled for Torch 2.1.0, and it doesn't get updated when requirements.txt installs torch>=2.2.0. I've updated the notebook, so it should work again.
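For anyone running an older copy of the notebook, one possible workaround (a sketch of the idea only, not necessarily what the updated notebook does) is to force-reinstall flash-attn after torch has been upgraded, so the extension is built against the torch that is actually present:

import subprocess, sys

# Reinstall flash-attn so it matches the currently installed torch.
# Note: building from source on Colab can take a long time; this is only a
# sketch, not the change made to the notebook.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--upgrade", "--force-reinstall", "--no-build-isolation",
    "flash-attn",
])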