huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers

Can't run docker because of flash attention 2? #33335

Open TwitchWorkshop opened 1 week ago

TwitchWorkshop commented 1 week ago

System Info

I am trying to run this GPT-4o app, and every time I run it through Docker I get the same error. I have installed everything I could, and even specifically pinned transformers to 4.40, since the repo says it does not support 4.41 or higher. I have included the system info details and the terminal output below. Any assistance or ideas on this would be extremely helpful. This is the repo: FardinHash/GPT-4o
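For reference, the version pins described above would look roughly like this in a requirements file (a sketch based on the versions in the log below; the repo's actual requirements.txt may differ):

```
transformers<4.41   # repo states 4.41+ is unsupported; I installed 4.40
flash-attn==2.6.3
torch==2.4.1
```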

transformers-cli env

```
root@WorkshopTest:/var/sftp/uploads/panelTest/fffffff# docker compose up
[+] Running 1/0
 ✔ Container fffffff-gpt4o-1  Created  0.0s
Attaching to gpt4o-1
gpt4o-1  | Requirement already satisfied: flash-attn in /usr/local/lib/python3.12/site-packages (2.6.3)
gpt4o-1  | Requirement already satisfied: torch in /usr/local/lib/python3.12/site-packages (from flash-attn) (2.4.1)
gpt4o-1  | Requirement already satisfied: einops in /usr/local/lib/python3.12/site-packages (from flash-attn) (0.8.0)
gpt4o-1  | Requirement already satisfied: filelock in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.15.4)
gpt4o-1  | Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (4.12.2)
gpt4o-1  | Requirement already satisfied: sympy in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (1.13.2)
gpt4o-1  | Requirement already satisfied: networkx in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.3)
gpt4o-1  | Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.1.4)
gpt4o-1  | Requirement already satisfied: fsspec in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (2024.6.1)
gpt4o-1  | Requirement already satisfied: setuptools in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (74.1.2)
gpt4o-1  | Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1  | Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1  | Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1  | Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (9.1.0.70)
gpt4o-1  | Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.3.1)
gpt4o-1  | Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (11.0.2.54)
gpt4o-1  | Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (10.3.2.106)
gpt4o-1  | Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (11.4.5.107)
gpt4o-1  | Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.0.106)
gpt4o-1  | Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (2.20.5)
gpt4o-1  | Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1  | Requirement already satisfied: triton==3.0.0 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.0.0)
gpt4o-1  | Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.12/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->flash-attn) (12.6.68)
gpt4o-1  | Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/site-packages (from jinja2->torch->flash-attn) (2.1.5)
gpt4o-1  | Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/site-packages (from sympy->torch->flash-attn) (1.3.0)
gpt4o-1  | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
gpt4o-1  | Traceback (most recent call last):
gpt4o-1  |   File "/usr/src/app/app.py", line 4, in <module>
gpt4o-1  |     from bot import chatbot, model_inference, BOT_AVATAR, EXAMPLES, model_selector, decoding_strategy, temperature, max_new_tokens, repetition_penalty, top_p
gpt4o-1  |   File "/usr/src/app/bot.py", line 31, in <module>
gpt4o-1  |     "idefics2-8b-chatty": Idefics2ForConditionalGeneration.from_pretrained(
gpt4o-1  |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gpt4o-1  |   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3826, in from_pretrained
gpt4o-1  |     config = cls._autoset_attn_implementation(
gpt4o-1  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gpt4o-1  |   File "/usr/local/lib/python3.12/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1129, in _autoset_attn_implementation
gpt4o-1  |     config = super()._autoset_attn_implementation(
gpt4o-1  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gpt4o-1  |   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1556, in _autoset_attn_implementation
gpt4o-1  |     cls._check_and_enable_flash_attn_2(
gpt4o-1  |   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1667, in _check_and_enable_flash_attn_2
gpt4o-1  |     raise ImportError(f"{preface} Flash Attention 2 is not available. {install_message}")
gpt4o-1  | ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: Flash Attention 2 is not available. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
gpt4o-1 exited with code 1
```
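A minimal sanity check along these lines (an untested sketch) should show what transformers itself detects inside the container; `is_flash_attn_2_available` is the public helper in `transformers.utils` that the failing `_check_and_enable_flash_attn_2` path relies on:

```python
# Sketch: run inside the container to see what transformers detects.
import torch
import transformers
from transformers.utils import is_flash_attn_2_available

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA visible to torch:", torch.cuda.is_available())

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn: not importable")

# This is the check from_pretrained ultimately consults; note that it
# also returns False when no CUDA device is visible to torch.
print("FA2 available to transformers:", is_flash_attn_2_available())
```

If `CUDA visible to torch` prints `False`, the container is probably being started without GPU access (e.g. no GPU device reservation in docker-compose.yml), which by itself can make transformers report Flash Attention 2 as unavailable even though the package is installed.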

Who can help?

No response

Information

Tasks

Reproduction

Using the GPT-4o repo from FardinHash here on GitHub, simply running `docker compose up` produces this output every time (see the steps below).
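For completeness, the full reproduction as I understand it (the repo URL is assumed from its name):

```bash
git clone https://github.com/FardinHash/GPT-4o
cd GPT-4o
docker compose up
```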

Expected behavior

The app should start normally. Instead, I get the error output claiming that Flash Attention 2 must be installed, even though it already is.

LysandreJik commented 1 week ago

I don't have experience with this specific tool, but it seems like transformers isn't detecting flash attention 2, or at least not detecting the correct version of it.

Have you tried opening an issue on the original repository?
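In the meantime, if flash attention can't be made visible inside the container, one workaround is to not request it at all. An untested sketch (the model id is inferred from the traceback; the repo's bot.py may use a different identifier or extra kwargs):

```python
import torch
from transformers import Idefics2ForConditionalGeneration

# Fall back to PyTorch's built-in SDPA attention so transformers never
# runs the Flash Attention 2 availability check.
# "HuggingFaceM4/idefics2-8b-chatty" is inferred from the traceback.
model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-chatty",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # instead of "flash_attention_2"
)
```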