huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers

Can't run docker because of flash attention 2? #33335

Open TwitchWorkshop opened 1 week ago

TwitchWorkshop commented 1 week ago

System Info

I am trying to run this GPT-4o app, and every time I run it through Docker I get the same error. I have installed everything I could, and even specifically pinned transformers to 4.40, since the repo says it does not support 4.41 or higher. I have included the system info details and the terminal output below. Any assistance or ideas on this would be extremely helpful. This is the repo: FardinHash/GPT-4o
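For reference, the version pins described above would look roughly like this in a requirements file (a sketch based on the versions in the log below; the repo's actual requirements.txt may differ):

```
transformers<4.41   # repo states 4.41+ is unsupported; I installed 4.40
flash-attn==2.6.3
torch==2.4.1
```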

transformers-cli env

```
root@WorkshopTest:/var/sftp/uploads/panelTest/fffffff# docker compose up
[+] Running 1/0
 ✔ Container fffffff-gpt4o-1  Created  0.0s
Attaching to gpt4o-1
gpt4o-1  | Requirement already satisfied: flash-attn in /usr/local/lib/python3.12/site-packages (2.6.3)
gpt4o-1  | Requirement already satisfied: torch in /usr/local/lib/python3.12/site-packages (from flash-attn) (2.4.1)
gpt4o-1  | Requirement already satisfied: einops in /usr/local/lib/python3.12/site-packages (from flash-attn) (0.8.0)
gpt4o-1  | Requirement already satisfied: filelock in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.15.4)
gpt4o-1  | Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (4.12.2)
gpt4o-1  | Requirement already satisfied: sympy in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (1.13.2)
gpt4o-1  | Requirement already satisfied: networkx in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.3)
gpt4o-1  | Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.1.4)
gpt4o-1  | Requirement already satisfied: fsspec in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (2024.6.1)
gpt4o-1  | Requirement already satisfied: setuptools in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (74.1.2)
gpt4o-1  | Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1  | Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1  | Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1  | Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (9.1.0.70)
gpt4o-1  | Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.3.1)
gpt4o-1  | Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (11.0.2.54)
gpt4o-1  | Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (10.3.2.106)
gpt4o-1  | Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (11.4.5.107)
gpt4o-1  | Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.0.106)
gpt4o-1  | Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (2.20.5)
gpt4o-1  | Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1  | Requirement already satisfied: triton==3.0.0 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.0.0)
gpt4o-1  | Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.12/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->flash-attn) (12.6.68)
gpt4o-1  | Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/site-packages (from jinja2->torch->flash-attn) (2.1.5)
gpt4o-1  | Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/site-packages (from sympy->torch->flash-attn) (1.3.0)
gpt4o-1  | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
gpt4o-1  | Traceback (most recent call last):
gpt4o-1  |   File "/usr/src/app/app.py", line 4, in <module>
gpt4o-1  |     from bot import chatbot, model_inference, BOT_AVATAR, EXAMPLES, model_selector, decoding_strategy, temperature, max_new_tokens, repetition_penalty, top_p
gpt4o-1  |   File "/usr/src/app/bot.py", line 31, in <module>
gpt4o-1  |     "idefics2-8b-chatty": Idefics2ForConditionalGeneration.from_pretrained(
gpt4o-1  |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gpt4o-1  |   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3826, in from_pretrained
gpt4o-1  |     config = cls._autoset_attn_implementation(
gpt4o-1  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gpt4o-1  |   File "/usr/local/lib/python3.12/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1129, in _autoset_attn_implementation
gpt4o-1  |     config = super()._autoset_attn_implementation(
gpt4o-1  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gpt4o-1  |   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1556, in _autoset_attn_implementation
gpt4o-1  |     cls._check_and_enable_flash_attn_2(
gpt4o-1  |   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1667, in _check_and_enable_flash_attn_2
gpt4o-1  |     raise ImportError(f"{preface} Flash Attention 2 is not available. {install_message}")
gpt4o-1  | ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: Flash Attention 2 is not available. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
gpt4o-1 exited with code 1
```
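A minimal sanity check along these lines (an untested sketch) should show what transformers itself detects inside the container; `is_flash_attn_2_available` is the public helper in `transformers.utils` that the failing `_check_and_enable_flash_attn_2` path relies on:

```python
# Sketch: run inside the container to see what transformers detects.
import torch
import transformers
from transformers.utils import is_flash_attn_2_available

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA visible to torch:", torch.cuda.is_available())

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn: not importable")

# This is the check from_pretrained ultimately consults; note that it
# also returns False when no CUDA device is visible to torch.
print("FA2 available to transformers:", is_flash_attn_2_available())
```

If `CUDA visible to torch` prints `False`, the container is probably being started without GPU access (e.g. no GPU device reservation in docker-compose.yml), which by itself can make transformers report Flash Attention 2 as unavailable even though the package is installed.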

Who can help?

No response

Information

Tasks

Reproduction

Using the GPT-4o repo from FardinHash here on GitHub, simply running `docker compose up` produces this output every time (see the steps below).
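For completeness, the full reproduction as I understand it (the repo URL is assumed from its name):

```bash
git clone https://github.com/FardinHash/GPT-4o
cd GPT-4o
docker compose up
```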

Expected behavior

The app should start normally. Instead, I get the error output claiming that Flash Attention 2 must be installed, even though it already is.

LysandreJik commented 1 week ago

I don't have experience with this specific tool, but it seems like transformers isn't detecting flash attention 2, or at least not detecting the correct version of it.

Have you tried opening an issue on the original repository?
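In the meantime, if flash attention can't be made visible inside the container, one workaround is to not request it at all. An untested sketch (the model id is inferred from the traceback; the repo's bot.py may use a different identifier or extra kwargs):

```python
import torch
from transformers import Idefics2ForConditionalGeneration

# Fall back to PyTorch's built-in SDPA attention so transformers never
# runs the Flash Attention 2 availability check.
# "HuggingFaceM4/idefics2-8b-chatty" is inferred from the traceback.
model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-chatty",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # instead of "flash_attention_2"
)
```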