Closed TwitchWorkshop closed 1 month ago
I don't have experience with this specific tool, but it looks like transformers isn't detecting Flash Attention 2, or at least not detecting the correct version of it.
Have you tried opening an issue on the original repository?
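As a first diagnostic step, it may help to confirm that the packages transformers looks for when enabling FlashAttention-2 are even importable inside the container. This is a minimal sketch of such a check; the helper name `check_flash_attn_env` is our own, not part of any library:

```python
import importlib.util

def check_flash_attn_env():
    """Hypothetical diagnostic: report whether the packages transformers
    consults when enabling FlashAttention-2 can be found, and whether a
    CUDA device is visible to torch."""
    report = {mod: importlib.util.find_spec(mod) is not None
              for mod in ("torch", "flash_attn")}
    # CUDA visibility is checked separately, and only if torch is importable.
    if report["torch"]:
        import torch
        report["cuda_available"] = torch.cuda.is_available()
    return report

if __name__ == "__main__":
    print(check_flash_attn_env())
```

If `flash_attn` shows as present but `cuda_available` is `False`, the problem is GPU visibility inside the container rather than the pip install.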
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
I am trying to run this GPT-4o app, and every time I bring it up with Docker I get the same error. I have installed everything I could, and even pinned transformers to 4.40 specifically, since the app says it does not support 4.41 or higher. I have included the system info and the terminal output below. Any assistance or ideas would be extremely helpful. This is the repo: FardinHash/GPT-4o
transformers-cli env

- `transformers` version: 4.40.0

```
root@WorkshopTest:/var/sftp/uploads/panelTest/fffffff# docker compose up
[+] Running 1/0
 ✔ Container fffffff-gpt4o-1  Created  0.0s
Attaching to gpt4o-1
gpt4o-1 | Requirement already satisfied: flash-attn in /usr/local/lib/python3.12/site-packages (2.6.3)
gpt4o-1 | Requirement already satisfied: torch in /usr/local/lib/python3.12/site-packages (from flash-attn) (2.4.1)
gpt4o-1 | Requirement already satisfied: einops in /usr/local/lib/python3.12/site-packages (from flash-attn) (0.8.0)
gpt4o-1 | Requirement already satisfied: filelock in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.15.4)
gpt4o-1 | Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (4.12.2)
gpt4o-1 | Requirement already satisfied: sympy in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (1.13.2)
gpt4o-1 | Requirement already satisfied: networkx in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.3)
gpt4o-1 | Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.1.4)
gpt4o-1 | Requirement already satisfied: fsspec in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (2024.6.1)
gpt4o-1 | Requirement already satisfied: setuptools in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (74.1.2)
gpt4o-1 | Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1 | Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1 | Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1 | Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (9.1.0.70)
gpt4o-1 | Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.3.1)
gpt4o-1 | Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (11.0.2.54)
gpt4o-1 | Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (10.3.2.106)
gpt4o-1 | Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (11.4.5.107)
gpt4o-1 | Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.0.106)
gpt4o-1 | Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (2.20.5)
gpt4o-1 | Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (12.1.105)
gpt4o-1 | Requirement already satisfied: triton==3.0.0 in /usr/local/lib/python3.12/site-packages (from torch->flash-attn) (3.0.0)
gpt4o-1 | Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.12/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->flash-attn) (12.6.68)
gpt4o-1 | Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/site-packages (from jinja2->torch->flash-attn) (2.1.5)
gpt4o-1 | Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/site-packages (from sympy->torch->flash-attn) (1.3.0)
gpt4o-1 | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
gpt4o-1 | Traceback (most recent call last):
gpt4o-1 |   File "/usr/src/app/app.py", line 4, in <module>
gpt4o-1 |     from bot import chatbot, model_inference, BOT_AVATAR, EXAMPLES, model_selector, decoding_strategy, temperature, max_new_tokens, repetition_penalty, top_p
gpt4o-1 |   File "/usr/src/app/bot.py", line 31, in <module>
gpt4o-1 |     "idefics2-8b-chatty": Idefics2ForConditionalGeneration.from_pretrained(
gpt4o-1 |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gpt4o-1 |   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3826, in from_pretrained
gpt4o-1 |     config = cls._autoset_attn_implementation(
gpt4o-1 |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gpt4o-1 |   File "/usr/local/lib/python3.12/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1129, in _autoset_attn_implementation
gpt4o-1 |     config = super()._autoset_attn_implementation(
gpt4o-1 |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gpt4o-1 |   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1556, in _autoset_attn_implementation
gpt4o-1 |     cls._check_and_enable_flash_attn_2(
gpt4o-1 |   File "/usr/local/lib/python3.12/site-packages/transformers/modeling_utils.py", line 1667, in _check_and_enable_flash_attn_2
gpt4o-1 |     raise ImportError(f"{preface} Flash Attention 2 is not available. {install_message}")
gpt4o-1 | ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: Flash Attention 2 is not available. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
gpt4o-1 exited with code 1
```
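For context on why the traceback ends where it does: the `ImportError` in `_check_and_enable_flash_attn_2` is raised when transformers decides FlashAttention-2 is not usable, which can happen when `flash_attn` cannot be imported, its version is too old, or no CUDA device is visible. So a pip-installed flash-attn (2.6.3 here) is not by itself enough. The version part of that gate can be sketched roughly as follows; this is our own simplification for illustration, not the library's actual code:

```python
def _parse(v):
    # Very loose version parse (digits only) -- a sketch, not packaging.version.
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def flash_attn_version_ok(installed, minimum="2.1.0"):
    # transformers additionally requires a visible CUDA device; passing the
    # version check alone does not enable FlashAttention-2.
    return _parse(installed) >= _parse(minimum)
```

With flash-attn 2.6.3 the version check passes, which points the investigation toward GPU visibility inside the container instead.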
Who can help?
No response
Information

Tasks

- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
Using the GPT-4o repository from FardinHash here on GitHub, simply running `docker compose up` results in this output every time...
Expected behavior
The model should load, but instead I get the error output claiming that Flash Attention 2 must be installed, even though it already is.
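Since flash-attn is installed inside the image but the model load still reports FlashAttention-2 unavailable, one likely culprit is that the container cannot see the GPU, so `torch.cuda.is_available()` returns `False`. If the host has a working NVIDIA container runtime, the service can request the GPU in `docker-compose.yml`. This is a sketch; the service name `gpt4o` is assumed from the container name and should be verified against the repo's compose file:

```yaml
services:
  gpt4o:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Alternatively, if no GPU is available at all, the `from_pretrained` call in `bot.py` could be changed to pass `attn_implementation="eager"` (or `"sdpa"`) instead of requesting FlashAttention-2.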