kijai / ComfyUI-Florence2

Inference Microsoft Florence2 VLM
MIT License

No module named 'flash_attn_2_cuda' #23

Open 1-eyx opened 3 months ago

1-eyx commented 3 months ago

Error occurred when executing DownloadAndLoadFlorence2Model:

No module named 'flash_attn_2_cuda'

The error happens with all precision settings and all attention settings.

(The model is: Florence-2-base)

[Screenshot (454) attached]

kijai commented 3 months ago

Never seen that before, but it's probably due to an old transformers version.

1-eyx commented 3 months ago

I updated it to the latest version and yeah, same error.

1-eyx commented 2 months ago

Where are you, man!

kijai commented 2 months ago

I can't reproduce the error so dunno how I can help.

Are you sure you updated transformers for the portable install specifically? As in, going into the ComfyUI_windows_portable\python_embeded folder and running:

python.exe -m pip install -U transformers

You can check your current version with:

python.exe -m pip show transformers
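
For reference, here is a small version-check script (a minimal sketch, not from this thread) you can run with that same embedded interpreter, e.g. save it next to python.exe as check_env.py and run python.exe check_env.py:

    # check_env.py -- print the versions the embedded Python actually uses
    import torch
    import transformers

    print("transformers:", transformers.__version__)
    print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)

    # flash-attn is optional for Florence2; this only reports whether it imports at all
    try:
        import flash_attn
        print("flash-attn:", flash_attn.__version__)
    except Exception as e:  # ImportError or a CUDA/ABI mismatch
        print("flash-attn not usable:", e)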

ritikvirus commented 2 months ago

Steps to Fix the Error

  1. Uninstall Flash-Attention:

    • Open your terminal.
    • Run the following command to uninstall the existing flash-attn package:
      pip uninstall flash-attn -y
  2. Clone the Flash-Attention Repository:

    • Use the following command to clone the Flash-Attention repository from GitHub:
      git clone https://github.com/Dao-AILab/flash-attention.git
  3. Navigate to the Cloned Repository:

    • Change your current directory to the flash-attention directory using:
      cd flash-attention
  4. Install Flash-Attention:

    • Run the following command to install the flash-attn package without build isolation:
      pip install . --no-build-isolation

Following these steps should resolve the error. After completing them, your setup should work without issues.
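
If you rebuild flash-attn this way, a quick sanity check (a hedged sketch; run it with the same python.exe that ComfyUI uses) is to import the compiled extension named in the original error:

    # A failure here usually means the built wheel does not match your torch/CUDA setup.
    import flash_attn_2_cuda  # the compiled extension the error says is missing
    import flash_attn

    print("flash-attn", flash_attn.__version__, "loaded its CUDA extension OK")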

gabriel-filincowsky commented 1 month ago

Is there a specific directory I need to run these commands from?

kijai commented 1 month ago

Flash attention itself is not needed to run Florence2 as long as you use sdpa or eager as the attention mode. Make sure everything else is up to date, especially torch and transformers. I've never seen the error myself, so that's all I have.
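
For anyone curious what the sdpa route looks like outside the node, here is a minimal transformers sketch (assumptions: model ID microsoft/Florence-2-base and a recent transformers/torch; this is not the node's actual loading code, and older remote-code revisions of the model may still try to import flash-attn on their own):

    # Minimal sketch: load Florence-2 with PyTorch SDPA attention so flash-attn is not required.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "microsoft/Florence-2-base"
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        attn_implementation="sdpa",   # instead of "flash_attention_2"
        trust_remote_code=True,
    )
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)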

gabriel-filincowsky commented 1 month ago

> Flash attention itself is not needed to run Florence2 as long as you use sdpa or eager as the attention mode. Make sure everything else is up to date, especially torch and transformers. I've never seen the error myself, so that's all I have.

Hey Kijai, thank you very much for all your work and for taking the time to reply. I was wondering if running it in 'flash_attn_2' would improve my performance, but the node is running fine in 'sdpa'.

kijai commented 1 month ago

I didn't notice any real difference myself, it's already so fast.