Open DarkAlchy opened 1 week ago
I'm getting 2.92 it/s at 1024x1024 with a 4090 when using flash attention, so yeah, it's a bit slow. Without it you'd be looking at seconds per iteration instead, so it does seem to be working if you are using a higher resolution.
Alright, now the quality: I just showed Discord and I am stumped.
So this is why I was getting around 30-40 sec/it on an RX 6600 :) We don't have any way to use flash attention here at all, so good luck to the nvidia brethren. And yes, quality is "meh" at best.
Man, I just can't seem to get it installed. Every time I try "pip install <pasted link to one of the whl files>" it just keeps saying "ERROR: flash_attn-2.5.9.post1+cu122torch2.3.1cxx11abiFALSE-cp39-cp39-win_amd64.whl is not a supported wheel on this platform." I've tried them all. I'm on Windows 11 with torch 2.3.1+cu121.
What's your Python version? That file is "cp39" which would be for Python 3.9.
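For anyone hitting the same "not a supported wheel" error: a quick generic way (not specific to this repo) to see which "cpXY" tag your interpreter expects, so you can pick the matching wheel file:

```python
import sys

def interpreter_cp_tag():
    """Return the CPython tag (e.g. "cp310") for the running interpreter.
    A wheel whose filename carries a different cpXY tag is rejected by pip."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"

print(interpreter_cp_tag())
```

Run it with the same python.exe that ComfyUI uses, and compare the output to the cpXY part of the wheel filename.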
I'm on Python 3.10.11 (Windows), so that nugget you mentioned helped a lot. That said, I tried every cp310 wheel with the right cu121 build that I had, and every one blew up my various Comfy nodes (extra samplers etc.), so eventually I gave up. I found a relatively fast workflow at a 2.40:1 aspect ratio that renders in about 21 seconds with the SDXL refiner. Thanks for the work on these nodes. Enjoy a picture. :)
How can I check which Python environment my ComfyUI installation uses, so I can install flash-attn?
Ok, stupid question: how do I know if I've installed the correct flash-attn and that it works? I ran pip install "correct_wheel" and it said installed successfully, but I'm running ComfyUI through Stability Matrix and it has its own Python, so I don't understand if I installed it to my global installation or to the local Python in Matrix.
Getting pretty long gen times; is there some message in the console which shows the mode it's working in? I can't see any errors.
Sampling time looks like this:
ODE Sampling: 100%|██████████| 48/48 [02:08<00:00, 2.67s/it]
2+ minutes for 25 steps on a 1024x1024 image, on an 8 GB 3070. I'm not going into low VRAM mode; it sits at about 7.7 GB of VRAM. Maybe it's working as intended? Hard to say. PixArt, SD3 and SDXL sample about 2-3 times faster at the same steps and resolution.
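As an aside, tqdm switches to printing s/it when each step takes over a second, so the readouts in this thread convert between the two like this (plain arithmetic, nothing repo-specific):

```python
def rates(total_seconds, iterations):
    """Convert a tqdm elapsed time into (s/it, it/s)."""
    s_per_it = total_seconds / iterations
    return round(s_per_it, 2), round(1 / s_per_it, 3)

# 48 iterations in 2:08 (128 s), as in the log above:
print(rates(128, 48))  # (2.67, 0.375)
```

So "2.67s/it" is roughly 0.375 it/s, which makes the gap to a 4090's 2.92 it/s easy to compare.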
That does sound like it's not working; I'll add a console message to make it clearer. I don't actually use Stability Matrix, so I don't know how it has the Python environment set up. It definitely doesn't use your global Python though.
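One way to tell which environment actually got the package: run something like this with the same python.exe that launches ComfyUI (a generic check, not part of the node):

```python
import importlib.util

def flash_attn_status():
    """Report whether flash_attn is importable by *this* interpreter."""
    if importlib.util.find_spec("flash_attn") is None:
        return "flash_attn NOT importable from this environment"
    import flash_attn
    return f"flash_attn {getattr(flash_attn, '__version__', '?')} found"

print(flash_attn_status())
```

If the launcher's interpreter reports "NOT importable", the wheel went into a different Python than the one Comfy runs on.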
definitely doesn't use your global python though.
Yeah, I've tried installing directly with python.exe in the local Python folder; it said "already installed". Trying one more time with "--force-reinstall", I hope it will not break everything else :)
Update: reinstalled, rebooted, still the same sampling time. pip says "flash-attn is already installed with the same version as the provided wheel."
from the console:
** Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
pytorch version: 2.3.1+cu121
File I'm using:
flash_attn-2.5.9.post1+cu122torch2.3.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
Found the button to install the wheel directly in the Matrix interface; looks like it worked, time is down to 1+ minute, which looks like as fast as I'll get on my GPU. Sorry for the comment spam, and thanks a lot for the node :)
ODE Sampling: 100%|██████████| 48/48 [01:13<00:00, 1.54s/it]
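The wheel-vs-interpreter mismatch earlier in the thread can also be spotted mechanically. A rough sketch (pip's real compatibility check is much fuller, covering platform and ABI tags too):

```python
import sys

def cp_tag_matches(wheel_name):
    """Rough check: does the cpXY tag embedded in a wheel filename match
    the running interpreter? pip performs the authoritative check."""
    tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return f"-{tag}-" in wheel_name

print(cp_tag_matches(
    "flash_attn-2.5.9.post1+cu122torch2.3.1cxx11abiFALSE-cp310-cp310-win_amd64.whl"))
```

On a Python 3.10 interpreter the cp310 wheel above matches; the cp39 wheel from earlier in the thread would not.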
This depends on what kind of Comfy install you have; it generally should tell you in the console log when you start Comfy.
Manually, without running Comfy: if it's the portable install, you can run this to show the pytorch version:
python_embeded\python.exe -m pip show torch
Python version:
python_embeded\python.exe --version
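Both checks can also be done in one go with a small script run by that same interpreter (generic snippet; torch may of course be absent from a broken environment):

```python
import sys

def env_report():
    """One-line summary of the interpreter and its torch install, if any."""
    parts = [f"Python {sys.version.split()[0]}"]
    try:
        import torch
        parts.append(f"torch {torch.__version__} (CUDA {torch.version.cuda})")
    except ImportError:
        parts.append("torch not installed in this environment")
    return "; ".join(parts)

print(env_report())
```

The Python and CUDA versions it prints are exactly what you need to pick the right flash-attn wheel filename.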
What am I missing?
You're probably missing one of the dependencies from the requirements.txt.
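A quick way to find out which ones, run with Comfy's interpreter. The module names below are placeholders, not the node's actual dependency list; substitute the entries from the repo's requirements.txt:

```python
import importlib.util

def missing_modules(candidates):
    """Return the modules from `candidates` that this interpreter cannot import."""
    return [m for m in candidates if importlib.util.find_spec(m) is None]

# Placeholder names -- check against the repo's actual requirements.txt:
print(missing_modules(["transformers", "accelerate", "sentencepiece"]))
```

Anything it prints needs a pip install into that same environment.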
Would I gain anything by going flash-attn2?
As far as I understand, "flash attention 2" is just the flash_attn package's versioning: 2.5.9 is already flash attention 2.
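In other words, the package's major version number is the "flash attention N"; a trivial check on the version string:

```python
def is_flash_attn_2(version_string):
    """flash_attn 2.x ships the FlashAttention-2 kernels,
    so the major version component answers the question."""
    return int(version_string.split(".")[0]) >= 2

print(is_flash_attn_2("2.5.9"))  # True
print(is_flash_attn_2("1.0.9"))  # False
```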
I only ask because I built it and some other package complained that it was version 1.
Compiled it, and there is no way (or is there?) that this should be this slow. 2 it/s on a 4090?