Open DarkAlchy opened 1 week ago
I'm getting 2.92 it/s at 1024x1024 with a 4090 when using flash attention, so yeah, it's a bit slow. Without it you'd be looking at seconds per iteration instead, so it does seem to be working if you are using a higher resolution.
Alright, now the quality: I just showed Discord and I am stumped.
So this is why I was getting around 30-40 sec/it on an RX 6600 :) We don't have any way to use flash attention here at all, so good luck to the nvidia brethren. And yes, quality is "meh" at best.
Man, I just can't seem to get it installed. Every time I try "pip install <pasted link to one of the whl files>" it just keeps saying "ERROR: flash_attn-2.5.9.post1+cu122torch2.3.1cxx11abiFALSE-cp39-cp39-win_amd64.whl is not a supported wheel on this platform." I've tried them all. I'm on Windows 11 with torch 2.3.1+cu121.
What's your Python version? That file is "cp39" which would be for Python 3.9.
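For anyone hitting the same "not a supported wheel" error: a quick generic way (not specific to this repo) to see which "cpXY" tag your interpreter expects, so you can pick the matching wheel file:

```python
import sys

def interpreter_cp_tag():
    """Return the CPython tag (e.g. "cp310") for the running interpreter.
    A wheel whose filename carries a different cpXY tag is rejected by pip."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"

print(interpreter_cp_tag())
```

Run it with the same python.exe that ComfyUI uses, and compare the output to the cpXY part of the wheel filename.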
I'm on Python 3.10.11 (Windows), so that nugget you mentioned helped a lot. That said, I tried every cp310 wheel with the right cu121 build that I had, and every one blew up my various Comfy nodes (extra samplers etc.), so eventually I gave up. I found a relatively fast workflow at a 2.40:1 aspect ratio that renders in about 21 seconds with the SDXL refiner. Thanks for the work on these nodes. Enjoy a picture. :)
How can I check which Python environment my ComfyUI installation uses, so I can install flash-attn?
Ok, stupid question: how do I know if I've installed the correct flash-attn and that it works? I ran pip install "correct_wheel" and it said installed successfully, but I'm running ComfyUI through Stability Matrix and it has its own Python, so I don't understand if I installed it to my global installation or to the local Python in Matrix.
Getting pretty long gen times; is there some message in the console which shows the mode it's working in? I can't see any errors.
Sampling time looks like this:
ODE Sampling: 100%|██████████| 48/48 [02:08<00:00, 2.67s/it]
2+ minutes for 25 steps on a 1024x1024 image, on an 8 GB 3070. I'm not going into low VRAM mode; it sits at about 7.7 GB of VRAM. Maybe it's working as intended? Hard to say. PixArt, SD3 and SDXL sample about 2-3 times faster at the same steps and resolution.
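As an aside, tqdm switches to printing s/it when each step takes over a second, so the readouts in this thread convert between the two like this (plain arithmetic, nothing repo-specific):

```python
def rates(total_seconds, iterations):
    """Convert a tqdm elapsed time into (s/it, it/s)."""
    s_per_it = total_seconds / iterations
    return round(s_per_it, 2), round(1 / s_per_it, 3)

# 48 iterations in 2:08 (128 s), as in the log above:
print(rates(128, 48))  # (2.67, 0.375)
```

So "2.67s/it" is roughly 0.375 it/s, which makes the gap to a 4090's 2.92 it/s easy to compare.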
That does sound like it's not working; I'll add a console message to make it clearer. I don't actually use Stability Matrix, so I don't know how it has the Python environment set up. It definitely doesn't use your global Python though.
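One way to tell which environment actually got the package: run something like this with the same python.exe that launches ComfyUI (a generic check, not part of the node):

```python
import importlib.util

def flash_attn_status():
    """Report whether flash_attn is importable by *this* interpreter."""
    if importlib.util.find_spec("flash_attn") is None:
        return "flash_attn NOT importable from this environment"
    import flash_attn
    return f"flash_attn {getattr(flash_attn, '__version__', '?')} found"

print(flash_attn_status())
```

If the launcher's interpreter reports "NOT importable", the wheel went into a different Python than the one Comfy runs on.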
definitely doesn't use your global python though.
Yeah, I've tried installing directly with python.exe in the local Python folder; it said "already installed". Trying one more time with "--force-reinstall", I hope it will not break everything else :)
Update: reinstalled, rebooted, still the same sampling time. pip says "flash-attn is already installed with the same version as the provided wheel."
from the console:
** Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
pytorch version: 2.3.1+cu121
File I'm using:
flash_attn-2.5.9.post1+cu122torch2.3.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
Found the button to install the wheel directly in the Matrix interface; looks like it worked, time is down to 1+ minute, which looks like as fast as I'll get on my GPU. Sorry for the comment spam, and thanks a lot for the node :)
ODE Sampling: 100%|██████████| 48/48 [01:13<00:00, 1.54s/it]
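The wheel-vs-interpreter mismatch earlier in the thread can also be spotted mechanically. A rough sketch (pip's real compatibility check is much fuller, covering platform and ABI tags too):

```python
import sys

def cp_tag_matches(wheel_name):
    """Rough check: does the cpXY tag embedded in a wheel filename match
    the running interpreter? pip performs the authoritative check."""
    tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return f"-{tag}-" in wheel_name

print(cp_tag_matches(
    "flash_attn-2.5.9.post1+cu122torch2.3.1cxx11abiFALSE-cp310-cp310-win_amd64.whl"))
```

On a Python 3.10 interpreter the cp310 wheel above matches; the cp39 wheel from earlier in the thread would not.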
This depends on what kind of Comfy install you have; it generally should tell you in the console log when you start Comfy.
Manually, without running Comfy: if it's the portable install, you can run this to show the pytorch version:
python_embeded\python.exe -m pip show torch
Python version:
python_embeded\python.exe --version
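Both checks can also be done in one go with a small script run by that same interpreter (generic snippet; torch may of course be absent from a broken environment):

```python
import sys

def env_report():
    """One-line summary of the interpreter and its torch install, if any."""
    parts = [f"Python {sys.version.split()[0]}"]
    try:
        import torch
        parts.append(f"torch {torch.__version__} (CUDA {torch.version.cuda})")
    except ImportError:
        parts.append("torch not installed in this environment")
    return "; ".join(parts)

print(env_report())
```

The Python and CUDA versions it prints are exactly what you need to pick the right flash-attn wheel filename.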
What am I missing?
You're probably missing one of the dependencies from the requirements.txt.
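A quick way to find out which ones, run with Comfy's interpreter. The module names below are placeholders, not the node's actual dependency list; substitute the entries from the repo's requirements.txt:

```python
import importlib.util

def missing_modules(candidates):
    """Return the modules from `candidates` that this interpreter cannot import."""
    return [m for m in candidates if importlib.util.find_spec(m) is None]

# Placeholder names -- check against the repo's actual requirements.txt:
print(missing_modules(["transformers", "accelerate", "sentencepiece"]))
```

Anything it prints needs a pip install into that same environment.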
Would I gain anything by going flash-attn2?
As far as I understand, "flash attention 2" is just the flash_attn package's versioning: 2.5.9 is already flash attention 2.
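In other words, the package's major version number is the "flash attention N"; a trivial check on the version string:

```python
def is_flash_attn_2(version_string):
    """flash_attn 2.x ships the FlashAttention-2 kernels,
    so the major version component answers the question."""
    return int(version_string.split(".")[0]) >= 2

print(is_flash_attn_2("2.5.9"))  # True
print(is_flash_attn_2("1.0.9"))  # False
```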
I only ask because I built it and some other package complained that it was version 1.
Compiled it, and there is no way (or is there?) that this should be this slow. 2 it/s on a 4090?