Torch was not compiled with flash attention

Sp4wnf3rk3l commented 3 months ago

Greetings!

I've got very slow generation times using tortoise and I get the following error constantly with nearly every tool in the whole UI:

D:\AI-Apps\tts-generation-webui-main\installer_files\env\lib\site-packages\torch\nn\functional.py:5504: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.) attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal) 2024-07-25 02:23:40 | ERROR | asyncio | Exception in callback _ProactorBasePipeTransport._call_connection_lost(None) handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)> Traceback (most recent call last): File "D:\AI-Apps\tts-generation-webui-main\installer_files\env\lib\asyncio\events.py", line 80, in _run self._context.run(self._callback, *self._args) File "D:\AI-Apps\tts-generation-webui-main\installer_files\env\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost self._sock.shutdown(socket.SHUT_RDWR) ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

Thank you for your work!

rsxdalv commented 3 months ago

Hi! I saw your other post as well, but I didn't have time to check the full context. A 4070 should have a decent generation time, though can't give you an exact number yet. The numbers you have suggested seem to be long. The flash attention is quite difficult to get to work (but not impossible). Also what you can do is try to use KV_Cache, it will change the quality but should speed things up. As for the CUDA etc, could you please copy the output from here:

Sp4wnf3rk3l commented 3 months ago

Hey o/ thx for your quick response and sry... didn't mean to rush you, just thought I should open another issue, in case other people experience the same problem. Really appreciate that you try to help everyone!

I did in fact solve the issue somehow after several reinstalls. Can't exactly tell what solved it however, but the problem could have been an existing local installation of CUDA interfering with the CUDA dependencies that come with pytorch. After uninstalling the local dependencies and reinstalling the webUI (I think) the error was gone and the generation times are just a fraction of what they were before.

Guess you won't need this anymore... but anyway:

Screenshot was taken before solving the problem, but the page looks exactly the same now.

rsxdalv commented 3 months ago

All good! I'm glad that it works, although this is quite bizarre if it both has CUDA and is unable to use it at the same time, hadn't heard of this before.

On Thu, Jul 25, 2024, 6:50 AM Sp4wnf3rk3l @.***> wrote:

Hey o/ thx for your quick response and sry... didn't ment to rush you, just thought I should open another issue, if other people experience the same problem. Really appreciate that you try to help everyone!

I did in fact solve the issue somehow after several reinstalls. Can't exactly tell what solved it however, but it could be an existing local installation of CUDA interfering with the CUDA dependencies that come with pytorch. After uninstalling the local dependencies and reinstalling the webUI (I think) the error was gone and the generation times are just a fraction of what they were before.

Guess you won't need this anymore... but anyway: Screenshot.2024-07-25.054521.png (view on web) https://github.com/user-attachments/assets/d1dad9f6-301e-4b49-9795-1c497c532ddf

— Reply to this email directly, view it on GitHub https://github.com/rsxdalv/tts-generation-webui/issues/350#issuecomment-2249313250, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTRXI5ATO5IF4L6OKNZFTLZOBYW7AVCNFSM6AAAAABLNPXCXCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENBZGMYTGMRVGA . You are receiving this because you commented.Message ID: @.***>

rsxdalv / tts-generation-webui

Torch was not compiled with flash attention #350