oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

UserWarning: 1Torch was not compiled with flash attention. #5705

Open capactiyvirus opened 5 months ago

capactiyvirus commented 5 months ago

Describe the bug

When I load my model and try to use it, I get an error:

13:41:11-717356 INFO Saved "I:\programming\text-generation-webui\presets\My Preset.yaml".
I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(

Is there an existing issue for this?

Reproduction

Load model, https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-33B-v1.5

Try to use it.

Get error

Screenshot

(two screenshots attached)

Logs

13:40:23-395560 INFO     Loaded the model in 77.23 seconds.
13:41:11-717356 INFO     Saved "I:\programming\text-generation-webui\presets\My Preset.yaml".
I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
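To check whether the torch build that is actually loaded ships flash-attention kernels at all, a small probe like the one below can help. This is only a sketch: it assumes a CUDA device and the PyTorch 2.0-2.2 torch.backends.cuda.sdp_kernel context manager (deprecated in later releases in favour of torch.nn.attention.sdpa_kernel), and a failure can also mean the GPU architecture itself is not supported by the flash kernels, not necessarily that the wheel is broken.

import torch
import torch.nn.functional as F

# Dummy attention inputs in a shape/dtype the flash kernel supports (fp16, head_dim 64).
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# Allow only the flash backend; if it is unavailable in this build/GPU,
# scaled_dot_product_attention raises a RuntimeError instead of silently
# falling back to the math kernel that triggers the UserWarning above.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    F.scaled_dot_product_attention(q, k, v)
print("flash attention kernel is usable in this build")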

System Info

NVIDIA RTX 2080 Ti
Windows 11 Home Version 10.0.22631 Build 22631
System Model    X570 AORUS PRO WIFI
System Type x64-based PC
Processor   AMD Ryzen 7 5800X 8-Core Processor, 3801 MHz, 8 Core(s), 16 Logical Processor(s)
capactiyvirus commented 5 months ago

(screenshot attached)

Did you forget to put the pip install commands for Python in there? You just have print statements. The main reason I'm asking is that I checked whether PyTorch is installed and it didn't seem to be, so I'm a bit confused. Maybe I have a broken Python environment.

capactiyvirus commented 5 months ago

After going to https://pytorch.org/ and running the install command they provide:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

I found that my Python did have torch installed. (screenshot attached)
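Keep in mind that the webui runs from its own bundled environment under installer_files\env, so installing torch into a different Python does not change what the webui imports. A quick, hedged way to see which interpreter and torch build are actually in play is to run something like the following with the bundled python.exe (the installer_files\env path from the logs above):

import sys
import torch

print("interpreter:", sys.executable)        # which Python environment is running
print("torch:", torch.__version__)           # e.g. 2.2.x+cu121
print("imported from:", torch.__file__)      # which site-packages provided it
print("CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())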

capactiyvirus commented 5 months ago

Well, updating PyTorch gave me more errors, lol:

14:19:03-815672 ERROR    Could not find the character "[]" inside characters/. No
                         character has been loaded.
Traceback (most recent call last):
  File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1550, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "I:\programming\text-generation-webui\modules\chat.py", line 664, in load_character
    raise ValueError
ValueError
I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
oldgithubman commented 5 months ago

Same problem here. Seems like a regression

capactiyvirus commented 5 months ago

I'm not too familiar with machine learning in general, so I can't really be much help with that :(. Do you know if we might need some different packages?

oldgithubman commented 5 months ago

It used to work. Some dev broke something. We need them to fix it. As you've already discovered, trying to fix it yourself just breaks more stuff

hugs7 commented 5 months ago

I'm seeing this warning too. Model seems to run despite it.

oldgithubman commented 5 months ago

Seems slow to me though. You?

VishalV1807 commented 5 months ago

I was using llama-2-7b-chat-hf for a project on my RTX 4050 and got the same warning. The response also takes an hour to generate.

hugs7 commented 5 months ago

An hour seems far too long for a response. Are you using a pipeline to evaluate?

oldgithubman commented 5 months ago

Well, not having flash attention makes a big difference, especially in memory-constrained scenarios. People need to stop rushing releases. I've already switched to ollama and will probably evaluate LM Studio today.

capactiyvirus commented 5 months ago

^ ty @oldmanjk

tildebyte commented 5 months ago

If you are on Windows, be advised that the nightlies do not have FA v2 (i.e. they don't have FA at all); see https://github.com/pytorch/pytorch/issues/108175

oldgithubman commented 5 months ago

If you are on Windows, be advised that the nightlies do not have FA v2 (i.e. they don't have FA at all); see https://github.com/pytorch/pytorch/issues/108175

I'm on Linux stable. No flash attention.

aphex3k commented 4 months ago

Same warning for Llama-2-13b-chat-hf.

D:\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:670: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Output generated in 254.98 seconds (0.70 tokens/s, 178 tokens, context 78, seed 2082798633)
ADAning commented 4 months ago

I have the same problem with Qwen 1.5 on Windows. I found that, regardless of whether or not flash-attn is installed for the corresponding PyTorch version, I don't have this problem when using torch==2.1.

When using torch==2.2, LLM inference gives the following warning:

D:\Project\AIGC\temp\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\qwen2\modeling_qwen2.py:693: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Output generated in 29.86 seconds (0.87 tokens/s, 26 tokens, context 59, seed 1812789762)

After installing torch version 2.1, the problem disappeared:

conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

Inference speed:

Output generated in 29.40 seconds (15.00 tokens/s, 441 tokens, context 113, seed 601091263)
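One way to confirm the downgrade really removed the warning, beyond eyeballing the console, is to capture warnings around a single scaled_dot_product_attention call. A minimal sketch, assuming a CUDA device and a fresh interpreter (the warning may only be emitted once per process):

import warnings
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    F.scaled_dot_product_attention(q, k, v)
# After the torch==2.1.2 downgrade there should be no
# "was not compiled with flash attention" entry in this list.
print([str(w.message) for w in caught])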
wertstahl commented 4 months ago

If you are here because of ComfyUI on Windows and you are using the portable version, this specific error is probably rooted in the way the portable version is designed. For me, after doing the non-intuitive git clone install and manually installing torch using pip, the error was gone. Yes, that means a lot of reading. Enjoy.

Akira13641 commented 4 months ago

If you are here because of ComfyUI on Windows and you are using the portable version, this specific error is probably rooted in the way the portable version is designed. For me, after doing the non-intuitive git clone install and manually installing torch using pip, the error was gone. Yes, that means a lot of reading. Enjoy.

The error is because ComfyUI updated its dependencies past 2.1.2+cu121 a couple of months ago, apparently without taking into account that this would be guaranteed to cause the error in all cases on Windows, since Flash Attention was present in that version and lower but simply isn't anymore, for unclear reasons. It's not "a lot of reading": you just have to manually reinstall 2.1.2+cu121 specifically, which is the last version where Flash Attention existed in any form on Windows.

Urammar commented 3 months ago

If you are here because of ComfyUI on Windows and you are using the portable version, this specific error is probably rooted in the way the portable version is designed. For me, after doing the non-intuitive git clone install and manually installing torch using pip, the error was gone. Yes, that means a lot of reading. Enjoy.

The error is because ComfyUI updated its dependencies past 2.1.2+cu121 a couple of months ago, apparently without taking into account that this would be guaranteed to cause the error in all cases on Windows, since Flash Attention was present in that version and lower but simply isn't anymore, for unclear reasons. It's not "a lot of reading": you just have to manually reinstall 2.1.2+cu121 specifically, which is the last version where Flash Attention existed in any form on Windows.

Do you have the commands to do that? I'm not sure what the torchvision version should be in that case.

PZAragon commented 3 months ago

The command is: pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121. But then xformers will complain, because its latest version requires PyTorch 2.3.0... I am not sure I want to go hunting for the xformers version that works with PyTorch 2.1.2...

I might try to find out how to get flash attention added to PyTorch 2.3.
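To see whether the installed xformers actually matches the pinned torch, printing both versions side by side is usually enough (xformers also ships a diagnostic you can try with python -m xformers.info, if it is installed). A sketch:

import torch
print("torch:", torch.__version__)
try:
    import xformers
    print("xformers:", xformers.__version__)  # should be a build targeting the torch above
except ImportError:
    print("xformers is not installed in this environment")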

martoonz commented 1 month ago

The command is: pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121. But then xformers will complain, because its latest version requires PyTorch 2.3.0... I am not sure I want to go hunting for the xformers version that works with PyTorch 2.1.2...

I might try to find out how to get flash attention added to PyTorch 2.3.

Thanks for the help! As for the xformers version, I downgraded to 0.0.22.post4: pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu121

Tyomanator commented 3 weeks ago

The command is: pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121

As for the xformers version, I downgraded to 0.0.22.post4: pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu121

Sorry to ask a newbie question, but could you please tell me in which folder exactly in ComfyUI you ran those install commands?

martoonz commented 2 weeks ago

The command is: pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121

As for the xformers version, I downgraded to 0.0.22.post4: pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu121

Sorry to ask a newbie question, but could you please tell me in which folder exactly in ComfyUI you ran those install commands?

Run cmd in your "python_embeded" folder, for example "X:\ComfyUI_windows_portable\python_embeded", then paste the commands into the command prompt window.

In addition, in cmd you can check the versions and dependencies of your installed packages. Besides the pip show [package name] command, there is pipdeptree: just run pip install pipdeptree and then pipdeptree.
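As a pure-Python alternative that needs no extra tools, importlib.metadata from the standard library can report an installed package's version and its declared dependencies. A sketch:

from importlib.metadata import version, requires

for pkg in ("torch", "torchvision", "torchaudio", "xformers"):
    try:
        print(pkg, version(pkg))
        print("  requires:", requires(pkg))
    except Exception:
        print(pkg, "is not installed")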