Open artbymbesares opened 4 months ago
I'm afraid that GPU just is not supported by flash attention.
Hm... I saw in the FlashAttention repo that at least v1 supports Turing GPUs, but I'm not sure if Lumina can use v1, can it?
flash_attn isn't working for me either...
> Hm... I saw in the FlashAttention repo that at least v1 supports Turing GPUs, but I'm not sure if Lumina can use v1, can it?
Maybe you can use Xformers instead of flash_attn
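For what it's worth, a common fallback on pre-Ampere cards (a sketch, not specific to the Lumina wrapper) is PyTorch's built-in SDPA, which most attention fallbacks dispatch to. It picks a backend at runtime instead of requiring the flash_attn kernels:

```python
import torch
import torch.nn.functional as F

# scaled_dot_product_attention chooses the best available backend at runtime
# (flash kernel on Ampere+, memory-efficient or math kernels elsewhere), so
# it runs on Turing GPUs and even on CPU, where flash_attn refuses to.
q = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```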
Sadly, removing FlashAttention makes it OOM; the model is too heavy for my VRAM. Odd, since I can run PixArt and HunYuan fine, albeit slowly.
> I'm afraid that GPU just is not supported by flash attention.
>
> Maybe you can use Xformers instead of flash_attn
You would need an RTX 30XX or 40XX (Ampere or newer).
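Whether FlashAttention 2 can run at all comes down to CUDA compute capability: the FA2 kernels require Ampere (sm_80) or newer, while the GTX 1660 Ti is Turing (sm_75). A minimal check, assuming a PyTorch install (the helper names here are mine):

```python
def capability_supports_fa2(major: int, minor: int) -> bool:
    # FlashAttention 2 kernels require compute capability >= 8.0 (Ampere).
    # A GTX 1660 Ti (Turing) reports (7, 5) and fails this check.
    return (major, minor) >= (8, 0)

def supports_flash_attention_2() -> bool:
    import torch  # deferred import so the pure check above needs no GPU stack
    if not torch.cuda.is_available():
        return False
    return capability_supports_fa2(*torch.cuda.get_device_capability(0))

print(capability_supports_fa2(7, 5))  # False -> GTX 1660 Ti (Turing)
print(capability_supports_fa2(8, 6))  # True  -> RTX 30XX (Ampere)
```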
got prompt
[rgthree] Using rgthree's optimized recursive execution.
[rgthree] First run patching recursive_output_delete_if_changed and recursive_will_execute.
[rgthree] Note: If execution seems broken due to forward ComfyUI changes, you can disable the optimization from rgthree settings in ComfyUI.
Gemma attention mode: flash_attention_2
`config.hidden_act` is ignored, you should use `config.hidden_activation` instead. Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use `config.hidden_activation` if you want to override this behaviour. See https://github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards: 100%|██████████| 2/2 [00:20<00:00, 10.46s/it]
!!! Exception during processing!!! FlashAttention only supports Ampere GPUs or newer.
Traceback (most recent call last):
  File "E:\Data\Packages\ComfyUI_4\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "E:\Data\Packages\ComfyUI_4\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "E:\Data\Packages\ComfyUI_4\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "E:\Data\Packages\ComfyUI_4\custom_nodes\ComfyUI-LuminaWrapper\nodes.py", line 203, in encode
    prompt_embeds = text_encoder(
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 902, in forward
    layer_outputs = decoder_layer(
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 638, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 392, in forward
    attn_output = self._flash_attention_forward(
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\transformers\models\gemma\modeling_gemma.py", line 442, in _flash_attention_forward
    attn_output_unpad = flash_attn_varlen_func(
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\flash_attn\flash_attn_interface.py", line 1066, in flash_attn_varlen_func
    return FlashAttnVarlenFunc.apply(
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\torch\autograd\function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\flash_attn\flash_attn_interface.py", line 581, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
  File "E:\Data\Packages\ComfyUI_4\venv\lib\site-packages\flash_attn\flash_attn_interface.py", line 86, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
Exception raised from mha_varlen_fwd at D:\a\flash-attention\flash-attention\csrc\flash_attn\flash_api.cpp:524 (most recent call first):
00007FFA6176366200007FFA61763600 c10.dll!c10::Error::Error [ @ ]
00007FFA6176311A00007FFA617630C0 c10.dll!c10::detail::torchCheckFail [ @ ]
00007FF920E060D000007FF920DF5B10 flash_attn_2_cuda.cp310-win_amd64.pyd!c10::ivalue::Object::operator= [ @ ]
00007FF920E17AE000007FF920E0C5D0 flash_attn_2_cuda.cp310-win_amd64.pyd!PyInit_flash_attn_2_cuda [ @ ]
00007FF920E15C6E00007FF920E0C5D0 flash_attn_2_cuda.cp310-win_amd64.pyd!PyInit_flash_attn_2_cuda [ @ ]
00007FF920E15D6400007FF920E0C5D0 flash_attn_2_cuda.cp310-win_amd64.pyd!PyInit_flash_attn_2_cuda [ @ ]
00007FF920DFE36500007FF920DF5B10 flash_attn_2_cuda.cp310-win_amd64.pyd!c10::ivalue::Object::operator= [ @ ]
00007FFA60509EEA00007FFA60509E18 python310.dll!PyObject_IsTrue [ @ ]
00007FFA6056989200007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA605649D700007FFA60564950 python310.dll!PyFunction_Vectorcall [ @ ]
00007FFA6056BACD00007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA605649D700007FFA60564950 python310.dll!PyFunction_Vectorcall [ @ ]
00007FFA6054BFB000007FFA6054BF54 python310.dll!PyVectorcall_Call [ @ ]
00007FFA6054BD9300007FFA6054BD44 python310.dll!PyObject_Call [ @ ]
00007FFA5687F92100007FFA5686C540 torch_python.dll!THPPointer::THPPointer [ @ ]
00007FFA60509F1900007FFA60509E18 python310.dll!PyObject_IsTrue [ @ ]
00007FFA6054BDCE00007FFA6054BD44 python310.dll!PyObject_Call [ @ ]
00007FFA6054BECB00007FFA6054BD44 python310.dll!PyObject_Call [ @ ]
00007FFA6056B5E700007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA6056361500007FFA605625C0 python310.dll!PyObject_GC_Malloc [ @ ]
00007FFA6056729300007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA605649D700007FFA60564950 python310.dll!PyFunction_Vectorcall [ @ ]
00007FFA6056BACD00007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA6056862000007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA6056361500007FFA605625C0 python310.dll!PyObject_GC_Malloc [ @ ]
00007FFA6054C00C00007FFA6054BF54 python310.dll!PyVectorcall_Call [ @ ]
00007FFA6054BE8700007FFA6054BD44 python310.dll!PyObject_Call [ @ ]
00007FFA6056B5E700007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA6056361500007FFA605625C0 python310.dll!PyObject_GC_Malloc [ @ ]
00007FFA6054C00C00007FFA6054BF54 python310.dll!PyVectorcall_Call [ @ ]
00007FFA6054BE8700007FFA6054BD44 python310.dll!PyObject_Call [ @ ]
00007FFA6056B5E700007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA605649D700007FFA60564950 python310.dll!PyFunction_Vectorcall [ @ ]
00007FFA6051A91700007FFA6051A844 python310.dll!PyObject_FastCallDictTstate [ @ ]
00007FFA606281F400007FFA60628178 python310.dll!PyObject_Call_Prepend [ @ ]
00007FFA6062815000007FFA60627064 python310.dll!PyBytesWriter_Resize [ @ ]
00007FFA6054FFBB00007FFA6054FE58 python310.dll!PyObject_MakeTpCall [ @ ]
00007FFA6056C39F00007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA6056361500007FFA605625C0 python310.dll!PyObject_GC_Malloc [ @ ]
00007FFA6054C00C00007FFA6054BF54 python310.dll!PyVectorcall_Call [ @ ]
00007FFA6054BE8700007FFA6054BD44 python310.dll!PyObject_Call [ @ ]
00007FFA6056B5E700007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA6056361500007FFA605625C0 python310.dll!PyObject_GC_Malloc [ @ ]
00007FFA6054C00C00007FFA6054BF54 python310.dll!PyVectorcall_Call [ @ ]
00007FFA6054BE8700007FFA6054BD44 python310.dll!PyObject_Call [ @ ]
00007FFA6056B5E700007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA605649D700007FFA60564950 python310.dll!PyFunction_Vectorcall [ @ ]
00007FFA6051A91700007FFA6051A844 python310.dll!PyObject_FastCallDictTstate [ @ ]
00007FFA606281F400007FFA60628178 python310.dll!PyObject_Call_Prepend [ @ ]
00007FFA6062815000007FFA60627064 python310.dll!PyBytesWriter_Resize [ @ ]
00007FFA6054FFBB00007FFA6054FE58 python310.dll!PyObject_MakeTpCall [ @ ]
00007FFA6056C39F00007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA6056361500007FFA605625C0 python310.dll!PyObject_GC_Malloc [ @ ]
00007FFA6054C00C00007FFA6054BF54 python310.dll!PyVectorcall_Call [ @ ]
00007FFA6054BE8700007FFA6054BD44 python310.dll!PyObject_Call [ @ ]
00007FFA6056B5E700007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA6056361500007FFA605625C0 python310.dll!PyObject_GC_Malloc [ @ ]
00007FFA6054C00C00007FFA6054BF54 python310.dll!PyVectorcall_Call [ @ ]
00007FFA6054BE8700007FFA6054BD44 python310.dll!PyObject_Call [ @ ]
00007FFA6056B5E700007FFA605658A0 python310.dll!PyEval_EvalFrameDefault [ @ ]
00007FFA605649D700007FFA60564950 python310.dll!PyFunction_Vectorcall [ @ ]
00007FFA6051A91700007FFA6051A844 python310.dll!PyObject_FastCallDictTstate [ @ ]
00007FFA606281F400007FFA60628178 python310.dll!PyObject_Call_Prepend [ @ ]
00007FFA6062815000007FFA60627064 python310.dll!PyBytesWriter_Resize [ @ ]
Prompt executed in 55.16 seconds
I don't know what to do. :( I'm using:
- Windows 10
- GTX 1660 Ti
- Stability Matrix with Python 3.10
- PyTorch 2.3.1+cu121
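For reports like this it may help to dump the relevant environment facts in one go; a small diagnostic sketch, assuming only PyTorch is installed:

```python
import platform
import torch

# Print the details a GPU-support bug report usually needs.
print("python:", platform.python_version())
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)
if torch.cuda.is_available():
    print("gpu   :", torch.cuda.get_device_name(0))
    print("sm    :", torch.cuda.get_device_capability(0))
else:
    print("gpu   : none detected")
```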