mscherrmann closed this issue 11 months ago
I believe that Triton flash attention will not work on P100s. Could you try uninstalling flash_attn_triton before running anything? I think it will then fall back to torch attention properly instead of trying to use flash attention and failing.
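For context, a minimal sketch of why the P100 is ruled out, assuming the usual compute-capability requirement of FlashAttention-style kernels. The threshold and helper name below are illustrative, not taken from the repo:

```python
# Illustrative check: FlashAttention-style kernels generally require a
# Turing-or-newer GPU (compute capability >= 7.5); the Tesla P100 is 6.0,
# so it cannot run them regardless of which packages are installed.
def supports_flash_attention(major: int, minor: int) -> bool:
    """Return True if a GPU with this compute capability can run the kernels."""
    return (major, minor) >= (7, 5)

# On a real machine you would feed in the actual capability:
#   import torch
#   major, minor = torch.cuda.get_device_capability()
print(supports_flash_attention(6, 0))  # Tesla P100 -> False
print(supports_flash_attention(8, 0))  # e.g. A100  -> True
```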
Thank you for your quick response! Unfortunately, I do not have a flash_attn_triton package installed. I only find flash_attn, but uninstalling it doesn't help.
Apologies, I think I got the package wrong; it's actually triton you want to uninstall. flash_attn_triton is a file in our repo. We have a try/catch around importing it, which would disable the Triton attention implementation, but I guess for you the import succeeds and it then fails once it actually starts running. So I want to make that import fail so that Triton attention is disabled.
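The guarded import described here can be sketched like this (the module and function names are illustrative, not the exact ones in the repo):

```python
# If importing the Triton kernel module fails, silently fall back to the
# standard torch attention path. Note: this only helps when the *import*
# itself fails; as described above, on a P100 the import can succeed and
# the failure only surfaces later, when the kernel actually runs.
try:
    from flash_attn_triton import flash_attn_func  # hypothetical name
    ATTN_IMPL = "triton"
except ImportError:
    flash_attn_func = None
    ATTN_IMPL = "torch"

print(ATTN_IMPL)
```

Uninstalling triton is one way to force that ImportError and land on the torch fallback.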
Unfortunately, that did not work for me either. However, I just switched to pretraining hf-bert, which works fine.
Thank you for your help!
Hey,
since fine-tuning after the import to transformers is not possible, I tried the fine-tuning script that you provide. As a first step to test your fine-tuning framework, I ran the function 'test_classification_script()' from 'tests/test_classification.py'. To do so, I used a Linux server running Ubuntu with 4 x NVIDIA Tesla P100 (16 GB). For the setup, I followed all the steps that you recommend here, i.e.:
I have installed CUDA release 11.7, as the following output suggests:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
To test your finetuning script, I simply did the following in the console:
Here is the complete output:
Note that in the output above, I replaced paths containing my personal information with (...).
Also note that the two commands

composer sequence_classification.py yamls/test/sequence_classification.yaml
composer sequence_classification.py yamls/test/sequence_classification.yaml model.name=mosaic_bert

both yield the same error message.
Did I do something wrong, or is this an error in the code? I would be incredibly grateful for any guidance, as I urgently need to fine-tune my model but am currently blocked by the issues described above.
Thank you very much!