philschmid / deep-learning-pytorch-huggingface


flash attention error in the instruction-tune llama-2 tutorial on a SageMaker notebook #40

Open matthewchung74 opened 8 months ago

matthewchung74 commented 8 months ago

Thank you for the excellent Blogs!

When running https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/instruction-tune-llama-2-int4.ipynb

I am trying to enable flash attention in a SageMaker notebook on an ml.g5.2xlarge instance. nvidia-smi tells me I am on CUDA Version: 12.0, but

import os
os.environ["MAX_JOBS"] = "4"  # limit parallel compilation jobs for the flash-attn build
!pip install flash-attn --no-build-isolation

gives this error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      fatal: not a git repository (or any of the parent directories): .git

      torch.__version__  = 2.1.0+cu121

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-1b2ql47d/flash-attn_2180596c15514b7d9e4d004796412440/setup.py", line 117, in <module>
          raise RuntimeError(
      RuntimeError: FlashAttention is only supported on CUDA 11.6 and above.  Note: make sure nvcc has a supported version by running nvcc -V.
      [end of output]

Is this something you've seen?

philschmid commented 8 months ago

I am not sure if CUDA 12.0 is supported yet; that's what the error says as well.

matthewchung74 commented 8 months ago

I do see that, but when I run this:

(base) [ec2-user@ip-172-16-30-64 notebooks]$ nvidia-smi
Thu Oct 26 00:27:41 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |

it looks like CUDA 12.0 is installed.
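
For what it's worth, the CUDA Version that nvidia-smi prints is the version supported by the installed driver, while the flash-attn build checks the nvcc compiler toolkit (the traceback points at this with "run nvcc -V"). A quick way to compare the two from inside the notebook, as a sketch rather than part of the original tutorial, is:

# CUDA version the installed PyTorch wheel was built against
import torch
print(torch.__version__, torch.version.cuda)

# CUDA toolkit (nvcc) version that flash-attn's setup.py actually checks
!nvcc -V

If nvcc is missing or reports something older than 11.6, the setup.py check fails even though the driver reports CUDA 12.0.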