hitz-zentroa / GoLLIE

Guideline following Large Language Model for Information Extraction
https://hitz-zentroa.github.io/GoLLIE/
Apache License 2.0
263 stars 18 forks

ptxas fatal : Ptx assembly aborted due to errors #5

Open abuzurkhanov opened 9 months ago

abuzurkhanov commented 9 months ago

RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-38da7f, line 91; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-38da7f, line 91; error : Feature 'cvt with .bf16' requires .target sm_80 or higher
[the same pair of '.bf16' / 'cvt with .bf16' errors repeats for PTX lines 92, 102, 104, 107, 108, 118, 120, 129, 131, 140, 142, 158, 160, 168, and 170]
ptxas fatal : Ptx assembly aborted due to errors

ikergarcia1996 commented 9 months ago

Hi @abuzurkhanov!

Which GPU are you using? It seems that your GPU doesn't support the bfloat16 format. If this is the case, you need to load the model with torch_dtype="float32"
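One way to do this automatically is to pick the compute dtype based on what the GPU reports. A minimal sketch that falls back to float32 when bfloat16 is unavailable:

```python
import torch

# Fall back to float32 on GPUs without bfloat16 support (e.g. pre-Ampere cards).
# The short-circuit keeps this safe on machines without CUDA at all.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    dtype = torch.float32

print(dtype)
```

The resulting dtype can then be passed as torch_dtype when loading the model.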

abuzurkhanov commented 9 months ago

I use two GTX 1060s and an RTX 2080 Ti

osainz59 commented 9 months ago

Hi @abuzurkhanov !

Can you please run the following command? It will check whether your hardware is compatible with bfloat16 or not :)

import torch

torch.cuda.is_bf16_supported()

ikergarcia1996 commented 9 months ago

I use 2 gtx 1060 and RTX 2080 Ti

The GTX 1060 doesn't support bfloat16. There are some workarounds, described below.

jmanhype commented 9 months ago

I should have responded much sooner; I'm having the same issue. I have a 2080 Super, which is sm_80 or higher, and it is still triggering this error. Also, as mentioned in my last post, float32 won't work for me because of RAM constraints; I have 32 GB of RAM.

ikergarcia1996 commented 9 months ago

Hi @jmanhype

I am not sure about the 2080 Super supporting bfloat16; the V100, which has similar tensor cores, does not. Please make sure that you are running the latest Nvidia drivers and PyTorch version, and run the following command:

import torch

torch.cuda.is_bf16_supported()

If bfloat16 is not supported and you cannot load the model in float32, another workaround is changing bnb_4bit_compute_dtype on line 329 of load_model.py: https://github.com/hitz-zentroa/GoLLIE/blob/main/src/model/load_model.py#L329

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float32,  # or torch.float16  <--- change this line
)

This will keep the 4-bit quantization, but the non-quantized weights will be in float32 instead of bfloat16. I am not sure whether float16 would work. You can also try 8-bit quantization instead of 4-bit quantization if the model fits in GPU memory.
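Put together, loading with this config might look like the following sketch (untested; assumes transformers and bitsandbytes are installed, and "HiTZ/GoLLIE-7B" stands in for whichever checkpoint you use):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "HiTZ/GoLLIE-7B"  # assumption: substitute your checkpoint

# 4-bit quantization with a float32 compute dtype, for GPUs without bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float32,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```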

Please note that we have not tested this solution. We used bfloat16 for our experiments, and the base model, CodeLLama, is also distributed in bfloat16 format. Therefore, it might result in the model performing poorly due to numerical instability. But it is worth a try if your hardware doesn't support bfloat16.

If you manage to run the model with these settings, it would be great if you could share your findings. This would help us assist others with similar issues in the future :D

jmanhype commented 9 months ago

Now it needs an offload folder, and it's saying that certain variables, like the offload folder, aren't recognized.
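For anyone hitting the same message: when device_map="auto" spills part of the model to disk, transformers expects an offload_folder argument on from_pretrained. An untested sketch (the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM

# offload_folder gives accelerate a place to store weights that do not
# fit in GPU or CPU memory; the directory is created if it is missing.
model = AutoModelForCausalLM.from_pretrained(
    "HiTZ/GoLLIE-7B",  # assumption: substitute your checkpoint
    device_map="auto",
    offload_folder="./offload",
)
```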