pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

RuntimeError: CUDA error: named symbol not found #87

Open ce1190222 opened 5 months ago

ce1190222 commented 5 months ago

I am trying to quantize llama-2-7b-chat-hf with gpt-fast by running `python quantize.py --mode int4 --groupsize 32` on Kaggle with a 2x T4 GPU accelerator. I installed the PyTorch nightly with `pip install torch==2.3.0.dev20240117+cu121 --index-url https://download.pytorch.org/whl/nightly/cu121`.
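
For context, here is a minimal sanity-check sketch (not part of gpt-fast, and assuming the stock Kaggle T4 x2 accelerator) to confirm what the runtime actually exposes:

```python
import torch

# Quick check of the runtime: a T4 reports compute capability (7, 5), and
# "named symbol not found" usually means a kernel or symbol in the build is
# not available for the GPU that is actually running it.
print("torch:", torch.__version__, "cuda:", torch.version.cuda)
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
print("bf16 supported:", torch.cuda.is_bf16_supported())
```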

I also tried changing the dtype from torch.bfloat16 to torch.float32, but got the same error.

However, I get this error message:

```
Loading model ...
Quantizing model weights for int4 weight-only affine per-channel groupwise quantization
linear: layers.0.attention.wqkv, in=4096, out=12288
linear: layers.0.attention.wo, in=4096, out=4096
Traceback (most recent call last):
  File "/kaggle/working/quantize.py", line 605, in <module>
    quantize(args.checkpoint_path, args.mode, args.groupsize, args.calibration_tasks, args.calibration_limit, args.calibration_seq_length, args.pad_calibration_inputs, args.percdamp, args.blocksize, args.label)
  File "/kaggle/working/quantize.py", line 552, in quantize
    quantized_state_dict = quant_handler.create_quantized_state_dict()
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/kaggle/working/quantize.py", line 416, in create_quantized_state_dict
    weight_int4pack, scales_and_zeros = prepare_int4_weight_and_scales_and_zeros(
  File "/kaggle/working/quantize.py", line 348, in prepare_int4_weight_and_scales_and_zeros
    weight_int32, scales_and_zeros = group_quantize_tensor(
  File "/kaggle/working/quantize.py", line 131, in group_quantize_tensor
    scales, zeros = get_group_qparams(w, n_bit, groupsize)
  File "/kaggle/working/quantize.py", line 66, in get_group_qparams
    assert torch.isnan(to_quant).sum() == 0
RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
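
As a follow-up, here is a minimal sketch that isolates just the op the failing assert performs, with CUDA_LAUNCH_BLOCKING=1 set as the error message suggests (this assumes to_quant is still in the checkpoint's default torch.bfloat16):

```python
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before the first CUDA call so errors surface synchronously

import torch

# Same op that get_group_qparams asserts on, isolated on a bf16 CUDA tensor
# (assumption: to_quant is bfloat16, the default checkpoint dtype).
# If this alone raises "named symbol not found" on the T4, the torch nightly
# build / GPU combination is at fault rather than quantize.py itself.
x = torch.randn(8, 8, dtype=torch.bfloat16, device="cuda")
print(torch.isnan(x).sum().item())
```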

kaizizzzzzz commented 4 months ago

Did you figure this out? I am hitting the same issue.