PyTorch native quantization and sparsity for training and inference
RuntimeError: CUDA error: named symbol not found CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. #968
I tried to execute the following code:

```python
from transformers import TorchAoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "IlyaGusev/saiga_llama3_8b"
quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
quantized_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda", quantization_config=quantization_config)

tokenizer = AutoTokenizer.from_pretrained(model_name)
input_text = "What are we having for dinner?"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# compile the quantized model to get speedup
import torchao
torchao.quantization.utils.recommended_inductor_config_setter()
quantized_model = torch.compile(quantized_model, mode="max-autotune")

output = quantized_model.generate(**input_ids, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
And got the following:
```
File ~/anaconda3/envs/LLMs/lib/python3.12/site-packages/torchao/quantization/utils.py:322, in pack_tinygemm_scales_and_zeros(scales, zeros, dtype)
    319 guard_dtype_size(scales, "scales", dtype=dtype, size=zeros.size())
    320 guard_dtype_size(zeros, "zeros", dtype=dtype)
    321 return (
--> 322     torch.cat(
    323         [
    324             scales.reshape(scales.size(0), scales.size(1), 1),
    325             zeros.reshape(zeros.size(0), zeros.size(1), 1),
    326         ],
    327         2,
    328     )
    329     .transpose(0, 1)
    330     .contiguous()
    331 )

RuntimeError: CUDA error: named symbol not found
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
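Since the message warns that the stack trace may point at the wrong call, rerunning with synchronous kernel launches should help locate the kernel that actually fails. A minimal sketch, assuming the variable is set before CUDA is initialized in the process (setting it before `import torch` is the simplest way to ensure that):

```python
import os

# CUDA_LAUNCH_BLOCKING is read when CUDA is initialized, so set it before
# anything touches the GPU. Doing it before importing torch is the safe choice.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

try:
    import torch  # imported only after the variable is in place

    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    # Keeps the sketch runnable on a machine without PyTorch.
    print("PyTorch is not installed; only the environment variable was set")
```

The same effect can be had from the shell by prefixing the command with `CUDA_LAUNCH_BLOCKING=1`. Execution will be slower, so this is for debugging only.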
The relevant part of the pack_tinygemm_scales_and_zeros function (lines 319-331 of torchao/quantization/utils.py) is visible in the traceback above.
GPU: NVIDIA GTX 1060 3GB
CUDA: 12.1
NVIDIA-SMI: 530.30.02
Driver Version: 530.30.02
System: Host: linuxhome-desktop, Kernel: 5.15.0-56-generic x86_64 (64-bit), Desktop: Cinnamon 5.6.5, Distro: Linux Mint 21.1 Vera
I suspect this error occurs because my GPU does not support bfloat16. What do you think?
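One way to test this hypothesis is to look at the GPU's compute capability. The sketch below assumes that native bfloat16 requires compute capability 8.0 (Ampere) or newer; the GTX 1060 is a Pascal card with capability 6.1. The helper names `supports_native_bf16` and `describe_gpu` are hypothetical and not part of torchao or PyTorch. Passing this check would not by itself prove that torchao's int4 kernels were built for the device; it only narrows down the cause.

```python
def supports_native_bf16(capability):
    """Return True if a (major, minor) compute capability is 8.0 or newer.

    Assumption: native bfloat16 arithmetic needs Ampere (sm_80) or later.
    """
    return tuple(capability) >= (8, 0)


def describe_gpu():
    """Report the first visible GPU and whether it meets the sm_80 threshold."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    cap = torch.cuda.get_device_capability(0)  # e.g. (6, 1) for a GTX 1060
    name = torch.cuda.get_device_name(0)
    return f"{name}: sm_{cap[0]}{cap[1]}, native bfloat16: {supports_native_bf16(cap)}"


print(describe_gpu())
print(supports_native_bf16((6, 1)))  # GTX 1060 (Pascal) -> False
```

If the check reports `False`, the int4 weight-only path is probably not usable on this card, and the "named symbol not found" error would be consistent with a kernel that was not compiled for this architecture.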