mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
https://arxiv.org/abs/2211.10438
MIT License
1.1k stars 127 forks

bmm_s8t_s8n_s8t cannot run with this shape #74

Closed xiachong94 closed 3 months ago

xiachong94 commented 4 months ago

from torch_int._CUDA import bmm_s8t_s8n_s8t
import torch
bmm_s8t_s8n_s8t(
    torch.randint(-128, 127, (64, 4, 4), dtype=torch.int8).cuda(),
    torch.randint(-128, 127, (64, 64, 4), dtype=torch.int8).cuda(),
    0.001,
)

Traceback (most recent call last):
  File "", line 1, in
RuntimeError: cutlass cannot implement
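For anyone trying to reproduce this, a CPU-side reference can help separate "the shapes are wrong" from "the kernel rejects these shapes". The sketch below is not torch-int code. It assumes that `bmm_s8t_s8n_s8t(A, B, alpha)` takes A as [batch, M, K] and B as [batch, N, K] and returns an int8 tensor of shape [batch, M, N] holding `alpha * A @ B^T`, rounded and clamped to the int8 range. The function name `bmm_s8t_s8n_s8t_reference` and the rounding mode are illustrative assumptions.

```python
import random


def bmm_s8t_s8n_s8t_reference(a, b, alpha):
    """Hypothetical pure-Python reference (an assumption, not the torch-int kernel).

    a: nested lists shaped [batch][M][K], int8-range integers
    b: nested lists shaped [batch][N][K], int8-range integers
    Returns nested lists shaped [batch][M][N], clamped to [-128, 127].
    """
    if len(a) != len(b):
        raise ValueError("batch sizes differ")
    out = []
    for a_mat, b_mat in zip(a, b):
        if len(a_mat[0]) != len(b_mat[0]):
            raise ValueError("inner dimension K differs")
        rows = []
        for a_row in a_mat:
            row = []
            for b_row in b_mat:
                # Integer dot product along K, i.e. one entry of A @ B^T.
                acc = sum(x * y for x, y in zip(a_row, b_row))
                # Scale, round, and saturate to int8 (rounding mode assumed).
                q = round(alpha * acc)
                row.append(max(-128, min(127, q)))
            rows.append(row)
        out.append(rows)
    return out


# Same shapes as in the report: A is (64, 4, 4), B is (64, 64, 4).
random.seed(0)
a = [[[random.randint(-128, 126) for _ in range(4)] for _ in range(4)]
     for _ in range(64)]
b = [[[random.randint(-128, 126) for _ in range(4)] for _ in range(64)]
     for _ in range(64)]
out = bmm_s8t_s8n_s8t_reference(a, b, 0.001)
shape = (len(out), len(out[0]), len(out[0][0]))
print(shape)  # (64, 4, 64)
```

Under these assumptions the two inputs are compatible for a batched A·Bᵀ product (batch 64, M=4, K=4, N=64), so the error is more likely a constraint of the CUDA kernel than a plain shape mismatch. Which constraint is violated is not stated in the report.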