spcl / QuaRot

Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models.
https://arxiv.org/abs/2404.00456
Apache License 2.0

H100 Support #48

Open carlguo866 opened 5 days ago

carlguo866 commented 5 days ago

Does the code in this repo support H100? I'm getting this error when trying to run it on an H100:

  File "/home/carlguo/QuaRot/quarot/nn/linear.py", line 50, in forward
    x = quarot.matmul(x, self.weight)
  File "/home/carlguo/QuaRot/quarot/__init__.py", line 41, in matmul
    return quarot._CUDA.matmul(A, B).view(*A_shape_excl_last, *B_shape_excl_last)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

sashkboos commented 5 days ago

Thanks @carlguo866 for your issue.

Unfortunately, the code does not yet support H100. This is mostly due to our GEMM kernel, which uses CUTLASS and which we compiled for older GPU architectures. If you replace our matmul calls with an implementation that runs on H100, the rest of the codebase works fine. We hope to add support soon :)
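The "no kernel image is available" error comes from how CUDA matches compiled kernels to a GPU. Below is a minimal sketch, assuming the standard CUDA compatibility rules: SASS (machine code) built for sm_XY runs only on GPUs with the same major version X and a minor version of at least Y, while embedded PTX can be JIT-compiled for any GPU with compute capability at least X.Y. H100 reports compute capability 9.0, so a build that contains only sm_8x SASS and no PTX has nothing the driver can load. The architecture lists and the `select_matmul` helper are hypothetical, chosen to illustrate the suggested workaround of routing matmul calls elsewhere on unsupported GPUs. They are not QuaRot's actual build flags or API.

```python
"""Hypothetical sketch: does a CUDA build contain a kernel this GPU can run?

Models the standard CUDA compatibility rules:
- SASS built for sm_XY runs only on devices with the same major version X
  and minor version >= Y.
- PTX built for compute_XY can be JIT-compiled for any device with
  compute capability >= X.Y.
The arch lists below are illustrative, not QuaRot's real build targets.
"""
from typing import Callable, Iterable, Tuple

Capability = Tuple[int, int]  # (major, minor) compute capability


def has_compatible_kernel(
    device: Capability,
    sass_archs: Iterable[Capability],
    ptx_archs: Iterable[Capability] = (),
) -> bool:
    """Return True if any embedded SASS or PTX image can run on `device`."""
    major, minor = device
    # SASS is binary-compatible only within one major architecture version.
    for s_major, s_minor in sass_archs:
        if s_major == major and s_minor <= minor:
            return True
    # PTX is forward-compatible: the driver can JIT it for newer GPUs.
    return any(p <= device for p in ptx_archs)


def select_matmul(
    device: Capability,
    sass_archs: Iterable[Capability],
    ptx_archs: Iterable[Capability],
    fast_matmul: Callable,
    fallback_matmul: Callable,
) -> Callable:
    """Hypothetical helper: use the compiled kernel if it can run, else a fallback."""
    if has_compatible_kernel(device, sass_archs, ptx_archs):
        return fast_matmul
    return fallback_matmul


if __name__ == "__main__":
    H100 = (9, 0)                # H100 is compute capability 9.0
    built_for = [(8, 0), (8, 6)]  # hypothetical SASS-only build targets
    print(has_compatible_kernel(H100, built_for))    # False
    print(has_compatible_kernel((8, 9), built_for))  # True
```

On a real system, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current GPU. Note that `torch.cuda.get_arch_list()` reports the architectures PyTorch itself was built for, not those of a separately compiled extension such as `quarot._CUDA`.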