huggingface / optimum-quanto

A PyTorch quantization backend for Optimum
Apache License 2.0
645 stars, 36 forks

feat(cuda): compile according to capabilities #209

Closed dacorvo closed 2 weeks ago

dacorvo commented 2 weeks ago

The __CUDA_ARCH__ preprocessor macro is only defined when compiling CUDA code for the device, not for the host. Host code therefore cannot test it, so we need a separate preprocessor variable to decide whether to compile the AWQ kernels.
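A minimal sketch of the idea, not the code from this PR: the build passes its own macro describing the target capability, and host code gates on that instead of __CUDA_ARCH__. The names EXAMPLE_CUDA_ARCH and EXAMPLE_MIN_AWQ_ARCH, and the threshold value, are hypothetical placeholders chosen for illustration.

```cpp
// Hypothetical macro that the build system would pass on the command line,
// e.g. -DEXAMPLE_CUDA_ARCH=<value>. It defaults to 0 here so that the sketch
// compiles standalone with a plain host compiler.
#ifndef EXAMPLE_CUDA_ARCH
#define EXAMPLE_CUDA_ARCH 0
#endif

// Placeholder threshold (an assumption, not the value used by the project).
#define EXAMPLE_MIN_AWQ_ARCH 800

// In a host compilation pass __CUDA_ARCH__ is not defined,
// so this function returns false there.
constexpr bool host_sees_cuda_arch() {
#ifdef __CUDA_ARCH__
    return true;
#else
    return false;
#endif
}

// Gate on the build-provided macro instead, which is visible
// in both host and device compilation passes.
constexpr bool awq_kernels_enabled() {
#if EXAMPLE_CUDA_ARCH >= EXAMPLE_MIN_AWQ_ARCH
    return true;
#else
    return false;
#endif
}
```

With a guard like this, the host-side code that registers or dispatches the kernels can be wrapped in the same condition as the device code, so both sides agree on whether the kernels exist.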