usyd-fsalab / fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
Apache License 2.0

How to support V100(sm70) #3

Closed fishelegs closed 3 months ago

fishelegs commented 7 months ago

Is it possible to support V100?

Summer-Summer commented 6 months ago

In theory, our design applies to any Tensor Core GPU, including Ampere, Hopper, and Volta. However, more engineering effort is needed to refactor the CUDA code so that additional GPU architectures are supported. I plan to do this myself once I have more spare time.
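To give a rough sense of why this is a refactoring job rather than a one-line change (this is an illustrative sketch, not code from the fp6_llm repository): the Tensor Core MMA instruction shapes differ across architectures. Ampere (sm_80+) exposes an `m16n8k16` FP16 MMA, while Volta (sm_70) only has the smaller `m8n8k4` shape, so the per-warp tile decomposition and register layout of the de-quantization kernel would have to change along with the inner MMA call. One common pattern is to dispatch on `__CUDA_ARCH__`:

```cuda
// Hypothetical sketch: architecture-guarded Tensor Core MMA dispatch.
// The Ampere path below uses the standard m16n8k16 FP16 MMA with FP32
// accumulation; a Volta path would need different fragment sizes and a
// different surrounding tile layout, which is the main refactoring cost.
#include <cuda_fp16.h>

__device__ __forceinline__
void mma_m16n8k16_fp16_fp32(float d[4], const unsigned a[4],
                            const unsigned b[2], const float c[4]) {
#if __CUDA_ARCH__ >= 800
    // Ampere/Hopper: one m16n8k16 MMA per warp, FP16 in, FP32 out.
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.f32.f16.f16.f32 "
        "{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%10,%11,%12,%13};\n"
        : "=f"(d[0]), "=f"(d[1]), "=f"(d[2]), "=f"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]),
          "f"(c[0]), "f"(c[1]), "f"(c[2]), "f"(c[3]));
#elif __CUDA_ARCH__ >= 700
    // Volta: only m8n8k4 FP16 MMA exists, so a single m16n8k16 tile
    // must be decomposed into several m8n8k4 MMAs with a different
    // register/fragment layout -- that restructuring (not just the PTX
    // mnemonic) is the engineering effort mentioned above.
#endif
}
```

The kernel would then also need to be compiled for each target, e.g. with `nvcc -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 ...`, so that both code paths are actually emitted.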