usyd-fsalab / fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
Apache License 2.0

End-to-end performance for ViTs #11

Open shivmgg opened 2 months ago

shivmgg commented 2 months ago

Hi,

Thanks for the awesome work! I was wondering whether it is possible to run the FP6-quantized kernels for ViTs (Vision Transformers).
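In principle the question comes down to the fact that this repo's kernel is a weight-only quantized GEMM, and ViT blocks (attention projections, MLP layers) are also dominated by linear layers of the same shape family. As an illustration only — this is not fp6_llm's API, just a numpy sketch of what FP6 (E3M2) round-to-nearest weight quantization with per-output-channel scales looks like, assuming an E3M2 layout with exponent bias 3:

```python
import numpy as np

def fp6_values(exp_bits=3, man_bits=2):
    """All values representable in a 1-sign / exp_bits / man_bits float format."""
    bias = 2 ** (exp_bits - 1) - 1          # exponent bias: 3 for E3M2
    vals = []
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            if e == 0:                       # subnormal range (no implicit 1)
                v = (m / 2 ** man_bits) * 2.0 ** (1 - bias)
            else:                            # normal range (implicit leading 1)
                v = (1 + m / 2 ** man_bits) * 2.0 ** (e - bias)
            vals += [v, -v]
    return np.unique(np.array(vals))        # 63 distinct values for E3M2

def quantize_fp6(w, grid):
    """Round-to-nearest onto the FP6 grid, one scale per output channel."""
    # Scale each row so its largest weight maps to the largest FP6 value
    scale = np.abs(w).max(axis=1, keepdims=True) / grid.max()
    idx = np.abs((w / scale)[..., None] - grid).argmin(axis=-1)
    return grid[idx] * scale                 # dequantized back to float

# Toy "ViT linear layer" weight — same recipe applies to any nn.Linear weight
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
grid = fp6_values()
wq = quantize_fp6(w, grid)
rel_err = np.linalg.norm(wq - w) / np.linalg.norm(w)
print(f"FP6 grid size: {len(grid)}, relative weight error: {rel_err:.3f}")
```

This only simulates the numerics; the repo's CUDA kernel additionally packs the 6-bit weights and dequantizes on the fly inside the GEMM, which is independent of whether the surrounding model is an LLM or a ViT.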