usyd-fsalab / fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
Apache License 2.0

Does fp6 need a calibration dataset to tune with? #7

Closed leiwen83 closed 1 month ago

leiwen83 commented 4 months ago

Also, has FP6 been compared with FP8 in terms of accuracy? I haven't found anything related in the paper...

Summer-Summer commented 4 months ago

Yeah, we only compared the accuracy to FP16, and the accuracy loss is already negligible. We did not compare further with FP8, since FP8 would not be more accurate than FP16.

Summer-Summer commented 4 months ago

For FP6, we can do post-training quantization, so I don't think a calibration dataset is needed. Please refer to https://arxiv.org/abs/2312.08583 for more algorithm-level details.
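
To illustrate why no calibration data is required: calibration-free post-training quantization can simply snap each weight independently to the nearest representable low-precision value. The sketch below is a minimal, hypothetical illustration (not code from this repo), assuming an IEEE-like FP6 E3M2 layout (1 sign, 3 exponent, 2 mantissa bits) with subnormals and no infinities/NaNs; real pipelines typically also apply a per-channel scale first, which is omitted here.

```python
import numpy as np

def fp6_grid(exp_bits=3, man_bits=2):
    # Enumerate every representable FP6 magnitude (assumed E3M2 layout,
    # IEEE-like: subnormals when exponent field is 0, no inf/NaN).
    bias = 2 ** (exp_bits - 1) - 1  # 3 for E3M2
    vals = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            if e == 0:  # subnormal: 0.m * 2^(1 - bias)
                mag = (m / 2 ** man_bits) * 2.0 ** (1 - bias)
            else:       # normal: 1.m * 2^(e - bias)
                mag = (1 + m / 2 ** man_bits) * 2.0 ** (e - bias)
            vals.add(mag)
            vals.add(-mag)
    return np.array(sorted(vals))

def quantize_fp6(w):
    # Calibration-free round-to-nearest: each weight is independently
    # mapped to the closest FP6 value, so no input data is needed.
    grid = fp6_grid()
    idx = np.abs(np.asarray(w, dtype=np.float64)[..., None]
                 - grid[None, :]).argmin(axis=-1)
    return grid[idx]

w = np.array([0.1, -0.7, 3.3], dtype=np.float32)
print(quantize_fp6(w))  # each weight snapped to its nearest FP6 value
```

This is purely weight-only rounding, which is why it needs no calibration set; data-dependent schemes (e.g. activation quantization or GPTQ-style error compensation) would be the cases where calibration samples come into play.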