mgoin opened this issue 4 months ago
@Summer-Summer, any help here would be much appreciated.
Sorry for the inconvenience. The generation of quantization scales is part of the model quantization process, and I believe that you can find the related code here.
I will add that API to this repo when I have more spare time.
All of the FP6 GEMM functions take the FP6 weights plus one FP16 scale per output channel.

We have functions for converting FP16 weights to FP6 (`weight_prepacking_fp16_to_fp6`) and for packing the FP6 weights into the final inference format (`weight_matrix_prepacking`), but nothing that generates the scales needed to up-convert back to FP16. In the testing code, for both Python and C++, the scales are always randomly initialized. Is there a function that generates the scales needed for accurate dequantization of real weights?
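Until an official API lands, here is a minimal NumPy sketch of what per-output-channel scale generation could look like. It is not the repo's quantization code. It assumes an FP6 E3M2 layout (1 sign, 3 exponent, 2 mantissa bits, exponent bias 3, no inf/NaN encodings, so the largest magnitude is 28.0) and a plain absmax scheme: each row's absolute maximum maps to that largest FP6 value. The helper names (`fp6_e3m2_grid`, `per_channel_scales`, `quantize_to_fp6_values`, `dequantize`) are hypothetical. The real kernels may fold an exponent-bias factor into the scale, so these scales may need adjusting before they match what the GEMM functions expect.

```python
import numpy as np

# Assumed FP6 layout (hypothetical, check against the repo): E3M2 with
# exponent bias 3 and no encodings reserved for inf/NaN.
EXP_BITS, MAN_BITS, BIAS = 3, 2, 3


def fp6_e3m2_grid() -> np.ndarray:
    """All non-negative magnitudes representable in the assumed E3M2 format."""
    values = set()
    for e in range(2 ** EXP_BITS):
        for m in range(2 ** MAN_BITS):
            if e == 0:  # subnormal: 0.m * 2**(1 - bias)
                values.add((m / 2 ** MAN_BITS) * 2.0 ** (1 - BIAS))
            else:       # normal: 1.m * 2**(e - bias)
                values.add((1 + m / 2 ** MAN_BITS) * 2.0 ** (e - BIAS))
    return np.array(sorted(values), dtype=np.float32)


GRID = fp6_e3m2_grid()
FP6_MAX = float(GRID[-1])  # 2**4 * 1.75 = 28.0 under the assumptions above


def per_channel_scales(weight: np.ndarray) -> np.ndarray:
    """One FP16 scale per output channel (row): the row absmax maps to FP6_MAX."""
    absmax = np.abs(weight.astype(np.float32)).max(axis=1)
    absmax = np.where(absmax == 0, 1.0, absmax)  # all-zero rows: avoid dividing by 0
    return (absmax / FP6_MAX).astype(np.float16)


def quantize_to_fp6_values(weight: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Divide each row by its scale and snap to the nearest FP6 magnitude.

    Returns FP6 *values* as float32, not packed 6-bit codes. Ties go to the
    smaller magnitude (argmin picks the first match)."""
    scaled = weight.astype(np.float32) / scales.astype(np.float32)[:, None]
    mag = np.clip(np.abs(scaled), 0.0, FP6_MAX)  # guard against FP16 scale rounding
    idx = np.abs(mag[..., None] - GRID).argmin(axis=-1)
    return np.sign(scaled) * GRID[idx]


def dequantize(fp6_values: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Up-convert back to FP16: value * per-row scale."""
    return (fp6_values * scales.astype(np.float32)[:, None]).astype(np.float16)
```

Comparing `dequantize(quantize_to_fp6_values(w, s), s)` against `w` gives a quick check that the scales are sensible. The script above does not confirm whether `weight_prepacking_fp16_to_fp6` expects pre-scaled values like these, so check that against the linked quantization code.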