shhn1 opened this issue 4 months ago
This does not need any training or fine-tuning. The low-rank matrices are generated by a singular value decomposition algorithm.
Thanks for your kind reply! Since I just started learning about quantization recently, may I ask whether the quantization error is obtained using an offline calibration dataset? When running inference, will a fixed low-rank approximation matrix be added to the quantized KV cache?
In addition, I saw that there are low-rank-related functions in this script, e.g. [fake_svd_lowrank](https://github.com/opengear-project/GEAR/blob/b4f14ce6678240a2e7f828d3c4a268d719b5ee7d/GEARLM/GEARLM/Simulated/compress_function.py#L202), but I did not find it being used in the Llama-related code. Could you tell me how it is used?
I would be very grateful if you could reply! :)
The quantization error is calculated during the quantization process.
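To illustrate the idea being discussed, here is a minimal, self-contained sketch (not the GEAR implementation; the per-tensor quantizer and all names are illustrative): quantize a KV-cache matrix, take the residual as the quantization error, and approximate that error with a low-rank matrix obtained by SVD, so no training, fine-tuning, or calibration data is involved.

```python
import torch

torch.manual_seed(0)

def quantize_dequantize(x, bits=4):
    """Toy per-tensor uniform quantization, followed by dequantization."""
    qmax = 2 ** bits - 1
    scale = (x.max() - x.min()) / qmax
    zero = x.min()
    q = torch.round((x - zero) / scale).clamp(0, qmax)
    return q * scale + zero

kv = torch.randn(128, 64)          # stand-in for one KV-cache matrix
kv_q = quantize_dequantize(kv)     # quantized (then dequantized) cache
residual = kv - kv_q               # quantization error, computed on the fly

# Rank-r approximation of the error via (randomized) truncated SVD.
r = 4
U, S, V = torch.svd_lowrank(residual, q=r)
error_lowrank = U @ torch.diag(S) @ V.T

# At inference time, the cache is reconstructed as:
# quantized values + low-rank error term.
kv_approx = kv_q + error_lowrank
```

Since the rank-r term is an orthogonal projection of the residual, adding it can only reduce the reconstruction error relative to plain quantization.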
Thanks for your great work!
In the paper, after the KV cache is quantized, a low-rank matrix is used to approximate the quantization error. I really want to know whether this process needs training. Since I can't find a usage guide, could you please tell me where the specific usage details are in the code?