Quantization Investigation

qdrant / fastembed

Fast, Accurate, Lightweight Python library to make State of the Art Embedding

https://qdrant.github.io/fastembed/

Apache License 2.0


Open NirantK opened 6 months ago

NirantK commented 6 months ago

Consider this model from Xenova: it ships a quantized model of about 120 MB, compared with the 440-450 MB model I get from the O3 optimization level in Optimum.
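A back-of-the-envelope check of the size gap. It assumes the O3 export keeps fp32 weights (Optimum's O-levels are graph optimizations rather than weight quantization) and uses a hypothetical BERT-base-sized encoder of about 110M parameters, since the issue does not give the parameter count. A ratio of roughly 4x is what 4-byte fp32 weights versus 1-byte int8 weights would predict:

```python
# Bytes needed to store one weight in each dtype.
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}


def approx_size_mb(num_params: int, dtype: str) -> float:
    """Rough weight size in MB, ignoring graph metadata and quantization scales."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 1e6


# Assumption: ~110M parameters (BERT-base-sized); not stated in the issue.
params = 110_000_000
for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: ~{approx_size_mb(params, dtype):.0f} MB")
```

Metadata, per-tensor scales and zero points, and any layers left unquantized add to the real file, so a ~120 MB int8 file next to a ~440 MB fp32 file would be consistent with this estimate.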

Check whether the quantized model is as good as the 450 MB O3 model at an atol of 1e-3 (and the O2 model at an atol of 1e-4), or whether something else is happening there.
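One way to run that comparison, as a sketch: embed the same sentences with both models, then compare the two embedding matrices (shape `(n_sentences, dim)`). The helper name `compare_embeddings` is hypothetical; `rtol=0.0` makes `np.allclose` test the absolute tolerance alone, and the minimum row-wise cosine similarity is an extra check added here, since retrieval quality depends on embedding direction more than on exact values.

```python
import numpy as np


def compare_embeddings(reference, candidate, atol: float) -> dict:
    """Compare two (n_sentences, dim) embedding matrices from different exports."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)

    # Largest element-wise deviation between the two models.
    max_abs_diff = float(np.max(np.abs(reference - candidate)))

    # Row-wise cosine similarity: normalize each row, then take dot products.
    ref_unit = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    cand_unit = candidate / np.linalg.norm(candidate, axis=1, keepdims=True)
    cosine = np.sum(ref_unit * cand_unit, axis=1)

    return {
        # rtol=0.0 so only the absolute tolerance decides the result.
        "allclose": bool(np.allclose(reference, candidate, atol=atol, rtol=0.0)),
        "max_abs_diff": max_abs_diff,
        "min_cosine": float(np.min(cosine)),
    }
```

For example, `compare_embeddings(o3_embeddings, quantized_embeddings, atol=1e-3)`, where both arguments are placeholder arrays produced by the respective models for the same inputs.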


NirantK commented 6 months ago

From @Xenova, this script traverses the graph and collects the operators to quantize: https://github.com/xenova/transformers.js/blob/main/scripts/convert.py
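A minimal sketch of that idea, not the script's actual code: walk every node, including nodes stored in subgraph attributes of control-flow ops such as `Loop` or `If`, collect the distinct `op_type` values, then pass them to ONNX Runtime's `quantize_dynamic` through `op_types_to_quantize`. The helper names are hypothetical, and choosing `QuantType.QUInt8` and `EnableSubgraph` are assumptions, not settings taken from the script.

```python
from typing import Set


def collect_op_types(graph) -> Set[str]:
    """Collect distinct operator types in an ONNX graph, recursing into subgraphs.

    `graph` is expected to look like an `onnx.GraphProto`: a `node` list whose
    entries have `op_type` and `attribute`, where each attribute exposes `g`
    (one subgraph) and `graphs` (a list of subgraphs). For non-graph attributes
    protobuf returns an empty `g`, so recursing into it adds nothing.
    """
    op_types = set()
    for node in graph.node:
        op_types.add(node.op_type)
        for attr in node.attribute:
            # Control-flow ops keep their bodies in graph-valued attributes.
            op_types |= collect_op_types(attr.g)
            for subgraph in attr.graphs:
                op_types |= collect_op_types(subgraph)
    return op_types


def quantize_with_collected_ops(model_path: str, output_path: str) -> Set[str]:
    """Hypothetical helper: dynamically quantize the op types found in the model."""
    # Imported lazily so collect_op_types() works without these packages.
    import onnx
    from onnxruntime.quantization import QuantType, quantize_dynamic

    op_types = collect_op_types(onnx.load(model_path).graph)
    quantize_dynamic(
        model_input=model_path,
        model_output=output_path,
        op_types_to_quantize=sorted(op_types),
        weight_type=QuantType.QUInt8,  # assumption: unsigned 8-bit weights
        extra_options={"EnableSubgraph": True},  # also quantize inside subgraphs
    )
    return op_types
```

Usage would look like `quantize_with_collected_ops("model.onnx", "model_quantized.onnx")` with placeholder paths; `quantize_dynamic` skips listed op types it has no quantizer for, so passing every collected type is harmless.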