Description
Since OpenSearch 2.17, we have supported the built-in Lucene scalar quantizer (SQ), which accepts fp32 vectors as input and dynamically quantizes them into int7 values in the range [0, 127], providing 4x compression. Adding 4-bit support to the Lucene SQ would quantize fp32 vectors into int4 values in the range [0, 15], providing 8x compression and further reducing memory requirements at the cost of some recall.
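To make the compression figures concrete, here is a minimal Python sketch of linear scalar quantization. It is an illustration only, not the Lucene implementation: it clips to the vector's own min/max, whereas Lucene derives its bounds differently, and the helper names (`quantize`, `dequantize`, `pack_int4`, `compression_ratio`) are hypothetical. It assumes each int7 code occupies one byte and two int4 codes are packed into one byte, which is where 4x and 8x relative to fp32 come from.

```python
def quantize(vector, bits):
    """Map floats linearly onto integer codes in [0, 2**bits - 1].

    Assumption: bounds are the vector's own min and max (a simplification).
    """
    levels = (1 << bits) - 1          # 127 for 7 bits, 15 for 4 bits
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((x - lo) / scale))) for x in vector]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate the original floats from the integer codes."""
    return [lo + c * scale for c in codes]


def pack_int4(codes):
    """Pack two 4-bit codes into each byte (pads odd lengths with 0)."""
    padded = list(codes) + [0] * (len(codes) % 2)
    return bytes((padded[i] << 4) | padded[i + 1]
                 for i in range(0, len(padded), 2))


def compression_ratio(bits):
    """Ratio of fp32 storage to quantized storage per dimension.

    Assumption: 7-bit codes use a full byte; 4-bit codes share a byte.
    """
    stored_bits = 4 if bits <= 4 else 8
    return 32 / stored_bits


if __name__ == "__main__":
    vec = [-1.0, -0.25, 0.1, 0.8, 1.0, 0.33]
    for bits in (7, 4):
        codes, lo, scale = quantize(vec, bits)
        approx = dequantize(codes, lo, scale)
        max_err = max(abs(a - b) for a, b in zip(vec, approx))
        print(bits, codes, compression_ratio(bits), max_err)
```

With fewer levels, the step size `scale` grows, so the worst-case reconstruction error (about half a step) is larger for int4 than for int7. That larger error is the source of the recall trade-off described above.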