Open naveentatikonda opened 2 weeks ago
@naveentatikonda can you link the benchmarks, or provide a link to where the benchmark results for Lucene SQ 4-bit are available?
@navneet1v Please find the benchmarking recall results below.

m | ef_construction | ef_search | confidence interval | Primary | Replica |
---|---|---|---|---|---|
16 | 100 | 100 | 0 (dynamic) | 8 | 0 |

Dataset | Bits | spaceType | Recall |
---|---|---|---|
glove-200-angular | 4 | cosine | 0.68 |
glove-200-angular | 7 | cosine | 0.72 |
cohere-1m | 4 | Inner Product | 0.67 |
cohere-1m | 4 | L2 | 0.87 |
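For anyone trying to reproduce this, here is a minimal sketch of what the configuration above could look like as a k-NN index mapping, assuming the Primary/Replica columns are shard counts and using the low-level opensearch-rest-client (2.x, which takes `org.apache.http.HttpHost`). The index and field names are made up, and the `sq` encoder parameter names (`bits`, `confidence_interval`) are my recollection of the Lucene SQ encoder and should be checked against the plugin docs for your version:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

public class CreateLuceneSq4BitIndex {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Mirrors the benchmark config: 8 primaries, 0 replicas, m=16, ef_construction=100,
            // Lucene engine with the sq encoder at 4 bits and a dynamic confidence interval (0).
            Request request = new Request("PUT", "/glove-200-angular");
            request.setJsonEntity("""
                {
                  "settings": { "index": { "knn": true, "number_of_shards": 8, "number_of_replicas": 0 } },
                  "mappings": {
                    "properties": {
                      "my_vector": {
                        "type": "knn_vector",
                        "dimension": 200,
                        "method": {
                          "name": "hnsw",
                          "engine": "lucene",
                          "space_type": "cosinesimil",
                          "parameters": {
                            "m": 16,
                            "ef_construction": 100,
                            "encoder": { "name": "sq", "parameters": { "bits": 4, "confidence_interval": 0 } }
                          }
                        }
                      }
                    }
                  }
                }
                """);
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}
```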
Thanks for sharing the results. Since recall is only around 0.7, do you think we should enable rescoring out of the box (OOB) with this quantization, and then launch the feature? I would like to know your thoughts.
Yeah, that's definitely a good idea; we can get better recall by trading off latency. But I thought we only wanted to support rescoring for on_disk mode, and as of today we only support it for the Faiss engine. Also, we might not include this (as 8x compression) as part of on_disk, because we prefer the Faiss engine over Lucene.
From a UX perspective, do you want to add rescoring support to Lucene with SQ irrespective of on_disk?
This is a good point. I think rescoring and on_disk should be two different things; I should be able to do rescoring without specifying on_disk mode. This is getting more and more tangled the more I think about it. I think we should trigger a discussion around whether rescoring can be used outside of on_disk mode or whether it is always tied to on_disk.
@jmazanec15, @shatejas, @vamshin
We are able to do rescoring without specifying on_disk; on_disk just sets default rescoring. The issue is that we do not support rescoring for Lucene, because we use Lucene's query. But we should onboard support for it with this.
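For context, a sketch of what query-level rescoring looks like today without on_disk, reusing the client setup and index from the earlier sketch. The `rescore`/`oversample_factor` syntax is my recollection of the existing k-NN query DSL (effective for the Faiss path, and per this thread not yet honored by the Lucene engine), so treat it as an assumption:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.RestClient;

public class KnnQueryWithRescore {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request search = new Request("POST", "/glove-200-angular/_search");
            // Oversample the quantized search 2x, then re-score the candidates
            // with full precision vectors before returning the top k.
            // Query vector truncated for brevity.
            search.setJsonEntity("""
                {
                  "size": 10,
                  "query": {
                    "knn": {
                      "my_vector": {
                        "vector": [0.1, 0.2, 0.3],
                        "k": 10,
                        "rescore": { "oversample_factor": 2.0 }
                      }
                    }
                  }
                }
                """);
            System.out.println(client.performRequest(search).getStatusLine());
        }
    }
}
```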
If that is the case, then I think we should start working on implementing the rescoring feature for the Lucene query clause, given that the recall for int4 is not great.
Rescoring makes sense when we use full precision vectors during rescoring. If we use quantized vectors during rescoring, it won't increase recall much. Therefore, rescoring is kind of tied to on_disk in that sense.
That's correct @heemin32. When we talk about rescoring here, we mean rescoring with full precision vectors only.
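To make that concrete, a toy illustration of full precision rescoring: take an oversampled candidate set from the quantized search and re-rank it with exact scores over the original float vectors. This is illustrative only, not the plugin's code; names and data are made up:

```java
import java.util.Comparator;
import java.util.List;

public class FullPrecisionRescore {

    /** Exact inner product over the original float vectors. */
    static float innerProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    /** Re-rank candidate doc ids from the quantized search using full precision vectors, keep top k. */
    static List<Integer> rescore(List<Integer> candidates, float[][] rawVectors, float[] query, int k) {
        return candidates.stream()
            .sorted(Comparator.comparingDouble((Integer doc) -> innerProduct(rawVectors[doc], query)).reversed())
            .limit(k)
            .toList();
    }

    public static void main(String[] args) {
        float[][] rawVectors = { {1f, 0f}, {0.9f, 0.1f}, {0f, 1f} };
        float[] query = {1f, 0f};
        // Pretend the quantized HNSW search oversampled and returned these candidates.
        List<Integer> candidates = List.of(2, 1, 0);
        System.out.println(rescore(candidates, rawVectors, query, 2)); // prints [0, 1]
    }
}
```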
Then, it is "on_disk" right?
Why would it be on_disk then? As Jack mentioned earlier, on_disk is just a way to set up some defaults.
It might confuse the user experience. I thought SQ actually throws away the original vectors and only stores the quantized vectors, whereas on_disk stores the full precision vectors but uses the quantized vectors for the in-memory index.
No @heemin32, Lucene SQ also keeps the full precision vectors stored in a segment file on disk, and we use them to requantize the data if the quantiles change for that segment.
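A self-contained sketch of why those raw floats are needed: the 4-bit codes are only meaningful relative to the quantiles they were computed with, so when a segment's quantiles change the codes have to be rebuilt from the full precision vectors. This is a simplified illustration, not Lucene's actual scalar quantizer:

```java
public class RequantizeSketch {

    /** Map a float onto a 4-bit code (0..15) relative to the [lower, upper] quantiles. */
    static byte quantize(float v, float lower, float upper) {
        float clamped = Math.min(Math.max(v, lower), upper);
        return (byte) Math.round((clamped - lower) / (upper - lower) * 15f);
    }

    /** Rebuild all codes for a segment from the stored raw vectors after its quantiles change. */
    static byte[][] requantize(float[][] rawVectors, float newLower, float newUpper) {
        byte[][] codes = new byte[rawVectors.length][];
        for (int i = 0; i < rawVectors.length; i++) {
            codes[i] = new byte[rawVectors[i].length];
            for (int d = 0; d < rawVectors[i].length; d++) {
                codes[i][d] = quantize(rawVectors[i][d], newLower, newUpper);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        float[][] raw = { {-0.5f, 0.0f, 0.5f}, {0.25f, 0.75f, 1.0f} };
        // Quantiles estimated from the segment's data (e.g. via a confidence interval).
        byte[][] codes = requantize(raw, -1.0f, 1.0f);
        System.out.println(java.util.Arrays.deepToString(codes)); // [[4, 8, 11], [9, 13, 15]]
    }
}
```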
@navneet1v @shatejas As discussed offline, I ran some tests by tuning the hyperparameters ef_search (via method_parameters), ef_construction, and m. Looking at the results, there isn't much improvement in recall (not close to 0.9) even after bumping up these parameters significantly, so we should invest some time and add rescoring support to Lucene. (See the query sketch after the tables below for how ef_search is passed via method_parameters.)

bits | confidence interval | Primary | Replica |
---|---|---|---|
4 | 0 (dynamic) | 8 | 0 |

Dataset | ef_construction | ef_search | m | Recall |
---|---|---|---|---|
glove-200-angular | 100 | 100 | 16 | 0.68 |
glove-200-angular | 100 | 256 | 16 | 0.72 |
glove-200-angular | 100 | 512 | 16 | 0.74 |
glove-200-angular | 256 | 512 | 16 | 0.76 |
glove-200-angular | 256 | 512 | 64 | 0.78 |
glove-200-angular | 512 | 512 | 64 | 0.78 |
glove-200-angular | 1024 | 1024 | 100 | 0.78 |
cohere-ip-1m | 100 | 100 | 16 | 0.67 |
cohere-ip-1m | 100 | 256 | 16 | 0.67 |
cohere-ip-1m | 100 | 512 | 16 | 0.67 |
cohere-l2-1m | 100 | 100 | 16 | 0.87 |
cohere-l2-1m | 100 | 256 | 16 | 0.88 |
cohere-l2-1m | 100 | 512 | 16 | 0.89 |
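As mentioned before the tables, ef_search in these runs is a query-time parameter. A minimal sketch of passing it through `method_parameters`, reusing the client and index names from the earlier sketches; the query vector is truncated for brevity, and the exact DSL should be verified against the k-NN docs:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.RestClient;

public class KnnQueryWithEfSearch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request search = new Request("POST", "/glove-200-angular/_search");
            // Override ef_search for this query only, matching the tuning runs above.
            search.setJsonEntity("""
                {
                  "size": 10,
                  "query": {
                    "knn": {
                      "my_vector": {
                        "vector": [0.1, 0.2, 0.3],
                        "k": 10,
                        "method_parameters": { "ef_search": 512 }
                      }
                    }
                  }
                }
                """);
            System.out.println(client.performRequest(search).getStatusLine());
        }
    }
}
```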
Description
Add support for Lucene SQ 4 bits
Related Issues
Resolves #2252
Check List
- Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.