Open naveentatikonda opened 2 weeks ago
@naveentatikonda can you link the benchmarks, or provide a link to where the benchmark results for Lucene SQ 4-bit are available?
@navneet1v Please find the benchmarking recall results below.

m | ef_construction | ef_search | confidence interval | Primary | Replica |
---|---|---|---|---|---|
16 | 100 | 100 | 0 (dynamic) | 8 | 0 |

Dataset | Bits | spaceType | Recall |
---|---|---|---|
glove-200-angular | 4 | cosine | 0.68 |
glove-200-angular | 7 | cosine | 0.72 |
cohere-1m | 4 | Inner Product | 0.67 |
cohere-1m | 4 | L2 | 0.87 |
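For anyone trying to reproduce this, here is a minimal sketch of what the configuration above could look like as a k-NN index mapping, assuming the Primary/Replica columns are shard counts and using the low-level opensearch-rest-client (2.x, which takes `org.apache.http.HttpHost`). The index and field names are made up, and the `sq` encoder parameter names (`bits`, `confidence_interval`) are my recollection of the Lucene SQ encoder and should be checked against the plugin docs for your version:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

public class CreateLuceneSq4BitIndex {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Mirrors the benchmark config: 8 primaries, 0 replicas, m=16, ef_construction=100,
            // Lucene engine with the sq encoder at 4 bits and a dynamic confidence interval (0).
            Request request = new Request("PUT", "/glove-200-angular");
            request.setJsonEntity("""
                {
                  "settings": { "index": { "knn": true, "number_of_shards": 8, "number_of_replicas": 0 } },
                  "mappings": {
                    "properties": {
                      "my_vector": {
                        "type": "knn_vector",
                        "dimension": 200,
                        "method": {
                          "name": "hnsw",
                          "engine": "lucene",
                          "space_type": "cosinesimil",
                          "parameters": {
                            "m": 16,
                            "ef_construction": 100,
                            "encoder": { "name": "sq", "parameters": { "bits": 4, "confidence_interval": 0 } }
                          }
                        }
                      }
                    }
                  }
                }
                """);
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}
```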
Thanks for sharing the results. Since recall is only around 0.7, do you think we should enable rescoring out of the box (OOB) with this quantization, and then launch the feature? I would like to know your thoughts.
Yeah, that's definitely a good idea; we can get better recall by trading off latency. But I thought we only wanted to support rescoring for on_disk mode, and as of today we only support it for the Faiss engine. Also, we might not include this (as 8x compression) as part of on_disk, because we prefer the Faiss engine over Lucene.
From a UX perspective, do you want to add rescoring support to Lucene with SQ irrespective of on_disk?
This is a good point. I think rescoring and on_disk should be two different things; I should be able to do rescoring without specifying on_disk mode. This is getting more and more tangled the more I think about it. I think we should trigger a discussion around whether rescoring can be used outside of on_disk mode or whether it is always tied to on_disk.
@jmazanec15, @shatejas, @vamshin
We are able to do rescoring without specifying on_disk; on_disk just sets default rescoring. The issue is that we do not support rescoring for Lucene, because we use Lucene's query. But we should onboard support for it with this.
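For context, a sketch of what query-level rescoring looks like today without on_disk, reusing the client setup and index from the earlier sketch. The `rescore`/`oversample_factor` syntax is my recollection of the existing k-NN query DSL (effective for the Faiss path, and per this thread not yet honored by the Lucene engine), so treat it as an assumption:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.RestClient;

public class KnnQueryWithRescore {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request search = new Request("POST", "/glove-200-angular/_search");
            // Oversample the quantized search 2x, then re-score the candidates
            // with full precision vectors before returning the top k.
            // Query vector truncated for brevity.
            search.setJsonEntity("""
                {
                  "size": 10,
                  "query": {
                    "knn": {
                      "my_vector": {
                        "vector": [0.1, 0.2, 0.3],
                        "k": 10,
                        "rescore": { "oversample_factor": 2.0 }
                      }
                    }
                  }
                }
                """);
            System.out.println(client.performRequest(search).getStatusLine());
        }
    }
}
```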
If that is the case, then I think we should start working on implementing the rescoring feature for the Lucene query clause, given that the recall for int4 is not great.
Rescoring makes sense when we use full precision vectors during rescoring. If we use quantized vectors during rescoring, it won't increase recall much. Therefore, rescoring is kind of tied to on_disk in that sense.
That's correct @heemin32. When we talk about rescoring here, we mean rescoring with full precision vectors only.
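To make that concrete, a toy illustration of full precision rescoring: take an oversampled candidate set from the quantized search and re-rank it with exact scores over the original float vectors. This is illustrative only, not the plugin's code; names and data are made up:

```java
import java.util.Comparator;
import java.util.List;

public class FullPrecisionRescore {

    /** Exact inner product over the original float vectors. */
    static float innerProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    /** Re-rank candidate doc ids from the quantized search using full precision vectors, keep top k. */
    static List<Integer> rescore(List<Integer> candidates, float[][] rawVectors, float[] query, int k) {
        return candidates.stream()
            .sorted(Comparator.comparingDouble((Integer doc) -> innerProduct(rawVectors[doc], query)).reversed())
            .limit(k)
            .toList();
    }

    public static void main(String[] args) {
        float[][] rawVectors = { {1f, 0f}, {0.9f, 0.1f}, {0f, 1f} };
        float[] query = {1f, 0f};
        // Pretend the quantized HNSW search oversampled and returned these candidates.
        List<Integer> candidates = List.of(2, 1, 0);
        System.out.println(rescore(candidates, rawVectors, query, 2)); // prints [0, 1]
    }
}
```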
Then, it is "on_disk" right?
Why would it be on_disk then? As Jack mentioned earlier, on_disk is just a way to set up some defaults.
It might confuse the user experience. I thought SQ actually throws away the original vectors and only stores the quantized vectors, whereas on_disk stores the full precision vectors but uses the quantized vectors for the in-memory index.
No @heemin32, Lucene SQ also keeps the full precision vectors stored in a segment file on disk, and we use them to requantize the data if the quantiles change for that segment.
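A self-contained sketch of why those raw floats are needed: the 4-bit codes are only meaningful relative to the quantiles they were computed with, so when a segment's quantiles change the codes have to be rebuilt from the full precision vectors. This is a simplified illustration, not Lucene's actual scalar quantizer:

```java
public class RequantizeSketch {

    /** Map a float onto a 4-bit code (0..15) relative to the [lower, upper] quantiles. */
    static byte quantize(float v, float lower, float upper) {
        float clamped = Math.min(Math.max(v, lower), upper);
        return (byte) Math.round((clamped - lower) / (upper - lower) * 15f);
    }

    /** Rebuild all codes for a segment from the stored raw vectors after its quantiles change. */
    static byte[][] requantize(float[][] rawVectors, float newLower, float newUpper) {
        byte[][] codes = new byte[rawVectors.length][];
        for (int i = 0; i < rawVectors.length; i++) {
            codes[i] = new byte[rawVectors[i].length];
            for (int d = 0; d < rawVectors[i].length; d++) {
                codes[i][d] = quantize(rawVectors[i][d], newLower, newUpper);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        float[][] raw = { {-0.5f, 0.0f, 0.5f}, {0.25f, 0.75f, 1.0f} };
        // Quantiles estimated from the segment's data (e.g. via a confidence interval).
        byte[][] codes = requantize(raw, -1.0f, 1.0f);
        System.out.println(java.util.Arrays.deepToString(codes)); // [[4, 8, 11], [9, 13, 15]]
    }
}
```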
@navneet1v @shatejas As discussed offline, I ran some tests by tuning the hyperparameters ef_search (via method_parameters), ef_construction, and m. Looking at the results, there isn't much improvement in recall (not close to 0.9) even after bumping up these parameters significantly, so we should invest some time and add rescoring support to Lucene. (See the query sketch after the tables below for how ef_search is passed via method_parameters.)

bits | confidence interval | Primary | Replica |
---|---|---|---|
4 | 0 (dynamic) | 8 | 0 |

Dataset | ef_construction | ef_search | m | Recall |
---|---|---|---|---|
glove-200-angular | 100 | 100 | 16 | 0.68 |
glove-200-angular | 100 | 256 | 16 | 0.72 |
glove-200-angular | 100 | 512 | 16 | 0.74 |
glove-200-angular | 256 | 512 | 16 | 0.76 |
glove-200-angular | 256 | 512 | 64 | 0.78 |
glove-200-angular | 512 | 512 | 64 | 0.78 |
glove-200-angular | 1024 | 1024 | 100 | 0.78 |
cohere-ip-1m | 100 | 100 | 16 | 0.67 |
cohere-ip-1m | 100 | 256 | 16 | 0.67 |
cohere-ip-1m | 100 | 512 | 16 | 0.67 |
cohere-l2-1m | 100 | 100 | 16 | 0.87 |
cohere-l2-1m | 100 | 256 | 16 | 0.88 |
cohere-l2-1m | 100 | 512 | 16 | 0.89 |
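As mentioned before the tables, ef_search in these runs is a query-time parameter. A minimal sketch of passing it through `method_parameters`, reusing the client and index names from the earlier sketches; the query vector is truncated for brevity, and the exact DSL should be verified against the k-NN docs:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.RestClient;

public class KnnQueryWithEfSearch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request search = new Request("POST", "/glove-200-angular/_search");
            // Override ef_search for this query only, matching the tuning runs above.
            search.setJsonEntity("""
                {
                  "size": 10,
                  "query": {
                    "knn": {
                      "my_vector": {
                        "vector": [0.1, 0.2, 0.3],
                        "k": 10,
                        "method_parameters": { "ef_search": 512 }
                      }
                    }
                  }
                }
                """);
            System.out.println(client.performRequest(search).getStatusLine());
        }
    }
}
```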
Description
Add support for Lucene SQ 4 bits
Related Issues
Resolves #2252
Check List
- Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.