AndrewHYu closed 3 months ago
Thanks for your contribution. I am evaluating it now and will get back to you on how it goes!
I ran with the downloaded index and got the following results:
hanns,"hanns,tree=27/40000,reorder=111",text2image-10M,10,53085.03723024023,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.8774520000000001
hanns,"hanns,tree=27/40000,reorder=130",text2image-10M,10,51222.16584203003,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.882422
hanns,"hanns,tree=32/40000,reorder=140",text2image-10M,10,46858.49102240073,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.8944110000000001
hanns,"hanns,tree=32/40000,reorder=150",text2image-10M,10,46771.317990241405,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.896185
hanns,"hanns,tree=34/40000,reorder=150",text2image-10M,10,45381.62378698972,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.899572
hanns,"hanns,tree=34/40000,reorder=155",text2image-10M,10,45685.10712457384,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.900311
hanns,"hanns,tree=36/40000,reorder=150",text2image-10M,10,44630.44910364101,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.9026080000000001
hanns,"hanns,tree=37/40000,reorder=145",text2image-10M,10,44957.96616927795,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.9031560000000001
hanns,"hanns,tree=38/40000,reorder=140",text2image-10M,10,44787.13982548163,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.903562
hanns,"hanns,tree=42/40000,reorder=160",text2image-10M,10,41713.34961169815,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.911723
These seem to agree with your posted figure. Now running without the downloaded index. By the way, your index-building code seems to download a file called config.pb even when the download is disabled. On inspection it looks like it just contains parameters, but can you confirm that it doesn't contain any pre-computed index information?
Yes, it contains parameters for search.
I was able to build the index from scratch and confirm that it builds within the time and memory limits. I got the following results:
2: hanns,tree=34/40000,reorder=150 0.899 46754.467
4: hanns,tree=42/40000,reorder=160 0.911 42331.832
6: hanns,tree=32/40000,reorder=140 0.894 48426.372
9: hanns,tree=36/40000,reorder=150 0.902 45949.345
12: hanns,tree=32/40000,reorder=150 0.895 47507.687
14: hanns,tree=27/40000,reorder=111 0.877 54890.437
15: hanns,tree=27/40000,reorder=130 0.882 53006.511
16: hanns,tree=38/40000,reorder=140 0.903 45162.090
20: hanns,tree=34/40000,reorder=155 0.899 46443.915
21: hanns,tree=37/40000,reorder=145 0.902 45388.283
These agree with the results you shared and with those I found using the pre-computed index. I will approve the merge and speak with the other admins about updating our official results. Great entry!
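The agreement between the two runs can also be checked mechanically. The sketch below pairs a subset of the pre-computed-index CSV rows with the matching from-scratch lines by parameter string, checks that recall matches within a small tolerance (the from-scratch values are printed to three decimals), and reports the QPS ratio. It assumes, as the values suggest, that the second CSV field is the parameter string, the fifth is QPS and the last is recall; the function names and the 0.002 tolerance are hypothetical choices, not part of the benchmark tooling.

```python
import csv
import io

# Subset of the CSV rows produced with the downloaded (pre-computed) index.
PRECOMPUTED_CSV = """\
hanns,"hanns,tree=27/40000,reorder=111",text2image-10M,10,53085.03723024023,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.8774520000000001
hanns,"hanns,tree=34/40000,reorder=150",text2image-10M,10,45381.62378698972,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.899572
hanns,"hanns,tree=42/40000,reorder=160",text2image-10M,10,41713.34961169815,0.0,1.2159347534179688e-05,5185752.0,0.0,0.0,ood,0.911723
"""

# Matching lines from the from-scratch build: "<run>: <params> <recall> <qps>".
FROM_SCRATCH = """\
2: hanns,tree=34/40000,reorder=150 0.899 46754.467
4: hanns,tree=42/40000,reorder=160 0.911 42331.832
14: hanns,tree=27/40000,reorder=111 0.877 54890.437
"""


def parse_precomputed(text: str) -> dict[str, tuple[float, float]]:
    # Assumed layout: field 1 = parameters, field 4 = QPS, last field = recall.
    # csv.reader handles the quoted parameter string that contains commas.
    return {row[1]: (float(row[-1]), float(row[4]))
            for row in csv.reader(io.StringIO(text))}


def parse_from_scratch(text: str) -> dict[str, tuple[float, float]]:
    runs = {}
    for line in text.splitlines():
        _, params, recall, qps = line.split()
        runs[params] = (float(recall), float(qps))
    return runs


def compare(a, b, recall_tol: float = 0.002) -> dict[str, tuple[bool, float]]:
    # For each parameter set present in both runs: (recall within tolerance, QPS b / QPS a).
    report = {}
    for params in sorted(a.keys() & b.keys()):
        (recall_a, qps_a), (recall_b, qps_b) = a[params], b[params]
        report[params] = (abs(recall_a - recall_b) <= recall_tol, qps_b / qps_a)
    return report


report = compare(parse_precomputed(PRECOMPUTED_CSV), parse_from_scratch(FROM_SCRATCH))
for params, (recall_ok, qps_ratio) in report.items():
    print(f"{params}: recall within tolerance={recall_ok}, QPS ratio={qps_ratio:.3f}")
```

QPS is compared as a ratio rather than with a fixed tolerance because throughput naturally varies a few percent between runs on the same hardware.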
Hi @AndrewHYu
Thanks for submitting.
I wonder if you can clarify the relationship between your submission and ScaNN? It looks like your submission loads a ScaNN index: https://github.com/harsha-simhadri/big-ann-benchmarks/blob/main/neurips23/ood/hanns/hanns.py#L23-L46
The config.pb file is also identical to that of the ScaNN submission:
diff <(curl https://hanns.obs.ap-southeast-1.myhuaweicloud.com/v2/config.pb) <(curl https://storage.googleapis.com/scann/big-ann-2023/ood/scann_config.pb)
@magdalendobson FYI.
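The `diff` command above relies on Bash process substitution. For anyone reproducing the check without Bash, a minimal standard-library Python sketch is below; printing SHA-256 digests also lets two reviewers compare the files without exchanging them. The function names are hypothetical; the two URLs are the ones from the `diff` command.

```python
import hashlib
import urllib.request

URLS = (
    "https://hanns.obs.ap-southeast-1.myhuaweicloud.com/v2/config.pb",
    "https://storage.googleapis.com/scann/big-ann-2023/ood/scann_config.pb",
)


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch(url: str) -> bytes:
    # Download the whole file into memory (config files are small).
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def identical(a: bytes, b: bytes) -> bool:
    # Equal digests mean byte-identical content (barring a SHA-256 collision).
    return sha256_of(a) == sha256_of(b)


def compare_urls(urls=URLS) -> bool:
    # Fetches both files, prints their digests and returns whether they match.
    blobs = [fetch(u) for u in urls]
    for url, blob in zip(urls, blobs):
        print(sha256_of(blob), url)
    return identical(*blobs)
```

Calling `compare_urls()` performs the same check as the `diff` one-liner; it needs network access.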
Hi @arron2003
Thanks for your reminder and contributions.
We used the ScaNN clustering method and found that it has many excellent design choices that improve performance and accuracy. Because we reuse some of its configuration items, we use the config.pb file directly. We will update the README with details.
@AndrewHYu Could you please share your name, affiliation and any collaborators on this code?
Our OOD track solution consists of a Vamana index, a multi-scale spatial clustering index, and a layout-optimized quantization acceleration index. Retrieval proceeds from coarse to fine. First, the Vamana index is used to quickly find the nearest clusters. Then, within these clusters, the quantization-accelerated index is used for fast distance comparisons to identify the coarsely ranked candidates. Finally, these candidates are re-ranked using SIMD instructions, and the final results are returned.
https://github.com/AndrewHYu/Hanns
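To illustrate the coarse-to-fine flow described above, here is a minimal pure-Python sketch. It is not the Hanns implementation: the Vamana graph over clusters is replaced by a brute-force scan of centroids, the layout-optimized quantization index by plain 8-bit scalar quantization, and SIMD re-ranking by exact floating-point inner products. The class `CoarseToFineIndex`, the parameters `n_probe` and `reorder_k`, the naive centroid choice, and the use of inner-product similarity are all assumptions made for illustration.

```python
from heapq import nlargest


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def quantize(v: list[float], scale: float) -> list[int]:
    # Symmetric 8-bit scalar quantization: map each value to an int in [-127, 127].
    return [max(-127, min(127, round(x / scale))) for x in v]


class CoarseToFineIndex:
    """Hypothetical three-stage index: clusters -> quantized scan -> exact re-rank."""

    def __init__(self, data: list[list[float]], num_clusters: int):
        self.data = data
        # Stand-in clustering: the first num_clusters points act as centroids
        # (a real system would train them, e.g. with k-means).
        self.centroids = data[:num_clusters]
        self.clusters: list[list[int]] = [[] for _ in range(num_clusters)]
        for i, v in enumerate(data):
            best = max(range(num_clusters), key=lambda c: dot(v, self.centroids[c]))
            self.clusters[best].append(i)
        # One global scale so every stored vector fits in int8.
        max_abs = max(abs(x) for v in data for x in v)
        self.scale = (max_abs / 127) or 1.0
        self.codes = [quantize(v, self.scale) for v in data]

    def search(self, query: list[float], k: int, n_probe: int, reorder_k: int) -> list[int]:
        # Stage 1: pick the n_probe most similar centroids
        # (brute force here; Hanns uses a Vamana graph for this step).
        probe = nlargest(n_probe, range(len(self.centroids)),
                         key=lambda c: dot(query, self.centroids[c]))
        candidates = [i for c in probe for i in self.clusters[c]]
        # Stage 2: cheap approximate scores on int8 codes. The query gets its own
        # scale; a positive rescaling does not change the ranking.
        q_scale = (max(abs(x) for x in query) / 127) or 1.0
        q_code = quantize(query, q_scale)
        coarse = nlargest(reorder_k, candidates, key=lambda i: dot(q_code, self.codes[i]))
        # Stage 3: exact re-rank of the shortlist with full-precision vectors.
        return nlargest(k, coarse, key=lambda i: dot(query, self.data[i]))
```

With `n_probe` equal to the number of clusters and `reorder_k` equal to the dataset size, the sketch reduces to exact search; smaller values trade recall for speed. The results earlier in this thread show a broadly similar trade-off, with recall rising and QPS falling as the `tree=` and `reorder=` values increase.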