harsha-simhadri / big-ann-benchmarks

Framework for evaluating ANNS algorithms on billion scale datasets.
https://big-ann-benchmarks.com
MIT License

Zilliz streaming solution #268

Open hhy3 opened 5 months ago

hhy3 commented 5 months ago

expected results:

zilliz,"zilliz(('R32_L110', {'Ls': 550, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.5598933696746826,2033212.0,0.0,0.0,streaming,0.9976500000000108

Our solution is based on a Vamana graph with SQ8 quantization. The main search runs on high-performance 8-bit integer computation. To recover the recall lost to quantization, the candidates are refined with floating-point distances. To fit within the 8 GB memory limit, we refine with fp16 vectors instead of fp32.
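
For readers unfamiliar with the quantize-then-refine pattern described above, here is a minimal NumPy sketch. It is not the submitted implementation: it assumes a single global min/max quantizer, and it replaces the Vamana graph traversal with a brute-force scan over the 8-bit codes so that the example stays short. All function names (`sq8_fit`, `sq8_encode`, `search`) and the `n_candidates` parameter are hypothetical.

```python
import numpy as np


def sq8_fit(base):
    """Return the global minimum and step size that map floats onto 0..255."""
    lo = float(base.min())
    hi = float(base.max())
    step = (hi - lo) / 255.0 if hi > lo else 1.0
    return lo, step


def sq8_encode(x, lo, step):
    """Quantize float vectors to 8-bit codes (values outside the range are clipped)."""
    return np.clip(np.rint((x - lo) / step), 0, 255).astype(np.uint8)


def search(query, codes, base_fp16, lo, step, k=10, n_candidates=100):
    """Two-stage search: integer distances on codes, then fp16-backed refine."""
    # Stage 1 (assumption: brute force instead of a graph walk):
    # squared L2 distance computed entirely in integer arithmetic.
    q_code = sq8_encode(query, lo, step).astype(np.int32)
    diff = codes.astype(np.int32) - q_code
    coarse = np.einsum("ij,ij->i", diff, diff)

    # Keep the n_candidates closest codes (unordered).
    n_candidates = min(n_candidates, len(codes))
    cand = np.argpartition(coarse, n_candidates - 1)[:n_candidates]

    # Stage 2: re-rank the candidates with floating-point distances.
    # Vectors are stored as fp16 (half the memory of fp32) and widened
    # to fp32 only for the few candidates being refined.
    vecs = base_fp16[cand].astype(np.float32)
    exact = ((vecs - query) ** 2).sum(axis=1)
    return cand[np.argsort(exact)[:k]]
```

The refine step only touches `n_candidates` vectors per query, so storing the full-precision copy as fp16 halves its memory footprint while the exact re-ranking cost stays small.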

ingberam commented 5 months ago

Code installs without issues.

results on Azure Standard D8lds v5 (8 vcpus, 16 GiB memory) (see comments below):

zilliz,"zilliz(('R32_L110', {'Ls': 550, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.49029040336608887,2033704.0,0.0,0.0,streaming,0.0
zilliz,"zilliz(('R32_L110', {'Ls': 500, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.49029040336608887,2033704.0,0.0,0.0,streaming,0.016019999999999916
zilliz,"zilliz(('R32_L110', {'Ls': 450, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.49029040336608887,2033704.0,0.0,0.0,streaming,0.022890000000000202
zilliz,"zilliz(('R32_L110', {'Ls': 400, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.49029040336608887,2033704.0,0.0,0.0,streaming,0.9952400000000019

Note that the first three runs had issues and did not get good recall. The log file is too large to post here, so here is a sample of the errors (from the Ls=450 run):

2024-01-26 22:55:45,677 - annb.88d594fbfe37 - INFO - #active pts 10231576 #unprocessed deletes 0
2024-01-26 22:55:48,489 - annb.88d594fbfe37 - INFO - Step 365 took 2.8774523735046387s.
2024-01-26 22:55:50,124 - annb.88d594fbfe37 - INFO - Step 366 took 1.6343872547149658s.
2024-01-26 22:55:50,151 - annb.88d594fbfe37 - INFO - #active pts 10263125 #unprocessed deletes 0
2024-01-26 22:55:51,684 - annb.88d594fbfe37 - INFO - Step 367 took 1.5597820281982422s.
2024-01-26 22:55:53,322 - annb.88d594fbfe37 - INFO - Step 368 took 1.636810302734375s.
2024-01-26 22:55:53,330 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,330 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,331 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,331 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,331 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,331 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,331 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,332 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,332 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,332 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,332 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,332 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,332 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,333 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,333 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,333 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,333 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,333 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,333 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,334 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,334 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,334 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,334 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,334 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,335 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,335 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,335 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,335 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,335 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,335 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,336 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,336 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,336 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,336 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,336 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,337 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,337 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043
2024-01-26 22:55:53,337 - annb.88d594fbfe37 - INFO - Failed to insert, slots.size() = 0, active pts = 10292043

hhy3 commented 5 months ago

@ingberam Ok. I'll try to fix it.

hhy3 commented 5 months ago

@ingberam Code updated, can you run it again?

ingberam commented 5 months ago

OK, I had to run the 4 parameter sets separately, but the results look good (even a tiny bit better than your results):

zilliz,"zilliz(('R32_L110', {'Ls': 400, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.560248613357544,1998432.0,0.0,0.0,streaming,0.9950700000000096
zilliz,"zilliz(('R32_L110', {'Ls': 450, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.4881885051727295,1996952.0,0.0,0.0,streaming,0.9964800000000104
zilliz,"zilliz(('R32_L110', {'Ls': 500, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.48344945907592773,1996968.0,0.0,0.0,streaming,0.9971399999999904
zilliz,"zilliz(('R32_L110', {'Ls': 550, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.4869539737701416,1996116.0,0.0,0.0,streaming,0.9976799999999912
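
To compare rows like the ones above without reading them by eye, a small parser can pull out `Ls` and the final field. This is only a sketch: it assumes the last column is the recall value (inferred from how the numbers are discussed in this thread, not from the benchmark's export code), and `recall_by_ls` is a hypothetical helper name.

```python
import csv
import io
import re

# The four rerun rows, copied verbatim from the comment above.
ROWS = """\
zilliz,"zilliz(('R32_L110', {'Ls': 400, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.560248613357544,1998432.0,0.0,0.0,streaming,0.9950700000000096
zilliz,"zilliz(('R32_L110', {'Ls': 450, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.4881885051727295,1996952.0,0.0,0.0,streaming,0.9964800000000104
zilliz,"zilliz(('R32_L110', {'Ls': 500, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.48344945907592773,1996968.0,0.0,0.0,streaming,0.9971399999999904
zilliz,"zilliz(('R32_L110', {'Ls': 550, 'T': 8}))",msturing-30M-clustered(final_runbook.yaml),10,,0.0,0.4869539737701416,1996116.0,0.0,0.0,streaming,0.9976799999999912
"""


def recall_by_ls(text):
    """Map each Ls value to the last field of its row (assumed to be recall)."""
    result = {}
    # csv.reader handles the quoted parameter field, which contains commas.
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 2:
            continue
        match = re.search(r"'Ls': (\d+)", fields[1])
        if match is None:
            continue
        result[int(match.group(1))] = float(fields[-1])
    return result


if __name__ == "__main__":
    for ls, recall in sorted(recall_by_ls(ROWS).items()):
        print(f"Ls={ls}: {recall:.5f}")
```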

Please ack and fix the conflicts so that we can merge.