harsha-simhadri / big-ann-benchmarks

Framework for evaluating ANNS algorithms on billion scale datasets.
https://big-ann-benchmarks.com
MIT License

3 new runbooks #300

Closed by xuhaike 3 months ago

xuhaike commented 3 months ago

Created three new runbooks for the msturing-10m, wikipedia-35m, and msmarco-100m datasets. Please see the YAML files, and the Python scripts that generate them, in the folder neurips23/streaming.

magdalendobson commented 3 months ago

Thanks for contributing these! I have one ask--can you ensure that the random number generation is deterministic by giving a seed using random.seed()? It would be good if the Python scripts always produced exactly the same runbook.
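For illustration, here is a minimal sketch of what seeded generation could look like. The function name `generate_steps`, its parameters, and the step dictionaries are hypothetical and are not taken from the scripts in this PR. It uses a local `random.Random(seed)` instance, which gives the same determinism as calling `random.seed()` but without touching global state.

```python
import random


def generate_steps(num_points, batch_size, seed=0):
    """Build a shuffled list of runbook-like steps (hypothetical format).

    A dedicated random.Random(seed) instance makes the output depend
    only on the arguments, so repeated runs produce identical steps.
    """
    rng = random.Random(seed)

    # Batch start offsets, shuffled deterministically by the seeded RNG.
    starts = list(range(0, num_points, batch_size))
    rng.shuffle(starts)

    steps = []
    for start in starts:
        end = min(start + batch_size, num_points)
        steps.append({"operation": "insert", "start": start, "end": end})
        # Randomly interleave searches; still deterministic for a fixed seed.
        if rng.random() < 0.5:
            steps.append({"operation": "search"})
    return steps


if __name__ == "__main__":
    first = generate_steps(1000, 100, seed=42)
    second = generate_steps(1000, 100, seed=42)
    print(first == second)  # same seed, same runbook steps
```

With a fixed seed, rerunning the script should regenerate a byte-identical runbook, which makes the YAML files reproducible from source.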

magdalendobson commented 3 months ago

I also noticed that these runbook names aren't incorporated into the options for computing recall in data_export.py. Can you add those also?