harsha-simhadri / big-ann-benchmarks

Framework for evaluating ANNS algorithms on billion-scale datasets.
https://big-ann-benchmarks.com
MIT License
313 stars 103 forks

Add OpenAI Embedding 1M point dataset #247

Closed. ekzhu closed this 7 months ago

ekzhu commented 7 months ago

Add the OpenAI embedding dataset from HuggingFace: https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M. Queries are randomly sampled from the dataset itself.
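
For illustration only, here is a minimal sketch of what sampling queries from the dataset itself could look like. It is not the code from this PR: the function names `sample_queries` and `brute_force_knn`, the seed, and the small synthetic array standing in for the real embeddings are all assumptions. It also assumes the sampled queries remain in the base set, which the description does not state.

```python
import numpy as np


def sample_queries(base, num_queries, seed=0):
    """Pick `num_queries` distinct rows of `base` to use as queries.

    Hypothetical helper: returns the chosen row indices and the query vectors.
    """
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no row is picked twice.
    idx = rng.choice(base.shape[0], size=num_queries, replace=False)
    return idx, base[idx]


def brute_force_knn(base, queries, k):
    """Exact k nearest neighbours under squared L2 distance (hypothetical helper)."""
    # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2, computed for all pairs at once.
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ base.T
        + (base ** 2).sum(axis=1)[None, :]
    )
    return np.argsort(d2, axis=1)[:, :k]


# Small synthetic stand-in for the real embeddings (assumption: 1000 x 16, float64).
base = np.random.default_rng(1).standard_normal((1000, 16))
query_ids, queries = sample_queries(base, num_queries=10)
ground_truth = brute_force_knn(base, queries, k=5)
```

Under the assumption that queries stay in the base set, each query's nearest neighbour is the point it was sampled from, at distance zero. Whether to keep or exclude those self-matches in the ground truth is a design choice that affects reported recall.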