qdrant / vector-db-benchmark

Framework for benchmarking vector search engines
https://qdrant.tech/benchmarks/
Apache License 2.0
270 stars, 77 forks

Support pulling embeddings from any Hugging Face dataset #115

Open KShivendu opened 6 months ago

KShivendu commented 6 months ago

It would be nice if we could support pulling embeddings from any Hugging Face dataset. This would make the project even more useful for external users :)

The dataset spec could look like this:

{
    "name": "SciPhi/AgentSearch-V1",
    "vector_size": 100,
    "distance": "cosine",
    "type": "huggingface",
    "path": "glove-100-angular/glove-100-angular.hdf5",
    "link": "https://huggingface.co/datasets/SciPhi/AgentSearch-V1",
    "schema": {
      "vector_field": "openai",
      "payload": {
        "url": "text"
      }
    }
}
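To make the discussion more concrete, here is a rough Python sketch of how a loader might consume such a spec. None of this is existing project code: `iter_records` and `stream_hf_rows` are hypothetical names, the `split` argument is not part of the proposed spec, and the sketch assumes the `payload` mapping reads as field name to field type. The only real library call is `load_dataset` from the Hugging Face `datasets` package, used in streaming mode so the full dataset is not downloaded up front.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Spec values copied from the proposal above ("path" and "link" omitted here).
SPEC: Dict[str, Any] = {
    "name": "SciPhi/AgentSearch-V1",
    "vector_size": 100,
    "distance": "cosine",
    "type": "huggingface",
    "schema": {"vector_field": "openai", "payload": {"url": "text"}},
}


def iter_records(
    spec: Dict[str, Any], rows: Iterable[Dict[str, Any]]
) -> Iterator[Tuple[List[float], Dict[str, Any]]]:
    """Yield (vector, payload) pairs from dataset rows according to the spec.

    Hypothetical helper: assumes each row is a dict-like record and that the
    keys of schema["payload"] name the columns to keep as payload.
    """
    schema = spec["schema"]
    vector_field = schema["vector_field"]
    payload_fields = list(schema.get("payload", {}))
    expected_size = spec["vector_size"]

    for row in rows:
        vector = [float(x) for x in row[vector_field]]
        # Fail early if the dataset does not match the declared dimension.
        if len(vector) != expected_size:
            raise ValueError(
                f"expected vector of size {expected_size}, got {len(vector)}"
            )
        payload = {key: row[key] for key in payload_fields if key in row}
        yield vector, payload


def stream_hf_rows(spec: Dict[str, Any], split: str = "train"):
    """Stream rows from the Hugging Face Hub (hypothetical helper).

    Assumption: the spec "name" is the Hub dataset id and a split must be
    chosen by the caller, since the proposed spec does not define one.
    """
    from datasets import load_dataset  # requires: pip install datasets

    return load_dataset(spec["name"], split=split, streaming=True)
```

With network access and the `datasets` package installed, usage could look like `for vector, payload in iter_records(SPEC, stream_hf_rows(SPEC)): ...`. The size check surfaces a mismatch between `vector_size` and the actual embedding column as soon as the first row is read.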

This needs some discussion before implementation.