qdrant / quaterion

Blazing fast framework for fine-tuning similarity learning models
https://quaterion.qdrant.tech/
Apache License 2.0

Improve evaluation procedure for extensive results #190

Open monatis opened 1 year ago

monatis commented 1 year ago

Problem

In the current implementation, we use samplers to calculate evaluation metrics on a small subset of the dataset. Because the subset is drawn at random, the scores can differ slightly from run to run. It's always possible to seed the RNGs for reproducible results, but a given seed may still happen to select an unusually easy or unusually hard subset. It's still fair to compare different checkpoints with seeded evaluators, but we cannot be sure whether we overestimate or underestimate the performance of all the checkpoints.

Possible solution

  1. Add an option to enable multiple passes over the data and report the mean and standard deviation across all passes, or
  2. Accept an optional QdrantClient and, if it is not None, use Qdrant as the backend to store embeddings and retrieve from.
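
To make the first option concrete, here is a minimal sketch of multi-pass evaluation. It is not Quaterion's API: `evaluate_multi_pass`, `metric_fn`, `sample_size`, and `n_passes` are hypothetical names chosen for illustration, and plain `random.sample` stands in for whatever sampler the evaluator actually uses.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def evaluate_multi_pass(
    dataset: Sequence[T],
    metric_fn: Callable[[Sequence[T]], float],  # hypothetical: scores one sampled subset
    sample_size: int,
    n_passes: int = 5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Score n_passes random subsets and return (mean, std) of the scores."""
    if n_passes < 1:
        raise ValueError("n_passes must be at least 1")

    # A single seeded RNG keeps the whole run reproducible while still
    # drawing a different subset on every pass.
    rng = random.Random(seed)
    items = list(dataset)
    size = min(sample_size, len(items))

    scores = []
    for _ in range(n_passes):
        subset = rng.sample(items, size)  # sampling without replacement
        scores.append(metric_fn(subset))

    mean = statistics.fmean(scores)
    # Sample standard deviation is undefined for a single pass; report 0.0.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std
```

Reporting the spread alongside the mean shows how much of a difference between two checkpoints could be explained by sampling alone, which a single seeded pass cannot reveal.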

parthkl021 commented 10 months ago

@generall is this issue solved? If not, can I work on it?