lightonai / pylate

Late Interaction Models Training & Retrieval
https://lightonai.github.io/pylate/
MIT License

Use better evaluation metrics #20

Closed: NohTow closed this issue 2 months ago

NohTow commented 4 months ago

Right now, we compute Hit@k alongside accuracy (the percentage of eval queries that score closer to their positive document than to the negative document from their triplet). Hit@k requires computing similarity scores for every query-document pair in the eval dataset. That is already very expensive for dense models on a large eval set, and it becomes prohibitively expensive with MaxSim, which compares every query token embedding with every document token embedding instead of computing a single dot product per pair.
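To make the cost gap concrete, here is a minimal pure-Python sketch. It is not PyLate's implementation or evaluator API; the helper names `maxsim`, `triplet_accuracy` and `hit_at_k` are hypothetical, and embeddings are assumed to be plain lists of token vectors. Triplet accuracy needs 2 MaxSim calls per query, while Hit@k needs one MaxSim call per eval document for every query, so with N queries and N documents it grows as N² MaxSim calls.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Vector]  # one row per token embedding


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: Matrix, doc: Matrix) -> float:
    """Late-interaction score: each query token keeps its best-matching
    document token score, and those maxima are summed over query tokens."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def triplet_accuracy(triplets) -> float:
    """Fraction of (query, positive, negative) triplets where the positive
    outscores the negative. Cost: 2 MaxSim calls per query."""
    correct = sum(maxsim(q, pos) > maxsim(q, neg) for q, pos, neg in triplets)
    return correct / len(triplets)


def hit_at_k(queries, docs, positives, k: int) -> float:
    """Fraction of queries whose positive document index is in the top-k
    over ALL eval documents. Cost: len(docs) MaxSim calls per query."""
    hits = 0
    for q, pos_idx in zip(queries, positives):
        scores = [maxsim(q, d) for d in docs]
        top_k = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
        hits += pos_idx in top_k
    return hits / len(queries)


# Tiny example with 2-dimensional token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
positive = [[1.0, 0.0], [0.0, 1.0]]  # MaxSim with query: 2.0
negative = [[1.0, 0.0]]              # MaxSim with query: 1.0
print(triplet_accuracy([(query, positive, negative)]))           # 1.0
print(hit_at_k([query], [negative, positive], positives=[1], k=1))  # 1.0
```

Each `maxsim` call itself performs `len(query) * len(doc)` dot products, which is why the all-pairs scoring behind Hit@k is so much heavier for late-interaction models than for single-vector dense models.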

It would be preferable to monitor training with better, more tractable metrics.