mitre / quaerite

Search relevance evaluation toolkit

Add result set comparison scorers #45

Open tballison opened 5 years ago

tballison commented 5 years ago

As mentioned in #41, it may be helpful to compare result set overlap whether or not judgments are available.

tballison commented 5 years ago

Would it be sufficient to output a confusion matrix at the experiment level? For example: Experiment A's results have 90% overlap with Experiment B's, Experiment B's have 50% overlap with Experiment C's, and so on.

Or do we also need to have per-query comparisons to allow for drill down?

The first is straightforward, and we have a model for that already. The second, I worry, would become far too much information. What exactly would it look like?
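
To make the experiment-level option concrete, here is a rough sketch in Python (not quaerite code). It assumes each experiment can be reduced to a map of query id to ranked document ids, and it uses Jaccard overlap of the top-k ids as the per-query measure. The names `overlap_at_k` and `overlap_matrix`, the cutoff `k`, and the sample data are all hypothetical, and other overlap measures would slot in the same way.

```python
from itertools import product
from statistics import mean


def overlap_at_k(results_a, results_b, k=10):
    """Jaccard overlap of the top-k document ids of two ranked result lists."""
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    if not top_a and not top_b:
        return 1.0  # assumption: two empty result sets count as identical
    return len(top_a & top_b) / len(top_a | top_b)


def overlap_matrix(experiments, k=10):
    """Mean per-query overlap for every ordered pair of experiments.

    `experiments` maps experiment name -> {query id -> ranked doc ids}.
    Only queries present in both experiments are compared.
    """
    matrix = {}
    for a, b in product(experiments, repeat=2):
        shared = experiments[a].keys() & experiments[b].keys()
        scores = [overlap_at_k(experiments[a][q], experiments[b][q], k)
                  for q in shared]
        matrix[(a, b)] = mean(scores) if scores else float("nan")
    return matrix


# Hypothetical data: three experiments run over the same two queries.
experiments = {
    "A": {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]},
    "B": {"q1": ["d1", "d2", "d4"], "q2": ["d4", "d5"]},
    "C": {"q1": ["d7", "d8", "d9"], "q2": ["d4", "d6"]},
}
matrix = overlap_matrix(experiments, k=3)
for (a, b), score in sorted(matrix.items()):
    print(f"{a} vs {b}: {score:.2f}")
```

No judgments are needed for any of this, which matches the motivation in #41. The per-query scores are computed on the way to the matrix, so a drill-down report could reuse them if it turns out to be wanted.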

ehaubert commented 5 years ago

The confusion matrix makes the most sense at a reporting level. The drill-down seems like a task for a different kind of tool?