Closed rishibommasani closed 7 months ago
Now that ranking instances include all the candidates, we don't need to do anything special for showing the instances. However, we need to branch on the type of adaptation method == BINARY_RANKING and show the predictions (which are on individual references) in a meaningful way.
Sounds great, @percyliang, are you referring to the model ranking as the prediction? If so, we don't really record this information anywhere. Our options are:
I think we need to store something in the per_instance_metrics.json (related to #905 for even just normal classification).
@percyliang, got it, this can be slightly trickier for ranking, as we need a rank for each reference and the number of references isn't fixed.
I see a potential solution that involves string parsing, following your example in #905: we can have a stat named f"rank_{reference_index}" for each reference, and set its value to the corresponding rank. How does this sound?
Yes, that will do for now. Eventually, I think we might want to have a more principled way of encoding this information. I'd call it "ref{reference_index}_rank" to be more descriptive.
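To make the naming scheme above concrete, here is a minimal sketch of writing one stat per reference and recovering the ranks by string parsing. The helper names (`encode_ranks`, `decode_ranks`) and the plain-dict representation of stats are assumptions for illustration only, not the project's actual API; only the "ref{reference_index}_rank" name comes from this thread.

```python
import re
from typing import Dict, List, Optional

# Matches stat names of the form "ref{reference_index}_rank", e.g. "ref12_rank".
_REF_RANK_PATTERN = re.compile(r"^ref(\d+)_rank$")


def encode_ranks(ranks: List[Optional[int]]) -> Dict[str, int]:
    """Hypothetical helper: emit one stat per reference that has a rank.

    ranks[i] is the rank the model assigned to reference i, or None if
    that reference was not ranked.
    """
    return {
        f"ref{index}_rank": rank
        for index, rank in enumerate(ranks)
        if rank is not None
    }


def decode_ranks(stats: Dict[str, float]) -> Dict[int, int]:
    """Hypothetical helper: recover {reference_index: rank} from stat names.

    Stats whose names do not match the pattern are ignored, so the
    number of references does not need to be known in advance.
    """
    decoded: Dict[int, int] = {}
    for name, value in stats.items():
        match = _REF_RANK_PATTERN.match(name)
        if match:
            decoded[int(match.group(1))] = int(value)
    return decoded
```

Because the reference index lives in the stat name, the scheme handles a variable number of references, at the cost of relying on string parsing, which is why a more principled encoding may be preferable later.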
Added in #1013
It seems to me that we'll need to re-run MS MARCO, and there's no way to generate the new per-instance stats from the existing information. Is this correct?
Never mind, I see that ref{reference_index}_rank already exists.
What remains here? This is how things look currently. Do we want to display only the top-ranked options?
Also, I think we should remove ref{reference_index}_rank from the global metrics below:
Thanks Yifan! It might be good to indicate the model ordering in a bracket as well.
How can we remove the global metrics? We compute these metrics for each individual instance, but they get averaged, which produces the global metrics.
The ranks are already in brackets (see "rank=492" in the last entry). Not sure why some entries have them and some don't.
For the global metrics, the easiest thing to do is probably to add a filter list on the frontend to filter out these metrics from the table.
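The filter-list idea can be sketched as follows. This is only an illustration of the logic, written in Python for consistency with the f-string above; the real frontend code would live elsewhere, and the names `HIDDEN_GLOBAL_METRIC_PATTERNS` and `filter_global_metrics` are hypothetical.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical filter list: name patterns for stats that only make sense
# per instance and should not be shown in the global metrics table.
HIDDEN_GLOBAL_METRIC_PATTERNS: List[Pattern[str]] = [
    re.compile(r"^ref\d+_rank$"),
]


def filter_global_metrics(metrics: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of `metrics` without entries matching the filter list."""
    return {
        name: value
        for name, value in metrics.items()
        if not any(p.match(name) for p in HIDDEN_GLOBAL_METRIC_PATTERNS)
    }
```

With this approach the per-instance stats are still computed and averaged as before; the averaged ref{reference_index}_rank values are simply hidden at display time.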
Closing because MS MARCO is deprecated-ish, i.e. it has been removed from Lite.
These may be more general things (at least the first one), but are especially relevant for this scenario.