Create a mockup and send it to Haritz for feedback (he is available via Discord during vacations).
When the user selects a dataset, show all evaluation results on that dataset using the dataset's default metric. The user can then select a different metric, but the initial selection is always the dataset's default metric.
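The selection behaviour could be sketched roughly as below. This is only an illustration for the mockup discussion, not an existing implementation: `EvaluationResult`, `DEFAULT_METRIC`, and `leaderboard` are hypothetical names, the sample rows are made-up placeholder data, and the sketch assumes that a higher score is better for every metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationResult:
    # Hypothetical record shape; the real schema may differ.
    model: str
    dataset: str
    metric: str
    score: float


# Hypothetical mapping from dataset name to its default metric.
DEFAULT_METRIC = {"squad": "f1", "hotpotqa": "exact_match"}


def leaderboard(results, dataset, metric=None):
    """Return the results for `dataset`, ranked by `metric` (highest first).

    When no metric is selected, fall back to the dataset's default metric.
    """
    selected = metric if metric is not None else DEFAULT_METRIC[dataset]
    rows = [r for r in results if r.dataset == dataset and r.metric == selected]
    # Assumption: higher is better for every metric.
    return sorted(rows, key=lambda r: r.score, reverse=True)


# Made-up placeholder rows, only to exercise the function.
results = [
    EvaluationResult("model-a", "squad", "f1", 88.5),
    EvaluationResult("model-a", "squad", "exact_match", 81.0),
    EvaluationResult("model-b", "squad", "f1", 90.1),
    EvaluationResult("model-b", "hotpotqa", "exact_match", 60.2),
]

default_view = leaderboard(results, "squad")                 # uses default metric "f1"
switched_view = leaderboard(results, "squad", "exact_match")  # user picked another metric
```

Keeping the default metric in a per-dataset lookup means the UI only needs to pass a metric when the user explicitly changes it.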
Examples: https://rajpurkar.github.io/SQuAD-explorer/ https://hotpotqa.github.io/index.html https://huggingface.co/spaces/autoevaluate/leaderboards