neulab / ExplainaBoard

Interpretable Evaluation for AI Systems
MIT License

New task: IR #98

Open neubig opened 2 years ago

neubig commented 2 years ago

This is not super-high priority, but it'd be nice to be able to analyze information retrieval (IR) tasks. Some example benchmarks include

pfliu-nlp commented 2 years ago

Sounds good. A good starting point would be our Dataset Finder dataset and models.