Test of analysis on open-domain QA

neulab / ExplainaBoard

Interpretable Evaluation for AI Systems

MIT License

IBM has developed the PrimeQA framework, which should make it relatively easy to generate open-domain QA results from multiple state-of-the-art models.

It would be nice if we could look into these results, analyze them with ExplainaBoard, and see whether they suggest ways to improve our analysis of open-domain QA models.

To do this, we'd need to:

[x] Take a look at the machine reading comprehension tutorial for PrimeQA
[x] Decide which datasets we want to focus on
[ ] Generate multiple system outputs for these datasets
[ ] Analyze them in ExplainaBoard and see if we get any interesting insights
[ ] Add further features to the analysis
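
As a rough illustration of the unchecked items, the sketch below compares several systems' answers against gold answers using SQuAD-style exact match and token-level F1, broken down by gold-answer length. It is a minimal, self-contained sketch and not PrimeQA's or ExplainaBoard's API: the function names, the bucket boundaries, the system names, and the toy data are all assumptions made for illustration.

```python
import re
import string
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def bucket_by_answer_length(gold):
    """Hypothetical bucketing feature: number of tokens in the gold answer."""
    n = len(normalize(gold).split())
    if n <= 1:
        return "1 token"
    if n <= 3:
        return "2-3 tokens"
    return "4+ tokens"


def compare_systems(golds, system_outputs):
    """Return {system: {bucket: {"n": count, "em": mean EM, "f1": mean F1}}}."""
    report = {}
    for name, predictions in system_outputs.items():
        sums = defaultdict(lambda: {"n": 0, "em": 0.0, "f1": 0.0})
        for prediction, gold in zip(predictions, golds):
            cell = sums[bucket_by_answer_length(gold)]
            cell["n"] += 1
            cell["em"] += exact_match(prediction, gold)
            cell["f1"] += token_f1(prediction, gold)
        report[name] = {
            bucket: {"n": c["n"], "em": c["em"] / c["n"], "f1": c["f1"] / c["n"]}
            for bucket, c in sums.items()
        }
    return report


# Toy data (invented for this sketch, not real system outputs).
golds = ["Paris", "Barack Obama", "the Battle of Hastings in 1066"]
system_outputs = {
    "system_a": ["Paris", "Obama", "Battle of Hastings"],
    "system_b": ["paris.", "Barack Obama", "1066"],
}

report = compare_systems(golds, system_outputs)
for system, buckets in report.items():
    for bucket, scores in sorted(buckets.items()):
        print(f"{system:10s} {bucket:12s} EM={scores['em']:.2f} F1={scores['f1']:.2f}")
```

Answer length is only one possible feature; the same loop could bucket by question type or passage length, which is the kind of per-feature breakdown the last checklist item is after.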

Test of analysis on open-domain QA #495