neulab / ExplainaBoard

Interpretable Evaluation for AI Systems
MIT License

Introduce meta eval metric for general NLG tasks #526

Closed pfliu-nlp closed 1 year ago

pfliu-nlp commented 1 year ago

Previously, we had meta-evaluation metrics (and tasks) defined specifically for WMT, which are too dataset-dependent and not general enough for other NLG tasks such as summarization and data-to-text. This PR introduces meta-evaluation metrics for such generation tasks.
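For context, meta-evaluation of an NLG metric typically means measuring how well the metric's scores agree with human judgments, e.g. via correlation. Below is a minimal, self-contained sketch of that idea; the function and data names are illustrative and do not reflect ExplainaBoard's actual API.

```python
# Illustrative sketch of meta-evaluation for an NLG task: an automatic
# metric is judged by how strongly its per-sample scores correlate with
# human ratings. Names here are hypothetical, not ExplainaBoard's API.
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Example: automatic metric scores vs. human ratings for four samples
# (made-up numbers, just to exercise the function).
metric_scores = [0.71, 0.42, 0.88, 0.55]
human_ratings = [4.0, 2.5, 4.5, 3.0]
print(round(pearson(metric_scores, human_ratings), 3))
```

A dataset-independent meta-eval metric of this shape applies equally to summarization, data-to-text, or translation, since it only needs metric scores and human judgments, not WMT-specific structure.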

pfliu-nlp commented 1 year ago

@odashi, I have fixed most of the comments. The remaining one can be addressed after we reach a consensus (in PR https://github.com/neulab/ExplainaBoard/pull/527) on how to store the number of samples.

neubig commented 1 year ago

#528 is merged, so I think this can be revisited.