neulab / ExplainaBoard

Interpretable Evaluation for AI Systems

MIT License


add tests for meval to replicate paper results #605

Open pfliu-nlp opened 1 year ago

Overview

This PR adds tests that check whether our meta-evaluation processor replicates the results reported in published papers.

Relevant issue: https://github.com/inspired-co/taskboard/issues/180

Details

Collect system outputs for two metrics (rouge1 and bartscore) from the repo listed under References.
Process these outputs with ExplainaBoard and compare the results with the ones reported in that repo.
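
The comparison step above can be sketched as follows. This is only a minimal illustration, not the test code in this PR and not the ExplainaBoard API: it assumes the meta-evaluation result is a Spearman correlation between metric scores and human judgments, and that "replicates" means agreement within an absolute tolerance. The function names (`spearman`, `replicates`), the tolerance, and the toy scores are all hypothetical.

```python
import math
from typing import List, Sequence


def rank(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def replicates(observed: float, reported: float, tol: float = 1e-3) -> bool:
    """Hypothetical check: observed value matches the reported one within tol."""
    return math.isclose(observed, reported, abs_tol=tol)


# Toy data for illustration only; these are NOT scores from the paper.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
observed = spearman(metric_scores, human_scores)
```

In an actual test, `observed` would come from the meta-evaluation processor and the reported value from the paper or its repo; the tolerance would need to be chosen to absorb rounding in the published numbers.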

References

Paper: BARTScore: Evaluating Generated Text as Text Generation
Code: https://github.com/neulab/BARTScore