neulab / ExplainaBoard

Interpretable Evaluation for AI Systems
MIT License

Strict shape checking of `aggregate_stats` and `calc_metric_from_aggregate` #499

Closed: odashi closed this 1 year ago

odashi commented 1 year ago

This change introduces strict shape checking of the return values of Metric.aggregate_stats and Metric.calc_metric_from_aggregate. These functions are now defined as wrappers around inner functions (which subclasses may override), and they check whether the return values have the expected shape.

May fix #497.
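The PR text does not include the code, so the following is only a minimal sketch of the wrapper pattern described above. It assumes NumPy arrays and the shape conventions listed in the docstring; the inner-function names (`_aggregate_stats`, `_calc_metric_from_aggregate`) and the `Accuracy` / `BuggyAccuracy` classes are hypothetical and not necessarily ExplainaBoard's actual API.

```python
import numpy as np


class Metric:
    """Minimal sketch of shape-checked public methods wrapping overridable inner ones.

    Assumed shape conventions (illustrative only):
      stats:      [num_samples, num_stats] or [batch, num_samples, num_stats]
      aggregate:  [num_stats]              or [batch, num_stats]
      metric:     []  (scalar)             or [batch]
    """

    def aggregate_stats(self, stats: np.ndarray) -> np.ndarray:
        # Public wrapper: call the overridable inner function, then validate its shape.
        result = np.asarray(self._aggregate_stats(stats))
        # Aggregation should remove only the sample axis (second to last).
        expected = stats.shape[:-2] + stats.shape[-1:]
        if result.shape != expected:
            raise ValueError(
                f"aggregate_stats returned shape {result.shape}, expected {expected}"
            )
        return result

    def calc_metric_from_aggregate(self, agg_stats: np.ndarray) -> np.ndarray:
        result = np.asarray(self._calc_metric_from_aggregate(agg_stats))
        # One metric value per batch element: drop the stats axis.
        expected = agg_stats.shape[:-1]
        if result.shape != expected:
            raise ValueError(
                f"calc_metric_from_aggregate returned shape {result.shape}, "
                f"expected {expected}"
            )
        return result

    # Inner functions (hypothetical names) that subclasses may override.
    def _aggregate_stats(self, stats: np.ndarray) -> np.ndarray:
        # Default: average every statistic over the sample axis.
        return np.mean(stats, axis=-2)

    def _calc_metric_from_aggregate(self, agg_stats: np.ndarray) -> np.ndarray:
        raise NotImplementedError


class Accuracy(Metric):
    """Hypothetical metric with one statistic per sample: 1.0 if correct, else 0.0."""

    def _calc_metric_from_aggregate(self, agg_stats: np.ndarray) -> np.ndarray:
        return agg_stats[..., 0]


class BuggyAccuracy(Accuracy):
    """Averages over *all* axes, which silently collapses a batch dimension."""

    def _aggregate_stats(self, stats: np.ndarray) -> np.ndarray:
        return np.array([np.mean(stats)])


stats = np.array([[1.0], [0.0], [1.0], [1.0]])  # 4 samples, 1 stat each
batched = np.stack([stats, stats[::-1]])        # shape (2, 4, 1)

acc = Accuracy()
print(float(acc.calc_metric_from_aggregate(acc.aggregate_stats(stats))))  # 0.75
print(acc.calc_metric_from_aggregate(acc.aggregate_stats(batched)).shape)  # (2,)

# The buggy override passes on unbatched input but fails the check once batched.
try:
    BuggyAccuracy().aggregate_stats(batched)
except ValueError as e:
    print(e)
```

In this sketch, the `BuggyAccuracy` override returns a plausible-looking shape for unbatched input and is only rejected when a batch axis is present. That is the kind of ambiguous ndarray operation a strict wrapper-level check can surface.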

odashi commented 1 year ago

@neubig This change surfaces a number of errors caused by ambiguous ndarray operations in subclasses. I couldn't work out the correct fix for each implementation. Could you take a look and fix them if possible?

neubig commented 1 year ago

Thanks @odashi! I fixed most of them.

@pfliu-nlp: it seems the remaining ones are APE and NLG meta-evaluation, which you're more familiar with than I am. Would you be able to take a look? My commit here might be a good reference: https://github.com/neulab/ExplainaBoard/pull/499/commits/de3dac48b5cffcadf99d1980af4cddad44bc22d0

pfliu-nlp commented 1 year ago

Sure. I'm working on this.

odashi commented 1 year ago

Following the internal discussion, we will introduce a use_customized_aggregate() flag function that tells the assertion to skip checking the size of the last dimension.
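The thread does not give the flag's exact signature, so here is a hedged sketch only. It assumes `use_customized_aggregate()` is a method that a subclass overrides to return `True`, and that stats have shape `[..., num_samples, num_stats]` while aggregates have shape `[..., num_stats]`. Under those assumptions, the check still verifies the rank and leading (batch) dimensions and skips only the size of the last one. `LabelHistogram` and `StrictLabelHistogram` are hypothetical classes used for illustration.

```python
import numpy as np


class Metric:
    """Sketch of a shape check that honors a use_customized_aggregate() flag."""

    def use_customized_aggregate(self) -> bool:
        # Default: the aggregate keeps the per-sample stats' last-dimension size.
        return False

    def aggregate_stats(self, stats: np.ndarray) -> np.ndarray:
        result = np.asarray(self._aggregate_stats(stats))
        expected = stats.shape[:-2] + stats.shape[-1:]
        if self.use_customized_aggregate():
            # Check rank and leading (batch) dims, but skip the last dimension's size.
            ok = result.ndim == len(expected) and result.shape[:-1] == expected[:-1]
        else:
            ok = result.shape == expected
        if not ok:
            raise ValueError(
                f"aggregate_stats returned shape {result.shape}, expected {expected}"
            )
        return result

    def _aggregate_stats(self, stats: np.ndarray) -> np.ndarray:
        return np.mean(stats, axis=-2)


class LabelHistogram(Metric):
    """Hypothetical metric: each sample has one label in {0, 1, 2}; the aggregate
    is a 3-bin count, so its last dimension (3) differs from the stats' (1)."""

    NUM_LABELS = 3

    def use_customized_aggregate(self) -> bool:
        return True

    def _aggregate_stats(self, stats: np.ndarray) -> np.ndarray:
        labels = stats[..., 0].astype(int)  # drop the stats axis: [..., num_samples]
        # Count each label over the sample axis, then put the bins on the last axis.
        return np.stack(
            [np.sum(labels == k, axis=-1) for k in range(self.NUM_LABELS)], axis=-1
        )


class StrictLabelHistogram(LabelHistogram):
    # Same aggregate without the flag: the strict check rejects it.
    def use_customized_aggregate(self) -> bool:
        return False


stats = np.array([[0], [2], [2], [1], [2]])  # 5 samples, 1 stat (a label) each
print(LabelHistogram().aggregate_stats(stats))  # [1 1 3]
print(LabelHistogram().aggregate_stats(np.stack([stats, stats])).shape)  # (2, 3)
```

Keeping the rank and batch-axis comparisons means that a customized aggregate which accidentally drops the batch dimension is still caught. Only the trailing size is exempted.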

neubig commented 1 year ago

I'll take a quick look at this.

neubig commented 1 year ago

@odashi and @pfliu-nlp: tests seem to be passing. You could take a look at my commits and see if you have any comments.