neulab / ExplainaBoard

Interpretable Evaluation for AI Systems

MIT License


prototype interpretation module - for combo analysis #567

pfliu-nlp closed this pull request 1 year ago

pfliu-nlp commented 1 year ago

Overview

This PR prototypes an interpretation module whose main purpose is to generate observations and suggestions for each `ComboAnalysisResult`.
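The example output in this PR suggests that observations and suggestions are both simple records with a `keywords` field and a `content` field. A minimal sketch of such containers follows, assuming plain Python dataclasses; the real class definitions in ExplainaBoard may differ, and the `Interpretation` wrapper is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InterpretationObservation:
    # Short tag naming the kind of observation, e.g. "misprediction_description".
    keywords: str
    # Human-readable description of a pattern found in the analysis result.
    content: str


@dataclass
class InterpretationSuggestion:
    # Tag matching the observation this suggestion responds to.
    keywords: str
    # Human-readable recommendation.
    content: str


@dataclass
class Interpretation:
    # Hypothetical container pairing the observations and suggestions
    # produced for a single ComboAnalysisResult.
    observations: list[InterpretationObservation] = field(default_factory=list)
    suggestions: list[InterpretationSuggestion] = field(default_factory=list)
```

With `@dataclass`, the generated `repr` has the same `InterpretationObservation(keywords=..., content=...)` shape as the example output, and `default_factory` gives every `Interpretation` its own fresh lists.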

Details

For example, given the bucket analysis result (a `ComboAnalysisResult`) for the combination (span_true_label, span_pred_label), this PR can generate:

Observations

[InterpretationObservation(
keywords='misprediction_description',
content='The system tends to mispredict: the label `O` as `MISC` (percentage of total errors: 0.145985401459854), the label `ORG` as `O` (percentage of total errors: 0.12700729927007298), the label `MISC` as `O` (percentage of total errors: 0.10510948905109489)')]
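One way to derive such an observation is to keep only the (true label, predicted label) pairs that disagree, divide each pair's count by the total number of errors, and report the most frequent pairs. The sketch below assumes the combo result has already been reduced to a plain `dict` mapping `(true_label, pred_label)` to a span count; the function name `describe_mispredictions` and its `top_k` parameter are hypothetical, not ExplainaBoard's API.

```python
from dataclasses import dataclass


@dataclass
class InterpretationObservation:
    keywords: str
    content: str


def describe_mispredictions(
    combo_counts: dict[tuple[str, str], int], top_k: int = 3
) -> InterpretationObservation:
    """Summarize the most frequent (true, predicted) label confusions.

    combo_counts maps (true_label, pred_label) -> number of spans.
    Pairs whose labels agree are correct predictions and are ignored.
    """
    # Keep only mismatched pairs, i.e. the errors.
    errors = {pair: n for pair, n in combo_counts.items() if pair[0] != pair[1]}
    total_errors = sum(errors.values())
    if total_errors == 0:
        return InterpretationObservation(
            keywords="misprediction_description",
            content="The system made no mispredictions.",
        )

    # Sort errors by frequency (descending) and keep the top_k pairs.
    top = sorted(errors.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # The share is reported as a fraction of all errors, mirroring the output above.
    parts = [
        f"the label `{true}` as `{pred}` "
        f"(percentage of total errors: {n / total_errors})"
        for (true, pred), n in top
    ]
    return InterpretationObservation(
        keywords="misprediction_description",
        content="The system tends to mispredict: " + ", ".join(parts),
    )
```

For instance, with counts `{("O", "O"): 100, ("O", "MISC"): 4, ("ORG", "O"): 3, ("MISC", "O"): 2, ("PER", "ORG"): 1}` there are 10 errors, so the three reported shares are 0.4, 0.3 and 0.2, and the correct `("O", "O")` pair never appears.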

Suggestions

[InterpretationSuggestion(
keywords='misprediction_description',
content='These samples, which are frequently mispredicted by the model, need to be prioritized for solutions.')]
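Since the suggestion carries the same `keywords` value as the observation it answers, one simple way to produce suggestions is a lookup table from observation keyword to suggestion text. This is a minimal sketch under that assumption; the `SUGGESTION_TEMPLATES` table and the `suggest` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InterpretationObservation:
    keywords: str
    content: str


@dataclass
class InterpretationSuggestion:
    keywords: str
    content: str


# Hypothetical rule table: observation keyword -> suggestion text.
SUGGESTION_TEMPLATES: dict[str, str] = {
    "misprediction_description": (
        "These samples, which are frequently mispredicted by the model, "
        "need to be prioritized for solutions."
    ),
}


def suggest(
    observations: list[InterpretationObservation],
) -> list[InterpretationSuggestion]:
    """Emit one suggestion per observation whose keyword has a template."""
    suggestions = []
    for obs in observations:
        template = SUGGESTION_TEMPLATES.get(obs.keywords)
        if template is not None:  # observations without a rule are skipped
            suggestions.append(
                InterpretationSuggestion(keywords=obs.keywords, content=template)
            )
    return suggestions
```

Keeping the rules in a table means a new observation type only needs a new entry rather than new control flow.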

References

Blocked by

https://github.com/neulab/ExplainaBoard/pull/566