neulab / ExplainaBoard

Interpretable Evaluation for AI Systems
MIT License

prototype interpretation module - for multi-bucket analysis #568

Closed by pfliu-nlp 1 year ago

pfliu-nlp commented 1 year ago

Overview

This PR adds a prototype interpretation module that generates observations and suggestions from multi-bucket analysis results.

Details

Observations

{'F1': [
    InterpretationObservation(
        keywords='salient_feature_description',
        content="the model's performance will be improved as the feature value of `span_length` decreases, the model's performance will be improved as the feature value of `span_econ` increases. the model's performance will be improved as the feature value of `span_efre` increases."),
    InterpretationObservation(
        keywords='max_performance_gap_feature',
        content='On the `span_efre` feature, the bucket performance difference reaches the maximum of 0.0880777480362519')
]}
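For readers unfamiliar with bucket analysis, here is a minimal sketch of how observations like the two above could be derived from per-bucket scores. It is not the code in this PR. The dataclass only mirrors the `keywords` and `content` fields visible in the output, while `improves_when`, `max_performance_gap`, and the toy scores are hypothetical names and made-up values used for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InterpretationObservation:
    # Field names mirror the repr shown above; the real class may differ.
    keywords: str
    content: str


def improves_when(scores: list[float]) -> str | None:
    """Return 'increases' or 'decreases' when the scores are strictly
    monotonic across buckets ordered by feature value, otherwise None."""
    diffs = [b - a for a, b in zip(scores, scores[1:])]
    if diffs and all(d > 0 for d in diffs):
        return "increases"
    if diffs and all(d < 0 for d in diffs):
        return "decreases"
    return None


def max_performance_gap(
    bucket_scores: dict[str, list[float]],
) -> InterpretationObservation:
    """Report the feature whose best and worst buckets differ the most."""
    gaps = {feat: max(s) - min(s) for feat, s in bucket_scores.items()}
    feature = max(gaps, key=gaps.get)
    return InterpretationObservation(
        keywords="max_performance_gap_feature",
        content=(
            f"On the `{feature}` feature, the bucket performance "
            f"difference reaches the maximum of {gaps[feature]}"
        ),
    )


# Toy, made-up F1 scores per bucket, ordered by increasing feature value.
toy_scores = {
    "span_length": [0.90, 0.85, 0.80],
    "span_efre": [0.70, 0.80, 0.95],
}

observation = max_performance_gap(toy_scores)
print(observation.content)
for name, scores in toy_scores.items():
    print(name, "-> performance improves as the value", improves_when(scores))
```

In this sketch a feature counts as salient only when its scores move strictly in one direction across the ordered buckets. The module in the PR may use a different criterion, such as a correlation threshold.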

Suggestions

{'F1': [
    InterpretationSuggestion(
        keywords='salient_feature_description',
        content='The performance of the system is highly affected by these features. Consider augment the training samples to improve the model performance.')
]}
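The output above contains a suggestion for `salient_feature_description` but none for `max_performance_gap_feature`. That would be consistent with suggestions being looked up by observation keyword, although this is an assumption about the implementation rather than something the PR description states. A minimal sketch under that assumption, with a hypothetical `SUGGESTION_TEMPLATES` table and `suggest` helper:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InterpretationSuggestion:
    # Field names mirror the repr shown above; the real class may differ.
    keywords: str
    content: str


# Hypothetical lookup table: observation keyword -> advice text.
SUGGESTION_TEMPLATES = {
    "salient_feature_description": (
        "The performance of the system is highly affected by these features. "
        "Consider augmenting the training samples to improve the model "
        "performance."
    ),
}


def suggest(observation_keywords: list[str]) -> list[InterpretationSuggestion]:
    """Return one suggestion for each keyword that has a template."""
    return [
        InterpretationSuggestion(keywords=kw, content=SUGGESTION_TEMPLATES[kw])
        for kw in observation_keywords
        if kw in SUGGESTION_TEMPLATES
    ]


suggestions = suggest(
    ["salient_feature_description", "max_performance_gap_feature"]
)
print(suggestions)
```

Keeping the advice text in a table keyed by observation keyword means new observation types can be added without touching the suggestion logic. Keywords without a template simply yield no suggestion.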
