neulab / ExplainaBoard

Interpretable Evaluation for AI Systems
MIT License

list[MetricSomething] to dict[str, MetricSomething] #515

Closed odashi closed 1 year ago

odashi commented 1 year ago

This change attempts to convert all list[Metric***] structures into dict[str, Metric***], keyed by metric name.
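
As a rough illustration of the conversion described above, here is a minimal sketch. `MetricResult` and `metrics_to_dict` are hypothetical names used only for this example (they are not taken from the ExplainaBoard codebase), and the metric values are made up:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricResult:
    """Hypothetical stand-in for a Metric*** object; not the real class."""

    name: str
    value: float


def metrics_to_dict(metrics: list[MetricResult]) -> dict[str, MetricResult]:
    """Key each metric by its name, rejecting duplicate names."""
    result: dict[str, MetricResult] = {}
    for metric in metrics:
        if metric.name in result:
            # A list can hold two metrics with the same name; a dict cannot.
            raise ValueError(f"duplicate metric name: {metric.name}")
        result[metric.name] = metric
    return result


# Made-up values, for illustration only.
overall = metrics_to_dict([MetricResult("ExactMatch", 0.5), MetricResult("F1", 0.75)])
print(overall["F1"].value)
```

With this shape, callers look a metric up by name (as in `overall["ExactMatch"].value`) instead of relying on its position in a list.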

odashi commented 1 year ago

@neubig @pfliu-nlp The change almost works, but several tests fail due to value mismatches:

======================================================================
FAIL: test_extractive_qa_en (integration_tests.extractive_qa_test.ExtractiveQATest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "~/ExplainaBoard/integration_tests/extractive_qa_test.py", line 48, in test_extractive_qa_en
    self.assertAlmostEqual(overall["ExactMatch"].value, 0.6974789915966386, 2)
AssertionError: 0.6571428571428571 != 0.6974789915966386 within 2 places (0.04033613445378148 difference)

======================================================================
FAIL: test_extractive_qa_zh (integration_tests.extractive_qa_test.ExtractiveQATest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "~/ExplainaBoard/integration_tests/extractive_qa_test.py", line 79, in test_extractive_qa_zh
    self.assertAlmostEqual(overall["F1"].value, 0.7559651817716333, 2)
AssertionError: 0.6857142857142857 != 0.7559651817716333 within 2 places (0.07025089605734758 difference)

======================================================================
FAIL: test_qa_metrics (integration_tests.metric_test.MetricTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "~/ExplainaBoard/integration_tests/metric_test.py", line 147, in test_qa_metrics
    self.assertAlmostEqual(overall["ExactMatch"].value, 0.6974789915966386, 2)
AssertionError: 0.6571428571428571 != 0.6974789915966386 within 2 places (0.04033613445378148 difference)

I couldn't identify the source of these errors. It would be nice if you could take a look at them.
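
For reference when reading the failures above: `unittest`'s `assertAlmostEqual(first, second, places)` passes only when the difference, rounded to `places` decimal places, is zero. The sketch below applies that rule to the numbers from the first failure; `almost_equal` is a hypothetical helper written for this example, not part of `unittest` or ExplainaBoard:

```python
def almost_equal(first: float, second: float, places: int = 7) -> bool:
    """Mirror the assertAlmostEqual rule: the rounded difference must be zero."""
    return first == second or round(abs(first - second), places) == 0


actual = 0.6571428571428571    # value reported in the ExactMatch failure
expected = 0.6974789915966386  # value the test expects

print(abs(actual - expected))             # the gap, roughly 0.0403
print(almost_equal(actual, expected, 2))  # False
```

A gap of about 0.04 rounds to 0.04 at two places, so these failures are likely real differences in the computed values rather than floating-point noise.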

odashi commented 1 year ago

I found some issues; fixing them now.