mia-workshop / MIA-Shared-Task-2022

An official repository for MIA 2022 (NAACL 2022 Workshop) Shared Task on Cross-lingual Open-Retrieval Question Answering.
MIT License

Possible Error in Baseline Table: Confused BLEU and EM #3

Closed by tuzhucheng 2 years ago

tuzhucheng commented 2 years ago

Hi, thanks for organizing this workshop!

I believe there is an error in the baseline "Final results F1 | EM" table for XOR QA, in the row for baseline (1): multilingual DPR + multilingual seq2seq (CORA without iterative training).

https://github.com/mia-workshop/MIA-Shared-Task-2022/blob/02ed407f6b0891373446d6844e50c66a54a32a46/README.md?plain=1#L239-L250

The Macro-Average EM for (1) should be 29.1, not 26.8; 26.8 is the BLEU score.

I checked the numbers using:

python mia-organizer/eval_scripts/eval_xor_full.py \
    --data_file mia-organizer/data/eval/mia_2022_dev_xorqa.jsonl \
    --pred_file data/baseline1_mdpr_mgen/baseline2_xor_dev_results.json

yielding the results:

avg f1: 38.87920157968395
avg em: 29.10566839259774
avg bleu: 26.76662074445292

The average of the rows in that EM column is also 29.1.
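As a quick sanity check, the three averages quoted above can be rounded to one decimal place and compared with the two candidate table values. This is only a sketch: the helper name `metrics_matching` is made up for illustration and is not part of the repository's evaluation scripts.

```python
# Averages copied from the evaluation output quoted above.
reported = {
    "f1": 38.87920157968395,
    "em": 29.10566839259774,
    "bleu": 26.76662074445292,
}


def metrics_matching(table_value, scores, ndigits=1):
    """Return the metric names whose score rounds to table_value.

    Hypothetical helper for illustration only.
    """
    return [
        name
        for name, score in scores.items()
        if abs(round(score, ndigits) - table_value) < 1e-9
    ]


# Value currently shown in the EM column of the README table.
print(metrics_matching(26.8, reported))  # ['bleu']
# Value proposed for the EM column.
print(metrics_matching(29.1, reported))  # ['em']
```

Under these numbers, 26.8 matches only the BLEU average and 29.1 matches only the EM average, which is consistent with the two columns having been swapped for that entry.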

AkariAsai commented 2 years ago

Oh, that's a great catch! You're right. I'll fix the README, thanks!