howardchenhd closed this issue 3 years ago
Closing this issue: according to your email, you were using the wrong squad_metrics.py.
Hi, would you mind sharing the hyperparameters used for the XLMR translate train multi-task, if you still have them? Sorry, I know this is an old repository, but if you still have that email I'd be interested in knowing. Thanks!
Hi, thanks for the code! I am trying to replicate the MLQA scores (XLMR translate train multi-task) using this repo's code, but the score for zh is far too low, while the scores for the other languages look right.
'test_en': {'exact_match': 70.20707506471096, 'f1': 83.69701042191264},
'test_es': {'exact_match': 57.35770036169808, 'f1': 75.17824917711975},
'test_de': {'exact_match': 54.903697144122205, 'f1': 70.18311703583281},
'test_ar': {'exact_match': 47.403936269915654, 'f1': 67.90403139343402},
'test_hi': {'exact_match': 53.395689304595365, 'f1': 71.87824404813004},
'test_vi': {'exact_match': 53.7761601455869, 'f1': 74.26649611206571},
'test_zh': {'exact_match': 5.606385049639868, 'f1': 18.790755684164417},
Unlike the XTREME repo, I found that this repo's squad_metrics.py already contains the following change:
if 'zh' in output_prediction_file: final_text = tok_text
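For anyone debugging a similar zh-only drop, here is a minimal sketch of why Chinese scores are so sensitive to how the answer text is segmented. It is not this repo's code and not necessarily the cause here (the thread attributes the problem to using the wrong squad_metrics.py); the helper names `f1`, `whitespace_tokens`, and `char_tokens` and the example strings are made up for illustration. It only shows that a bag-of-tokens F1 can collapse to 0 when prediction and reference are split differently on whitespace, while a character-level comparison is unaffected.

```python
from collections import Counter


def f1(pred_tokens, gold_tokens):
    """Bag-of-tokens F1 between a predicted and a reference answer."""
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def whitespace_tokens(text):
    # SQuAD-style tokenization: split on whitespace only.
    return text.split()


def char_tokens(text):
    # Hypothetical zh-friendly tokenization: one token per non-space character.
    return [ch for ch in text if not ch.isspace()]


# Illustrative strings: same characters, different spacing.
pred = "北京大学"
gold = "北京 大学"

ws_f1 = f1(whitespace_tokens(pred), whitespace_tokens(gold))
char_f1 = f1(char_tokens(pred), char_tokens(gold))
print(ws_f1, char_f1)
```

If the whitespace-based score is near zero while the character-level one is high on your own zh predictions, the evaluation script (rather than the model) is the more likely culprit.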