kentonl / e2e-coref

End-to-end Neural Coreference Resolution
Apache License 2.0

How about the performance on OntoNotes v5.0? #45

Ramlinbird closed this issue 4 years ago

Ramlinbird commented 5 years ago

Have you ever run experiments on the OntoNotes v5.0 dataset? I tried it without changing any training configuration except switching the data from v4.0 to v5.0 and setting lm_path to None. However, the best average F1 score on the development set is only 61 after 200k steps, and it has reached a plateau. I look forward to your reply. Thanks a lot.

Ramlinbird commented 5 years ago

I just realized that there is only a conll-2012-development.v4.tar.gz file, which is used to generate the CoNLL files, while the original OntoNotes has been updated to v5.0, which is really confusing. Also, my experiments were run on CoNLL files shared by others, so there may be some differences. I just got the original OntoNotes data and will try again. Sorry to bother you.
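One quick way to check whether CoNLL files shared by others match the ones generated from the original OntoNotes data is to compare basic counts. The following is a minimal Python sketch, not part of e2e-coref; the helpers `summarize_conll` and `compare_conll` are hypothetical, and it assumes the standard CoNLL-2012 layout: `#begin document (...); part NNN` headers, whitespace-separated columns, and the coreference annotation in the last column using the `(12`, `12)`, `(12)` bracket notation joined by `|`.

```python
import re

# Header of each document part, e.g. "#begin document (bc/cctv/00/cctv_0001); part 000".
DOC_START = re.compile(r"^#begin document \((.*)\); part (\d+)")


def summarize_conll(lines):
    """Count document parts, tokens, mentions and clusters in CoNLL-2012 lines.

    Assumption: whitespace-separated columns with coreference in the last
    column, using "(12", "12)", "(12)" brackets joined by "|".
    """
    docs = tokens = mentions = 0
    clusters = set()
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        header = DOC_START.match(line)
        if header:
            docs += 1
            current = header.groups()  # (document name, part number)
            continue
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separator or "#end document"
        tokens += 1
        coref = line.split()[-1]
        if coref == "-":
            continue
        for piece in coref.split("|"):
            # Cluster ids are only unique within one document part.
            clusters.add((current, piece.strip("()")))
            if piece.startswith("("):
                mentions += 1  # every mention opens exactly once
    return {"documents": docs, "tokens": tokens,
            "mentions": mentions, "clusters": len(clusters)}


def compare_conll(path_a, path_b):
    """Print the counts of two CoNLL files side by side."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        a, b = summarize_conll(fa), summarize_conll(fb)
    for key in a:
        flag = "" if a[key] == b[key] else "  <-- differs"
        print(f"{key}: {a[key]} vs {b[key]}{flag}")
```

Running `compare_conll` on your own development file and on the shared copy (file names are up to you) would show whether the document, mention, or cluster counts diverge, which is a cheap first check before retraining.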

amttar commented 4 years ago

@Ramlinbird have you obtained any results using OntoNotes v5.0 that you can share?

tpatzelt commented 4 years ago

I get these results on OntoNotes v5:

version: 8.01 /project/error_analysis/e2e/conll-2012/scorer/v8.01/lib/CorScorer.pm
====== TOTALS =======
Identification of Mentions: Recall: (16651 / 19764) 84.24%      Precision: (16651 / 19351) 86.04%       F1: 85.13%
--------------------------------------------------------------------------
Coreference: Recall: (12114 / 15232) 79.52%     Precision: (12114 / 14882) 81.4%        F1: 80.45%
--------------------------------------------------------------------------
Official result for bcub
version: 8.01 /project/error_analysis/e2e/conll-2012/scorer/v8.01/lib/CorScorer.pm
====== TOTALS =======
Identification of Mentions: Recall: (16651 / 19764) 84.24%      Precision: (16651 / 19351) 86.04%       F1: 85.13%
--------------------------------------------------------------------------
Coreference: Recall: (13714.9966220446 / 19764) 69.39%  Precision: (13964.2247782205 / 19351) 72.16%    F1: 70.75%
--------------------------------------------------------------------------
Official result for ceafe
version: 8.01 /project/error_analysis/e2e/conll-2012/scorer/v8.01/lib/CorScorer.pm
====== TOTALS =======
Identification of Mentions: Recall: (16651 / 19764) 84.24%      Precision: (16651 / 19351) 86.04%       F1: 85.13%
--------------------------------------------------------------------------
Coreference: Recall: (3045.98370534764 / 4532) 67.21%   Precision: (3045.98370534764 / 4469) 68.15%     F1: 67.68%
--------------------------------------------------------------------------
Average F1 (conll): 72.96%
Average F1 (py): 72.96%
Average precision (py): 73.91%
Average recall (py): 72.04%
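The summary lines follow directly from the per-metric numbers above: the CoNLL score is the unweighted mean of the MUC, B-cubed, and CEAF-e F1 scores, and the "(py)" precision and recall lines average across the three metrics the same way. The Python sketch below recomputes them from the raw numerator/denominator pairs in the log. The names `SCORES`, `ratio`, `f1`, and `conll_summary` are hypothetical, and labelling the first block as MUC is an assumption, since its "Official result for ..." header is not in the pasted output.

```python
# Raw (numerator, denominator) pairs copied from the scorer log above.
# Assumption: the first block (header not shown) is MUC.
SCORES = {
    "muc": {"recall": (12114, 15232), "precision": (12114, 14882)},
    "bcub": {"recall": (13714.9966220446, 19764),
             "precision": (13964.2247782205, 19351)},
    "ceafe": {"recall": (3045.98370534764, 4532),
              "precision": (3045.98370534764, 4469)},
}


def ratio(pair):
    """Divide numerator by denominator, returning 0.0 for an empty denominator."""
    num, den = pair
    return num / den if den else 0.0


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def conll_summary(scores):
    """Per-metric (P, R, F1) plus the unweighted averages across metrics."""
    per_metric = {}
    for name, s in scores.items():
        p, r = ratio(s["precision"]), ratio(s["recall"])
        per_metric[name] = (p, r, f1(p, r))
    n = len(per_metric)
    avg_p = sum(v[0] for v in per_metric.values()) / n
    avg_r = sum(v[1] for v in per_metric.values()) / n
    avg_f1 = sum(v[2] for v in per_metric.values()) / n
    return per_metric, avg_p, avg_r, avg_f1


if __name__ == "__main__":
    per_metric, avg_p, avg_r, avg_f1 = conll_summary(SCORES)
    for name, (p, r, f) in per_metric.items():
        print(f"{name}: P={p:.2%} R={r:.2%} F1={f:.2%}")
    # Python's :.2% rounds, so the last digit may differ from the scorer's.
    print(f"Average F1 (conll): {avg_f1:.2%}")
    print(f"Average precision: {avg_p:.2%}  Average recall: {avg_r:.2%}")
```

The scorer appears to truncate its percentages to two decimals rather than round them: MUC recall is 12114 / 15232 = 79.5299...%, which the log shows as 79.52%. A recomputed per-metric figure can therefore differ from the log in the last digit, while the averages agree with the 72.96 / 73.91 / 72.04 lines.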