Hi, thanks for your compliments :).
The F1 scores vary by about 2% because of several factors, such as random initialization and negative sampling (we could probably do better with some more hyperparameter tuning). That's why we report the average of 5 runs with random seeds in the paper. Also, after hyperparameter tuning we retrained the model on the combined train and dev set ('datasets/conll04/conll04_train_dev.json').
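In case it helps reproduction, here is a minimal sketch of how such a combined file could be produced. The train/dev file names and the flat JSON-array-of-documents format are assumptions based on this repo's dataset layout, not a script from the repo:

```python
import json

# Assumed paths, following the repo's CoNLL04 naming scheme.
train_path = "datasets/conll04/conll04_train.json"
dev_path = "datasets/conll04/conll04_dev.json"
out_path = "datasets/conll04/conll04_train_dev.json"

with open(train_path) as f:
    train_docs = json.load(f)  # list of documents (tokens/entities/relations)
with open(dev_path) as f:
    dev_docs = json.load(f)

# The combined split is simply the concatenation of both document lists.
with open(out_path, "w") as f:
    json.dump(train_docs + dev_docs, f)
```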
Could you please report the average of 5 runs with random seeds, trained on train+dev?
I have retrained on train+dev with the same random seed I used before, and the test result is much better now, close to or even better than what the paper reports.
--- Entities (NER) ---
type     precision   recall   f1-score   support
Peop         92.79    96.26      94.50       321
Loc          91.36    91.57      91.46       427
Other        80.00    72.18      75.89       133
Org          80.09    85.35      82.64       198
micro        88.37    89.43      88.90      1079
macro        86.06    86.34      86.12      1079
--- Relations ---

Without NER
type     precision   recall   f1-score   support
LocIn        78.38    61.70      69.05        94
Work         66.67    63.16      64.86        76
OrgBI        65.45    68.57      66.98       105
Live         71.93    82.00      76.64       100
Kill         87.23    87.23      87.23        47
micro        72.18    71.33      71.75       422
macro        73.93    72.53      72.95       422

With NER
type     precision   recall   f1-score   support
LocIn        78.38    61.70      69.05        94
Work         66.67    63.16      64.86        76
OrgBI        65.45    68.57      66.98       105
Live         71.93    82.00      76.64       100
Kill         87.23    87.23      87.23        47
micro        72.18    71.33      71.75       422
macro        73.93    72.53      72.95       422
And I think the average of 5 runs with random seeds, trained on train+dev, would match the paper's results. Thanks again!
I am trying to reproduce this work and have some doubts about it. What is the role of the seeds? The performance on CoNLL04 reported in the paper comes from training on the train+dev dataset, doesn't it?
I'm not sure what you mean by "role of seeds". By using a random seed, we ensure that weights are initialized differently in each run (things like random sampling also depend on the seed). Yes, we train the final model on the train+dev dataset; this is a common thing to do after hyperparameter tuning.
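To illustrate what fixing a seed controls, here is a minimal sketch using standard PyTorch/NumPy calls; the helper itself is an illustration, not a function from this repo:

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Make weight initialization and sampling reproducible across runs."""
    random.seed(seed)                 # Python-level sampling (e.g. negative sampling)
    np.random.seed(seed)              # NumPy-based randomness
    torch.manual_seed(seed)           # CPU weight initialization
    torch.cuda.manual_seed_all(seed)  # GPU weight initialization (all devices)

set_seed(42)  # two runs with the same seed now start from identical weights
```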
Thanks for your reply. I just don't understand the effect of the seeds on the method's performance. According to your reply, I think they have no effect beyond initializing the weights (and sampling). I have always trained the model only on the train dataset. If the model is trained on the train+dev dataset, I think that may leak data into the evaluation, because in the code the performance is measured on the dev set. I have tried to evaluate the provided model on the test set, and its performance is better than the paper's.
PS: It is excellent work and the code is very, very good. I have been studying it for weeks!
Yes, you should evaluate the provided model on the test set. However, the provided model is the best out of 5 runs, whereas we report the average of 5 runs in our paper (and due to random weight initialization and sampling, the performance varies between runs). That's why you get better performance than the results we reported in our paper.
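For intuition, best-of-5 and average-of-5 can easily differ by a percent or more. A minimal sketch with purely hypothetical scores (placeholder numbers, not actual results from this repo):

```python
from statistics import mean, stdev

# Hypothetical micro-F1 scores from 5 runs with different seeds.
run_f1 = [70.8, 71.5, 72.2, 71.1, 73.0]

print(f"best of 5: {max(run_f1):.2f}")   # what the released model corresponds to
print(f"mean of 5: {mean(run_f1):.2f}")  # what the paper reports
print(f"std dev:   {stdev(run_f1):.2f}") # run-to-run variance
```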
Thanks :)!
I understand. Thanks a lot.
First of all, thanks for sharing this clean and object-oriented code! I have learned a lot from this repo. I even want to say:
Wow, you can really code!
^_^ I have trained the model on the CoNLL04 dataset with the default configuration, according to the README, and the test results are as follows:
The test result is worse than the original paper's, especially for the macro-average metrics. Is it possible that the random seed is different? I just set seed=42 in example_train.conf. Thanks!
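For anyone attempting the same reproduction, here is a minimal sketch of a 5-seed training sweep. It assumes the config file contains a seed key (as mentioned above) and that training is launched with spert.py train as in the README; the seed values and the config-rewriting logic are illustrations only:

```python
import re
import subprocess
from pathlib import Path

config = Path("configs/example_train.conf")
template = config.read_text()

# Train once per seed (arbitrary example seeds); assumes the config
# contains a line like "seed = 42".
for seed in [13, 21, 42, 77, 98]:
    config.write_text(re.sub(r"seed\s*=\s*\d+", f"seed = {seed}", template))
    subprocess.run(["python", "./spert.py", "train", "--config", str(config)],
                   check=True)

config.write_text(template)  # restore the original configuration
```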