Closed by luckysheep861 5 years ago
What results did you get? I would suggest deleting the cache files and rerunning everything from scratch (make sure you follow the instructions closely). I ran through the process once and found the results reproducible.
Hi @kimiyoung.
I also ran the entire codebase following the instructions, starting from a clean clone and a fresh build of the dataset.
I only got a best dev F1 of 56.462817452313814.
I ran this a couple more times after that, and the F1 stays just above 56, with EM just above 42.3.
Any ideas on what might be the cause?
Thanks!
I got slightly better results: best_dev_F1 56.881756072546665. I also ran it from scratch.
I believe it is a matter of variance. AFAIK, three factors could lead to this: 1) The current implementation of our model might have high variance. 2) According to my previous experiments, the results vary across different types of GPUs even with the same random seed. I used an old Titan X to get these results. 3) I removed 100 (low-quality) training examples from the training set in v1.1. This changes things such as data batching, which is controlled by the random seed.
I would suggest trying different random seeds to study the effects of model variance. Some random seeds might work better.
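For anyone who wants to follow this suggestion systematically, here is a minimal sketch of a seed sweep. It assumes you can wrap one training run in a function that takes a seed and returns the best dev F1; `sweep_seeds` and `fake_train` are hypothetical names, not part of this repository, and `fake_train` only produces placeholder numbers so the snippet runs on its own.

```python
import random
import statistics
from typing import Callable, Dict, Iterable


def sweep_seeds(train_fn: Callable[[int], float], seeds: Iterable[int]) -> Dict[str, object]:
    """Run train_fn once per seed and summarise the returned scores."""
    scores = {seed: train_fn(seed) for seed in seeds}
    values = list(scores.values())
    return {
        "scores": scores,
        "mean": statistics.mean(values),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "best_seed": max(scores, key=scores.get),
    }


def fake_train(seed: int) -> float:
    # Placeholder only: replace with a call that trains the model with this
    # seed and returns its best dev F1. These numbers are NOT real results.
    rng = random.Random(seed)
    return 56.0 + 2.0 * rng.random()


summary = sweep_seeds(fake_train, seeds=[1, 2, 3, 4, 5])
print(summary["scores"])
print(f"mean={summary['mean']:.2f} stdev={summary['stdev']:.2f} best_seed={summary['best_seed']}")
```

Reporting the mean and standard deviation over several seeds, rather than a single run, makes it easier to tell whether a gap to the published number is within run-to-run noise.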
@kimiyoung Thanks for your reply!
Actually, I tried both versions (with and without the 100 examples). I'm guessing it may be an issue with my system or dependencies. I'll try different seeds.
I have one question, though: in your early experiments, did you try different optimizers, or did you just default to SGD right from the start?
Thanks!
@vanzytay I did not try other optimizers.
I got an even worse result: best_dev_F1 56.075717925566316, with EM around 42. (I ran the code on a 1080 Ti.)
1080Ti best_dev_F1 57.83286201117724
@kimiyoung Thanks for your work. Sure, will try to use other random seeds.
P.S. The following are the results from the default run.
GPU: Titan Xp, Random Seed: default, Setting: distractor
Training (end): best_dev_F1 56.841881121141064
Evaluation: {'sp_em': 0.1950033760972316, 'joint_recall': 0.3910371172630571, 'f1': 0.5661927280885037, 'recall': 0.5830912961848933, 'joint_f1': 0.36950188461400907, 'sp_f1': 0.6090896536879039, 'joint_prec': 0.4142776503762301, 'em': 0.42822417285617825, 'sp_recall': 0.624765441625671, 'prec': 0.589656389075701, 'sp_prec': 0.664514002765185, 'joint_em': 0.09790681971640783}
After the update (V1.1), I got an acceptable result: best_dev_F1 57.81 (on a Tesla K40m).
I got best_dev_F1 56.37454881285825 (on 2080ti)
Why can't I reach your baseline performance?
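To put the numbers in this thread side by side, here is a small sketch that summarises the best_dev_F1 values reported above. The values are copied from the comments; pooling them is an assumption, since they come from different GPUs and from runs both before and after the V1.1 update, so treat the result as a rough picture of the spread rather than a controlled comparison.

```python
import statistics

# best_dev_F1 values as reported in this thread (GPU noted where it was given).
reported_best_dev_f1 = [
    56.462817452313814,  # GPU not stated
    56.881756072546665,  # GPU not stated
    56.075717925566316,  # 1080 Ti
    57.83286201117724,   # 1080Ti
    56.841881121141064,  # Titan Xp, default seed
    57.81,               # Tesla K40m, after the V1.1 update
    56.37454881285825,   # 2080ti
]

mean_f1 = statistics.mean(reported_best_dev_f1)
stdev_f1 = statistics.stdev(reported_best_dev_f1)  # sample standard deviation
spread = max(reported_best_dev_f1) - min(reported_best_dev_f1)

print(f"runs={len(reported_best_dev_f1)} mean={mean_f1:.2f} stdev={stdev_f1:.2f} spread={spread:.2f}")
```

If your own run falls inside this range, the gap is more likely run-to-run variance than a setup problem.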