hotpotqa / hotpot

Apache License 2.0

Why can't I reach your baseline performance? #6

Closed luckysheep861 closed 5 years ago

luckysheep861 commented 5 years ago

Why can't I reach your baseline performance?

kimiyoung commented 5 years ago

What results did you get? I would suggest deleting the cache files and rerunning everything from scratch (make sure you follow the instructions closely). I ran through the process once and found the results reproducible.

vanzytay commented 5 years ago

hi @kimiyoung.

I also ran the entire codebase following the instructions, starting from a clean clone and a fresh build of the dataset.

I got only a best dev F1 of 56.462817452313814.

I ran this a couple of times after that and it seems like the score is about 56+. EM is about 42.3+.

Any ideas on what might be the cause?

Thanks!

Arjunsankarlal commented 5 years ago

I got slightly better results: best_dev_F1 56.881756072546665. I too ran it from scratch.

kimiyoung commented 5 years ago

I believe it is a matter of variance. AFAIK, three factors could contribute:

1) The current implementation of our model might have high variance.
2) In my previous experiments, results varied across GPU models even with the same random seed. I used an old Titan X to get these results.
3) In v1.1 I removed 100 low-quality examples from the training set. This changes things like data batching, which is controlled by the random seed.

I would suggest trying different random seeds to study the effects of model variance. Some random seeds might work better.
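For anyone running the seed sweep suggested above: it may help to pin every RNG before comparing runs. A minimal sketch, not taken from this repo's code (`set_seed` is a hypothetical helper; a PyTorch run would additionally need the torch-side seeding noted in the comments):

```python
import random
import numpy as np

def set_seed(seed: int) -> None:
    """Pin the Python and NumPy RNGs. A PyTorch training run would also
    need torch.manual_seed(seed) and torch.cuda.manual_seed_all(seed),
    and even then kernel selection can differ across GPU models."""
    random.seed(seed)
    np.random.seed(seed)

# Identical seeds reproduce identical draws on the same machine:
set_seed(13)
a = np.random.rand(3)
set_seed(13)
b = np.random.rand(3)
```

Sweeping `set_seed` over a handful of values and recording best_dev_F1 for each would show how much of the 56-58 spread reported in this thread is plain run-to-run variance.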

vanzytay commented 5 years ago

@kimiyoung Thanks for your reply!

Actually I tried both versions (with and without the 100 examples). I'm guessing it may be an issue with the system or dependencies. I'll try different seeds.

I have one question though, in your early experiments did you try different optimizers or just defaulted to SGD right from the start?

Thanks!

kimiyoung commented 5 years ago

@vanzytay I did not try other optimizers.

woshiyyya commented 5 years ago

I got an even worse result: best_dev_F1 56.075717925566316, EM about 42. (I ran the code on a 1080 Ti.)

Vimos commented 5 years ago

1080Ti best_dev_F1 57.83286201117724

ag1988 commented 5 years ago

@kimiyoung Thanks for your work. Sure, will try to use other random seeds.

P.S. Following are the results from the default run.

GPU: Titan Xp; random seed: default; setting: distractor.

Training (end): best_dev_F1 56.841881121141064

Evaluation: {'sp_em': 0.1950033760972316, 'joint_recall': 0.3910371172630571, 'f1': 0.5661927280885037, 'recall': 0.5830912961848933, 'joint_f1': 0.36950188461400907, 'sp_f1': 0.6090896536879039, 'joint_prec': 0.4142776503762301, 'em': 0.42822417285617825, 'sp_recall': 0.624765441625671, 'prec': 0.589656389075701, 'sp_prec': 0.664514002765185, 'joint_em': 0.09790681971640783}
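For readers puzzling over the joint_* numbers above: my understanding of the HotpotQA joint metric is that answer and supporting-fact precision/recall are multiplied per example and then averaged, which is why the corpus-level joint_prec is not simply prec × sp_prec of the averages. A hedged sketch of that per-example combination (`joint_f1` is an illustrative helper, not the official evaluation script):

```python
def joint_f1(ans_prec: float, ans_recall: float,
             sp_prec: float, sp_recall: float) -> float:
    """Combine answer and supporting-fact scores for ONE example:
    multiply the precisions, multiply the recalls, then take an F1."""
    jp = ans_prec * sp_prec
    jr = ans_recall * sp_recall
    if jp + jr == 0:
        return 0.0  # avoid division by zero when both components miss
    return 2 * jp * jr / (jp + jr)

# A perfect answer with half-right supporting facts scores 0.5 jointly:
score = joint_f1(1.0, 1.0, 0.5, 0.5)
```

Averaging this per-example value over the dev set is what produces numbers like the joint_f1 of ~0.3695 reported above, noticeably below both f1 and sp_f1.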

luckysheep861 commented 5 years ago

After the update (v1.1), I got an acceptable result: best_dev_F1 57.81 (on a Tesla K40m).

YeDeming commented 5 years ago

I got best_dev_F1 56.37454881285825 (on 2080ti)