Open ankurgarg101 opened 4 years ago
Hi, you need to fine-tune it with different sets of hyperparameters, e.g., learning rate, batch size, etc., to find the optimal set. We released the multiway attention model here: https://github.com/wilburOne/cosmosqa/
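For what it's worth, the kind of search suggested above could be sketched like this (hypothetical: `train_and_evaluate` is a placeholder for an actual fine-tuning run, and the candidate values are just illustrative):

```python
import itertools

def train_and_evaluate(lr, batch_size):
    # Placeholder: a real run would fine-tune the model with these
    # hyperparameters and return validation accuracy. Here we return
    # a dummy score so the sketch is runnable.
    return 1.0 / (abs(lr - 2e-5) * 1e5 + batch_size / 16 + 1)

def grid_search(learning_rates, batch_sizes):
    # Try every (learning rate, batch size) combination and keep the
    # one with the best validation score.
    best_score, best_params = float("-inf"), None
    for lr, bs in itertools.product(learning_rates, batch_sizes):
        score = train_and_evaluate(lr, bs)
        if score > best_score:
            best_score, best_params = score, (lr, bs)
    return best_params, best_score

params, score = grid_search([1e-5, 2e-5, 5e-5], [8, 16, 32])
```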
Adding on to this, it is not clear (at least to me) how the baseline methods in the paper are implemented. In particular, what do you feed to the classifier? Do you simply feed the [CLS] embedding, or something else?
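To make the question concrete, here is a minimal sketch of the setup I have in mind (this is a common multiple-choice pattern, not confirmed to be what the paper does; the shapes and the random encoder outputs are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_choices = 768, 4

# Stand-in for the encoder output: one (seq_len, hidden_size) matrix
# per (context, question, answer-choice) input.
encoder_outputs = [rng.standard_normal((128, hidden_size))
                   for _ in range(num_choices)]

# Linear classifier head applied to the [CLS] (first-token) embedding
# of each choice, producing one score per choice.
w = rng.standard_normal(hidden_size)
b = 0.0
logits = np.array([out[0] @ w + b for out in encoder_outputs])

# Softmax over the answer choices gives the answer distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_choice = int(np.argmax(probs))
```

Is it this, or does the baseline pool over all token embeddings instead?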
Thanks
@wilburOne would you mind sharing the set of parameters that worked best?
I tried running the provided implementation of the multi-way model on my system, and the best accuracy it achieves on the evaluation dataset is 27.44%. I ran the script without any modifications, using a learning rate of 5e-5 with a batch size of 8 on a single GPU. I did not, however, pre-train the model on the SWAG or RACE datasets. Are these numbers expected? They seem far off from the numbers quoted in the paper. Or is the difference in performance due to the lack of pre-training on RACE and SWAG? Any insights on this would be helpful.