Open ankurgarg101 opened 4 years ago
Hi, you need to fine-tune it with different sets of hyperparameters, e.g., learning rate, batch size, etc., to find the optimal set. We released the multiway attention model here: https://github.com/wilburOne/cosmosqa/
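For what it's worth, the kind of search suggested above could be sketched like this (hypothetical: `train_and_evaluate` is a placeholder for an actual fine-tuning run, and the candidate values are just illustrative):

```python
import itertools

def train_and_evaluate(lr, batch_size):
    # Placeholder: a real run would fine-tune the model with these
    # hyperparameters and return validation accuracy. Here we return
    # a dummy score so the sketch is runnable.
    return 1.0 / (abs(lr - 2e-5) * 1e5 + batch_size / 16 + 1)

def grid_search(learning_rates, batch_sizes):
    # Try every (learning rate, batch size) combination and keep the
    # one with the best validation score.
    best_score, best_params = float("-inf"), None
    for lr, bs in itertools.product(learning_rates, batch_sizes):
        score = train_and_evaluate(lr, bs)
        if score > best_score:
            best_score, best_params = score, (lr, bs)
    return best_params, best_score

params, score = grid_search([1e-5, 2e-5, 5e-5], [8, 16, 32])
```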
Adding on to this, it is not clear (at least to me) how the baseline methods in the paper are implemented. In particular, what do you feed to the classifier? Do you simply feed the [CLS] embedding, or something else?
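To make the question concrete, here is a minimal sketch of the setup I have in mind (this is a common multiple-choice pattern, not confirmed to be what the paper does; the shapes and the random encoder outputs are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_choices = 768, 4

# Stand-in for the encoder output: one (seq_len, hidden_size) matrix
# per (context, question, answer-choice) input.
encoder_outputs = [rng.standard_normal((128, hidden_size))
                   for _ in range(num_choices)]

# Linear classifier head applied to the [CLS] (first-token) embedding
# of each choice, producing one score per choice.
w = rng.standard_normal(hidden_size)
b = 0.0
logits = np.array([out[0] @ w + b for out in encoder_outputs])

# Softmax over the answer choices gives the answer distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_choice = int(np.argmax(probs))
```

Is it this, or does the baseline pool over all token embeddings instead?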
Thanks
@wilburOne would you mind sharing the set of parameters that worked best?
I tried running the provided implementation of the multi-way model on my system, and the best accuracy it achieves on the evaluation dataset is 27.44%. I ran the script without any modifications, using a learning rate of 5e-5 with a batch size of 8 on a single GPU. I did not, however, pre-train the model on the SWAG or RACE datasets. Are these numbers expected? They seem far off from the numbers quoted in the paper. Or is the difference in performance due to the lack of pre-training on RACE and SWAG? Any insights on this would be helpful.