snap-stanford / GreaseLM

[ICLR 2022 spotlight] GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

[Help] About the hyper-parameters to reproduce the result #6

Closed: yeeeqichen closed this issue 2 years ago

yeeeqichen commented 2 years ago

@XikunZhang @michiyasunaga @roks

Hi,

Thanks for your great effort!

I've run the code in this repo with the same hyper-parameters provided in the script run_greaselm.sh, which also match the ones reported in the paper, but my results fall short of the paper's. For example, on csqa the reported dev_acc and test_acc are 78.5 (±0.5) and 74.2 (±0.4) respectively, while the model I trained only reaches 77.48 and 73.01.

I've tried several random seeds, but the problem persists. Could you please release the hyper-parameters (e.g. the random seed) you used when training the model?
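(For reference, a seed sweep along these lines can be scripted as in the sketch below. It assumes run_greaselm.sh takes a csqa positional argument and forwards a `--seed` flag to the trainer; both are assumptions here, so check the script in your checkout for the real interface.)

```python
# A minimal sketch of a seed sweep over the training script.
# The "csqa" argument and the --seed flag are assumptions; adjust them
# to match the actual interface of run_greaselm.sh.
import subprocess

for seed in (0, 1, 2):
    subprocess.run(
        ["bash", "run_greaselm.sh", "csqa", "--seed", str(seed)],
        check=True,  # abort the sweep if one run fails
    )
```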

Looking forward to your response!

XikunZhang commented 2 years ago

Thanks for your interest in our work! I am sorry to hear about your difficulty in reproducing the results.

Unfortunately, randomness is not fully controlled in our code. The only sources of randomness we control are these, but PyTorch has other sources of nondeterminism, so even if you use the same random seeds as we did, you still won't get exactly the same numbers. In my personal experience, running on a different GPU model is another source of variation.
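(For context, the PyTorch-side seed controls people typically set look like the sketch below. This is a generic illustration, not the repo's exact code, and as noted above it still leaves some kernel- and hardware-level nondeterminism uncontrolled.)

```python
# Generic PyTorch seeding sketch (not this repo's exact code): even with all
# of these set, some CUDA kernels and different GPU models can still produce
# slightly different numbers.
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    random.seed(seed)                  # Python's built-in RNG
    np.random.seed(seed)               # NumPy RNG
    torch.manual_seed(seed)            # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)   # RNGs on all CUDA devices
    torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable nondeterministic autotuning


set_seed(0)
```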

Let me know if we can help further in any way.

yeeeqichen commented 2 years ago

Thanks for your response!