I'm using the processed data you provided to reproduce the results for MathQA.
Following the instructions, I replaced the vocab.txt in the bert-base-uncased folder and ran train_ft_monolingual-en.sh. However, the accuracy is only around 27%. When I tried Math23K instead, the results matched the paper.
I wonder where things could be going wrong. Is there anything specific to Math23K or the Chinese dataset that needs modifying when running the code? Or is there something about the BERT model I need to pay attention to?
Thanks!