WS-BERT-DUAL not reaching the scores published in the paper

Hello, I am doing stance detection research based on this paper and have run into a problem. When I run the models on 4 GPUs (a400x4), the BERTweet+ results increase by 0.2 for Trump and Biden and by 0.5 for Sanders, which is normal. For WS-BERT-DUAL, however, none of the results come close to those reported in the paper. The paper reports Trump: 85.8, Biden: 83.5, Sanders: 79.0, Average: 82.8. On my server I get Trump: 84.2, Biden: 82.7, Sanders: 75.6, Average: 80.83. Were there changes made to the code? What environment did you run on, and are there potential ways to fix this problem?
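
To make the gap explicit, here is a small Python sketch of how I compare the two sets of WS-BERT-DUAL numbers. The variable and function names are my own and do not come from the repository, and I am assuming the reported average is a plain mean over the three targets.

```python
# WS-BERT-DUAL scores quoted in this issue (paper vs. my run).
paper = {"Trump": 85.8, "Biden": 83.5, "Sanders": 79.0}
mine = {"Trump": 84.2, "Biden": 82.7, "Sanders": 75.6}


def macro_average(scores):
    """Plain mean over targets (assumed to be how the average is computed)."""
    return sum(scores.values()) / len(scores)


# Per-target difference: my score minus the paper's score.
gaps = {target: round(mine[target] - paper[target], 1) for target in paper}

for target, gap in gaps.items():
    print(f"{target}: paper={paper[target]}, mine={mine[target]}, gap={gap:+.1f}")

avg_gap = macro_average(mine) - macro_average(paper)
print(f"Average: paper={macro_average(paper):.1f}, "
      f"mine={macro_average(mine):.2f}, gap={avg_gap:+.2f}")
```

Sanders shows the largest drop, so that target may be the most useful one to look at first.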

zihaohe123 / wiki-enhanced-stance-detection, issue #6