Open flaviomerenda opened 1 year ago
Hi @rdenaux,
Thank you for your quick response and help. Regarding the points you discussed:
The claimencoder achieves 83% accuracy on STS-B, and the worthinesschecker achieves 95% accuracy on its test set. In addition, we also tried to test the claimneuralindex with some sentence-similarity test calls. The results currently achieved for each dataset are:
As far as you know, could some other point in the pipeline be causing such errors?
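As a side note, one cheap way to sanity-check the kind of sentence-similarity calls mentioned above is to compare cosine similarity between embedding pairs that should obviously be close versus obviously far. This is only a sketch: the vectors below are toy stand-ins, not actual claimneuralindex output, and the helper name is ours.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two dense embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for encoder output; in practice these would come
# from the model under test (e.g. two paraphrases vs. an unrelated claim).
emb_paraphrase_a = [0.90, 0.10, 0.20]
emb_paraphrase_b = [0.85, 0.15, 0.25]
emb_unrelated = [0.00, 1.00, -0.50]

sim_close = cosine_similarity(emb_paraphrase_a, emb_paraphrase_b)
sim_far = cosine_similarity(emb_paraphrase_a, emb_unrelated)
assert sim_close > sim_far  # paraphrases should score higher than unrelated text
```

If the index returns similarity scores that violate this ordering on hand-picked pairs, the problem is likely in the encoder or index, not in the downstream evaluation.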
The repository seems to contain several errors, partially fixed by merge request #1. However, despite the new modifications that guarantee the reproducibility of the scripts, the results obtained during evaluation are lower than those reported in the paper, especially for coinfo250 and FakeNewsNet. @rdenaux, do you have any ideas, and do you know where the problem could be within the pipeline?
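When evaluation numbers drift from the reported ones, a first thing worth ruling out is unseeded randomness somewhere in the pipeline. A minimal sketch, assuming the scripts use Python's random and NumPy (the torch lines are commented out because we don't know whether the pipeline uses it):

```python
import random

import numpy as np

def set_global_seeds(seed: int = 42) -> None:
    """Pin the common sources of nondeterminism before an evaluation run."""
    random.seed(seed)
    np.random.seed(seed)
    # If the pipeline uses PyTorch, also pin its generators:
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)

# Two runs with the same seed must produce identical random draws;
# if evaluation scores still differ, the gap is not due to seeding.
set_global_seeds(42)
draw_one = np.random.rand()
set_global_seeds(42)
draw_two = np.random.rand()
assert draw_one == draw_two
```

If seeding everything does not close the gap, the next suspects would be differing preprocessing, dataset versions, or metric definitions between the scripts and the paper.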