yhcc closed this issue 2 years ago
No, all the models are in their large versions. BCE stands for Binary Cross Entropy.
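For readers unfamiliar with the term, here is a minimal sketch of binary cross entropy in plain Python. It is only the textbook definition averaged over examples, not necessarily how the loss is computed in this repository; the function name `binary_cross_entropy` and the `eps` clamp are assumptions made for illustration.

```python
import math

def binary_cross_entropy(targets, probs, eps=1e-12):
    """Mean binary cross entropy between 0/1 targets and predicted probabilities.

    `eps` is an illustrative clamp that keeps log() away from 0; it is an
    assumption of this sketch, not something taken from the thread.
    """
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)

# Example call: one positive and one negative label with confident predictions.
loss = binary_cross_entropy([1, 0], [0.9, 0.1])
```

Confident, correct predictions give a loss close to zero, while confident, wrong ones are penalized heavily.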
Thanks for your timely reply. So why is this "wl + RoBERTa" result 80.72, while in Table 1 it is 81.0?
Table 2 shows performance on the development set; Table 1 uses the test set.
Sorry for the silly question. In my reproduction, though, your model usually scores higher on the development set; the highest I got there is about 81.6. Thanks again for your answer, I will close this issue.
Well, it depends heavily on the initialization. I used the same random seed for all my experiments, so there is probably a better one out there :)
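Since the score depends on the seed, one way to compare reproductions is to run several seeds and report the spread rather than a single number. Below is a minimal sketch of that idea; `summarize_over_seeds` and `toy_run` are hypothetical names, and `toy_run` is only a stand-in that returns a seeded pseudo-random number, not a real training run or a real score.

```python
import random
import statistics

def summarize_over_seeds(run, seeds):
    """Call run(seed) for each seed and return (mean, sample stdev, best)."""
    scores = [run(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores), max(scores)

def toy_run(seed):
    # Hypothetical placeholder for "train with this seed, return the dev score".
    # It only draws a seeded random number so the sketch is runnable.
    rng = random.Random(seed)
    return rng.gauss(0.0, 1.0)

mean, spread, best = summarize_over_seeds(toy_run, seeds=[0, 1, 2, 3, 4])
```

In practice `toy_run` would be replaced by the actual training and evaluation routine, with the seed passed to every random number generator it uses.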
Thank you for your time, but are all the models in Table 2 (shown below) in their base versions? It is quite surprising that RoBERTa-base got 80.72 while RoBERTa-large got 81.0.