Closed emrecncelik closed 2 years ago
Hi @emrecncelik ,
Thank you for sharing that point. I was also concerned about the contradictory results that we and other related works obtained. As you pointed out, the correct index should be 0, and the current default value in the code is wrong (I am currently working on a complete refactor of the repository, so I will correct it in the future). Still, the experiments were not done with that default value.
To be sure that the obtained results are "correct" (assuming nothing else is wrong), I ran that experiment again with 0 and 1 as the positive index, and I obtained the following results:
| Positive index | Top-1 (%) | Top-3 (%) | Top-5 (%) |
|---|---|---|---|
| 0 (isNext) | 2.01 | 8.57 | 16.49 |
| 1 | 1.36 | 3.57 | 6.10 |
The model that I used is `bert-large-uncased`.
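For context, the NSP scoring in this experiment boils down to something like the sketch below. This is only an illustration of ranking candidate topics by the "isNext" probability (index 0 of the NSP logits), not the actual code in the repository; the topic template and function name are made up.

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

# Illustrative sketch only: rank candidate topics by the NSP "isNext" probability.
# Index 0 of the NSP logits corresponds to isNext, index 1 to a random next sentence.
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-large-uncased")
model.eval()

def nsp_topic_scores(document, topics, positive_index=0):
    # Hypothetical template; the repository uses its own label templates.
    hypotheses = [f"This text is about {topic}." for topic in topics]
    encoding = tokenizer([document] * len(hypotheses), hypotheses,
                         return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (num_topics, 2)
    return logits.softmax(dim=-1)[:, positive_index].tolist()

topics = ["sports", "politics", "science"]
scores = nsp_topic_scores("The team won the championship after a dramatic overtime.", topics)
print(max(zip(scores, topics)))  # the topic with the highest isNext probability wins
```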
I also noticed that there is a typo in the paper: the reported Top-1 accuracy for the NSP model is 2.07, while the actual Top-1 accuracy is 2.01.
Thank you for your comment!
Oh I see, thanks for the clear explanation. Closing the issue then.
By the way, I'd be happy to help with improvements to the package if you need any (e.g., providing more flexible templates for topic classification, like the ones in HuggingFace).
Thanks!
I think once I have finished the redesign of the repository, it will be much easier to add improvements, so stay tuned. 👌
Thanks for sharing the code you used in your research, it's really useful!
Before coming across your research, I had seen some other papers using NSP for topic classification, and the accuracy of NSP models was almost on par with that of NLI models (Ma et al. 2021; Sun et al. 2021). So I was surprised to see NSP perform as badly as a random model.
At first I thought this might have happened because of the data you used. However, I saw that you defined the default positive output for NSP as 1 in this line. The HuggingFace documentation for NSP gives roughly this example (a sketch based on the `BertForNextSentencePrediction` usage example; the exact sentences are illustrative):
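```python
from transformers import BertTokenizer, BertForNextSentencePrediction
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
next_sentence = "The sky is blue due to the shorter wavelength of blue light."
encoding = tokenizer(prompt, next_sentence, return_tensors="pt")

outputs = model(**encoding, labels=torch.LongTensor([1]))
logits = outputs.logits
assert logits[0, 0] < logits[0, 1]  # next sentence was random
```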
I may be wrong, but I think the last line of this example says that the output with index 0 is the positive output (isNext). I am not certain if this is the problem, but I think we should look into it.
Thank you.