Closed emrecncelik closed 2 years ago
Hi @emrecncelik ,
Thank you for sharing that point. I was also concerned about the contradictory results that we and other related works obtained. As you pointed out, the correct index should be 0, and the current default value in the code is wrong (I am currently working on a complete refactor of the repository, so I will correct it in the future). Still, the experiments were not done with that default value.
To be sure that the obtained results are "correct" (assuming nothing else is wrong), I ran that experiment again with 0 and 1 as the positive index, and I obtained the following results:
| Positive index | Top-1 (%) | Top-3 (%) | Top-5 (%) |
|---|---|---|---|
| 0 (isNext) | 2.01 | 8.57 | 16.49 |
| 1 | 1.36 | 3.57 | 6.10 |
The model that I used is `bert-large-uncased`.
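For context, the NSP scoring in this experiment boils down to something like the sketch below. This is only an illustration of ranking candidate topics by the "isNext" probability (index 0 of the NSP logits), not the actual code in the repository; the topic template and function name are made up.

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

# Illustrative sketch only: rank candidate topics by the NSP "isNext" probability.
# Index 0 of the NSP logits corresponds to isNext, index 1 to a random next sentence.
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-large-uncased")
model.eval()

def nsp_topic_scores(document, topics, positive_index=0):
    # Hypothetical template; the repository uses its own label templates.
    hypotheses = [f"This text is about {topic}." for topic in topics]
    encoding = tokenizer([document] * len(hypotheses), hypotheses,
                         return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (num_topics, 2)
    return logits.softmax(dim=-1)[:, positive_index].tolist()

topics = ["sports", "politics", "science"]
scores = nsp_topic_scores("The team won the championship after a dramatic overtime.", topics)
print(max(zip(scores, topics)))  # the topic with the highest isNext probability wins
```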
I also noticed that there is a typo in the paper: the reported Top-1 accuracy for the NSP model is 2.07, while the actual Top-1 accuracy is 2.01.
Thank you for your comment!
Oh I see, thanks for the clear explanation. Closing the issue then.
By the way, I'd be happy to help with improvements to the package if you need any (e.g., providing more flexible templates for topic classification, like the ones in HuggingFace).
Thanks!
I think once I have finished the redesign of the repository, it will be much easier to add improvements, so stay tuned. 👌
Thanks for sharing the code you used in your research, it's really useful!
Before coming across your research, I had seen some other papers using NSP for topic classification, and the accuracy of NSP models was almost on par with that of NLI models (Ma et al. 2021; Sun et al. 2021). So I was surprised to see NSP perform as badly as a random model.
At first I thought this might have happened because of the data you used. However, I saw that you defined the default positive output for NSP as 1 in this line. The HuggingFace documentation for NSP gives roughly this example (a sketch based on the `BertForNextSentencePrediction` usage example; the exact sentences are illustrative):
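```python
from transformers import BertTokenizer, BertForNextSentencePrediction
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
next_sentence = "The sky is blue due to the shorter wavelength of blue light."
encoding = tokenizer(prompt, next_sentence, return_tensors="pt")

outputs = model(**encoding, labels=torch.LongTensor([1]))
logits = outputs.logits
assert logits[0, 0] < logits[0, 1]  # next sentence was random
```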
I may be wrong, but I think the last line of this example says that the output with index 0 is the positive output (isNext). I am not certain if this is the problem, but I think we should look into it.
Thank you.