salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Why does get_special_tokens_mask append a [1] at the end while build_inputs_with_special_tokens does not append a [SEP] at the end for a single input sequence? #75

Open zhihuacc opened 2 years ago

zhihuacc commented 2 years ago

Hi, I found that in [get_special_tokens_mask](https://github.com/salesforce/ALBEF/blob/75376bee33df87af9c206b4afb53c876927e7b2b/models/tokenization_bert.py#L294) the returned list has a [1] appended at the end for a single input sequence, while the list returned by [build_inputs_with_special_tokens](https://github.com/salesforce/ALBEF/blob/75376bee33df87af9c206b4afb53c876927e7b2b/models/tokenization_bert.py#L262) does NOT have a [SEP] appended in the same case. Why is that?

LiJunnan1992 commented 2 years ago

We remove [SEP] for a single sentence input because it has negligible effect on pre-training.

zhihuacc commented 2 years ago

But why does get_special_tokens_mask still append a [1]? I thought this [1] is for [SEP], right?

LiJunnan1992 commented 2 years ago

Yes, you are right. I have modified the code so that the [1] is not appended. Thank you!
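
For readers following along, the mismatch discussed above can be illustrated with a minimal, self-contained sketch. It does not reproduce the repository's code: the function names `build_inputs_single`, `special_tokens_mask_old`, and `special_tokens_mask_fixed` are hypothetical, the ids 101 and 102 are assumed to be the usual BERT ids for [CLS] and [SEP], and the content token ids are arbitrary.

```python
# Illustrative sketch only; names and ids are assumptions, not ALBEF's code.
CLS_ID, SEP_ID = 101, 102  # assumed BERT ids for [CLS] and [SEP]


def build_inputs_single(token_ids):
    """Single-sequence input with [CLS] prepended and no trailing [SEP],
    matching the behavior described in the thread."""
    return [CLS_ID] + token_ids


def special_tokens_mask_old(token_ids):
    """Mask as originally reported: a trailing 1 marks a [SEP] that is
    never actually added to the input."""
    return [1] + [0] * len(token_ids) + [1]


def special_tokens_mask_fixed(token_ids):
    """Mask after the fix described in the thread: no trailing 1, so the
    mask has one entry per input id."""
    return [1] + [0] * len(token_ids)


tokens = [2023, 2003]  # arbitrary content token ids
ids = build_inputs_single(tokens)
old_mask = special_tokens_mask_old(tokens)
new_mask = special_tokens_mask_fixed(tokens)

print(len(ids), len(old_mask), len(new_mask))
```

With the old mask, the mask is one element longer than the input ids, so any code that zips the two together or indexes one by the other is misaligned; with the fixed mask the lengths agree.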