salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Bug in the token_type_ids #102

Open sanyalsunny111 opened 1 year ago

sanyalsunny111 commented 1 year ago

Dear Authors, there is a bug in the token_type_ids produced by the BERT tokenizer: an extra token is added, which leads to a dimension mismatch between input_ids and token_type_ids. I can see that you haven't used token_type_ids for pretraining/finetuning, so this bug might not have shown up. Kindly fix this issue.

Here is a minimal code implementation to understand this issue.

image

image
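The screenshots are not reproduced in this text. As a rough stand-in, here is a minimal sketch of the kind of check the report describes: comparing the length of `token_type_ids` against `input_ids` in a tokenizer-style output. The helper name `find_length_mismatches` and the toy values are hypothetical, made up for illustration; they are not taken from the ALBEF code or from the original screenshots.

```python
from typing import Dict, Mapping, Sequence


def find_length_mismatches(
    encoding: Mapping[str, Sequence[int]],
    reference: str = "input_ids",
) -> Dict[str, int]:
    """Return every field whose length differs from the reference field.

    `encoding` is assumed to be a dict-like tokenizer output for a single
    (unbatched) example, mapping field names to flat lists of ints.
    """
    expected = len(encoding[reference])
    return {
        name: len(values)
        for name, values in encoding.items()
        if len(values) != expected
    }


# Toy values (hypothetical, NOT real tokenizer output): token_type_ids
# carries one more entry than input_ids, mimicking the reported mismatch.
toy_encoding = {
    "input_ids": [101, 7592, 2088, 102],
    "token_type_ids": [0, 0, 0, 0, 0],
    "attention_mask": [1, 1, 1, 1],
}

mismatches = find_length_mismatches(toy_encoding)
print(mismatches)  # {'token_type_ids': 5}
```

Running the same check on a real tokenizer output (converted to plain lists) would show whether the mismatch is present in a given setup.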

LiJunnan1992 commented 1 year ago

Hi, token_type_ids is unnecessary, so this should not cause any issues. Thanks
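For readers who want to act on this, one possible workaround, sketched under the assumption that the downstream model really does ignore `token_type_ids`, is to drop the field before the encoding is passed on. The helper `without_token_type_ids` below is hypothetical and not part of ALBEF. With Hugging Face tokenizers, passing `return_token_type_ids=False` when calling the tokenizer should have a similar effect.

```python
from typing import Any, Dict, Mapping


def without_token_type_ids(encoding: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of a tokenizer-style output without token_type_ids.

    Assumes `encoding` is dict-like; all other fields are kept unchanged.
    """
    return {
        name: values
        for name, values in encoding.items()
        if name != "token_type_ids"
    }


# Toy values (hypothetical, for illustration only).
toy_encoding = {
    "input_ids": [101, 7592, 2088, 102],
    "token_type_ids": [0, 0, 0, 0, 0],
    "attention_mask": [1, 1, 1, 1],
}

cleaned = without_token_type_ids(toy_encoding)
print(sorted(cleaned))  # ['attention_mask', 'input_ids']
```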