jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
MIT License

type_vocab_size of pretrained model should be 2 #65

Closed: whqwill closed this issue 3 years ago

whqwill commented 3 years ago

The type_vocab_size of the pretrained model should be 2, right? But it shows 3. To my understanding, there are only two segment types during pretraining: one for text and one for images. Am I missing something?
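(For anyone wanting to reproduce what the issue describes, here is a minimal sketch of how the value can be read off the checkpoint itself; the file name and state-dict key are assumptions and may differ in the actual VL-BERT release.)

```python
import torch

# Hypothetical checkpoint path; substitute the actual pretrained file.
ckpt = torch.load('vl-bert-base-e2e.model', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)  # some checkpoints nest the weights

# The key name is assumed to follow the usual BERT naming convention;
# the real key in the VL-BERT checkpoint may be nested differently.
for name, tensor in state_dict.items():
    if 'token_type_embeddings' in name:
        # First dimension of this embedding matrix is type_vocab_size.
        print(name, tuple(tensor.shape))
```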

jackroos commented 3 years ago

In some tasks the text contains two sentences, for example the question and the answer in VQA and VCR, so we use a different segment embedding for each of them, following BERT. Together with the segment for image regions, that gives three types.
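(For intuition, a minimal sketch of a three-way segment embedding is below. The ID convention used here, 0 for the first sentence, 1 for the second sentence, 2 for image regions, is an assumption for illustration; check the VL-BERT data pipeline for the exact mapping.)

```python
import torch
import torch.nn as nn

hidden_size = 768
type_vocab_size = 3  # matches the pretrained config this issue asks about

# Assumed ID convention, for illustration only:
#   0 -> first sentence (e.g. question)
#   1 -> second sentence (e.g. answer)
#   2 -> image regions
token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)

# Example input: 4 question tokens, 3 answer tokens, 2 image regions.
token_type_ids = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 2, 2]])
segment_emb = token_type_embeddings(token_type_ids)
print(segment_emb.shape)  # torch.Size([1, 9, 768])
```

With only two types, the question and answer would share one segment embedding; keeping them separate lets the model distinguish the two text spans the same way BERT distinguishes sentence A from sentence B.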