jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
MIT License
738 stars · 110 forks

Can you give a txt file containing all word vectors? #5

Closed AIstudentSH closed 4 years ago

AIstudentSH commented 4 years ago

Thank you for your work; I would love to use your VL-BERT in my network. Is there a word vector file for the VQA dataset saved in txt format?

jackroos commented 4 years ago

What do you mean by "word vector"? Do you mean word embeddings? You can get the word embeddings from the pre-trained models.

AIstudentSH commented 4 years ago

Sorry, could you explain how I can get word embeddings from a pre-trained model? I didn't fully understand your code.

jackroos commented 4 years ago

You can get word embeddings as follows:

import torch

# Load the pre-trained checkpoint on CPU and pull out the word-embedding matrix.
path_to_pretrained_model = './model/pretrained_model/vl-bert-base-e2e.model'
checkpoint = torch.load(path_to_pretrained_model, map_location='cpu')

# The embedding table sits under the wrapped module's state dict.
word_embeddings = checkpoint['state_dict']['module.vlbert.word_embeddings.weight']
print(word_embeddings.shape)  # (vocab_size, hidden_size)

You should get output like:

torch.Size([30522, 768])

Hope it helps! @AIstudentSH
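Since the original question asked for a txt file, here is a minimal sketch of writing an embedding matrix out in GloVe-style text format (one `token v1 v2 ... vN` line per row). The tokens and vectors below are toy stand-ins; with the real model you would zip the BERT vocab (30522 tokens for `bert-base-uncased`) with the `word_embeddings` tensor from the snippet above.

```python
def save_embeddings_txt(tokens, vectors, path):
    """Write one "token v1 v2 ... vN" line per token, GloVe-style."""
    with open(path, 'w', encoding='utf-8') as f:
        for tok, vec in zip(tokens, vectors):
            f.write(tok + ' ' + ' '.join(f'{v:.6f}' for v in vec) + '\n')

# Toy example: 3 tokens with 4-dim vectors standing in for the real
# 30522 x 768 embedding table from the checkpoint.
tokens = ['[PAD]', 'hello', 'world']
vectors = [[0.0] * 4, [0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
save_embeddings_txt(tokens, vectors, 'word_vectors.txt')

with open('word_vectors.txt') as f:
    print(f.readline().strip())  # → [PAD] 0.000000 0.000000 0.000000 0.000000
```

With the real checkpoint, `tokens` would come from the BERT vocab file and `vectors` from `word_embeddings.tolist()`; row order in the matrix matches token-id order in the vocab.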

jackroos commented 4 years ago

BTW, I don't think it is a good idea to use only the word embeddings. You'd better use the whole pre-trained model to take full advantage of VL-BERT.