linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
https://arxiv.org/abs/2005.00200
MIT License

model weights without pretraining #9

Closed kevinlee9 closed 3 years ago

kevinlee9 commented 3 years ago

Hi @linjieli222,

Nice work! I have a question about the model weights without pretraining. The paper says the model parameters (w/o pretraining) are initialized with RoBERTa weights. Since RoBERTa has 12 layers while HERO has 6/3 layers, I wonder which layers' weights are loaded into the Cross-Modal Transformer and the Temporal Transformer?

Thanks.

linjieli222 commented 3 years ago

We only load the RoBERTa weights into the 6-layer Cross-Modal Transformer. The 3-layer Temporal Transformer is randomly initialized. Please refer to https://github.com/linjieli222/HERO/blob/1534356a7e3edc5258ff63850952929fbb2b1569/model/modeling_utils.py#L46 for more details.
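For reference, here is a minimal sketch (not the repo's actual code) of the general idea: copy the first 6 encoder layers from a pretrained 12-layer RoBERTa into a 6-layer encoder and leave everything else randomly initialized. It assumes HuggingFace `transformers`, with a plain `RobertaModel` standing in for HERO's Cross-Modal Transformer.

```python
# Hypothetical sketch: initialize a 6-layer encoder from the first 6 layers
# of pretrained RoBERTa; any parameter not covered stays randomly initialized.
import re
from transformers import RobertaConfig, RobertaModel

# Full 12-layer RoBERTa checkpoint.
roberta = RobertaModel.from_pretrained("roberta-base")
pretrained_sd = roberta.state_dict()

# 6-layer target encoder (stand-in for the Cross-Modal Transformer).
config = RobertaConfig.from_pretrained("roberta-base", num_hidden_layers=6)
cross_modal = RobertaModel(config)

def keep(key):
    # Keep embeddings, pooler, and encoder layers 0-5; drop layers 6-11.
    m = re.match(r"encoder\.layer\.(\d+)\.", key)
    return m is None or int(m.group(1)) < 6

filtered_sd = {k: v for k, v in pretrained_sd.items() if keep(k)}
missing, unexpected = cross_modal.load_state_dict(filtered_sd, strict=False)
print("randomly initialized:", missing)      # keys not found in the checkpoint
print("ignored from checkpoint:", unexpected)  # checkpoint keys with no target
```

A 3-layer Temporal Transformer built the same way would simply skip the `load_state_dict` step and keep its random initialization.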