Closed kevinlee9 closed 3 years ago
We only load the RoBERTa weights into the 6-layer Cross-Modal Transformer. The 3-layer Temporal Transformer is randomly initialized. Please refer to https://github.com/linjieli222/HERO/blob/1534356a7e3edc5258ff63850952929fbb2b1569/model/modeling_utils.py#L46 for more details.
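The idea can be sketched roughly as follows: filter a RoBERTa-style state dict down to the layers the smaller transformer actually has, and leave the rest to random initialization. This is a simplified illustration, not HERO's actual code (which lives in `model/modeling_utils.py`); the key-name convention `encoder.layer.<i>.` is the usual BERT/RoBERTa one and is assumed here.

```python
def truncate_layer_weights(state_dict, num_layers):
    """Keep only weights belonging to encoder layers 0..num_layers-1.

    Illustrative sketch of loading a 12-layer RoBERTa checkpoint into a
    shallower transformer: layer-indexed keys beyond the target depth are
    dropped, everything else (embeddings, pooler, etc.) is kept.
    """
    kept = {}
    for name, tensor in state_dict.items():
        parts = name.split(".")
        if "layer" in parts:
            # The component right after "layer" is the layer index.
            idx = int(parts[parts.index("layer") + 1])
            if idx >= num_layers:
                continue  # beyond the truncated depth: leave randomly initialized
        kept[name] = tensor
    return kept


# Toy example: a fake 12-layer checkpoint truncated to 6 layers.
fake_ckpt = {f"encoder.layer.{i}.attention.self.query.weight": None for i in range(12)}
fake_ckpt["embeddings.word_embeddings.weight"] = None
truncated = truncate_layer_weights(fake_ckpt, num_layers=6)
```

Layers 6-11 of the checkpoint are simply discarded; only the bottom 6 layers plus the non-layer weights survive.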
Hi @linjieli222,
Nice work! I have a question about the model weights without pretraining. The paper says the model parameters (w/o pretraining) are initialized with RoBERTa weights. Since RoBERTa has 12 layers while HERO has 6/3 layers, I wonder which layers' weights are loaded into the Cross-Modal Transformer and the Temporal Transformer?
Thanks.