microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
https://arxiv.org/abs/2002.06353

How to fine-tune with additional layers before UniVL? #25

Closed CrystalSixone closed 2 years ago

CrystalSixone commented 2 years ago

Hi! Thanks for your awesome work. I am trying to use your pretrained weights to train on another dataset. However, my inputs consist of two different parts, and I need to apply an attention operation on them before feeding them into the pretrained UniVL for fine-tuning. Could you please give me some suggestions on how to fine-tune the model with additional layers in front of UniVL? The pipeline is: inputs -> additional attention module (random initialization) -> UniVL. I am confused about the training strategy since I have not done any pre-training work before. It would be my great pleasure if you could reply :) Best wishes.
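For reference, a minimal sketch of the wiring described above: a randomly initialized cross-attention block fuses the two input parts before they go into the pretrained UniVL. The names here (`FusionAttention`, `WrappedUniVL`, `part_a`, `part_b`) are hypothetical placeholders, not the actual UniVL API.

```python
import torch
import torch.nn as nn

class FusionAttention(nn.Module):
    """Randomly initialized attention block that fuses two input streams
    before they are passed to the pretrained model (hypothetical sketch)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, part_a: torch.Tensor, part_b: torch.Tensor) -> torch.Tensor:
        # part_a attends to part_b; both are (batch, seq_len, dim)
        fused, _ = self.attn(query=part_a, key=part_b, value=part_b)
        return self.norm(part_a + fused)

class WrappedUniVL(nn.Module):
    """inputs -> new attention module (random init) -> pretrained UniVL."""
    def __init__(self, univl: nn.Module, dim: int):
        super().__init__()
        self.fusion = FusionAttention(dim)  # new, randomly initialized
        self.univl = univl                  # loaded from pretrained weights

    def forward(self, part_a, part_b, *univl_args, **univl_kwargs):
        fused = self.fusion(part_a, part_b)
        return self.univl(fused, *univl_args, **univl_kwargs)
```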

ArrowLuo commented 2 years ago

@CrystalSixone, it is an interesting question, but I have no good solution that avoids invalidating the pretrained weights. A direct method is to freeze the pretrained weights at the beginning. I found two related discussions that may be useful for you: Link1 and Link2.
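To make the "freeze at the beginning" suggestion concrete, here is a sketch of a two-stage schedule: train only the new attention module first, then unfreeze UniVL for joint fine-tuning. `model.fusion` / `model.univl` refer to the hypothetical wrapper sketched above, and the learning rates are placeholders, not values from the repository's training scripts.

```python
import torch

# Stage 1: freeze the pretrained UniVL weights, train only the new module.
for p in model.univl.parameters():
    p.requires_grad = False
optimizer = torch.optim.AdamW(model.fusion.parameters(), lr=1e-4)
# ... train for a few epochs so the randomly initialized attention
#     stops feeding noise into the pretrained backbone ...

# Stage 2: unfreeze UniVL and fine-tune everything, typically with a
# smaller learning rate for the pretrained weights.
for p in model.univl.parameters():
    p.requires_grad = True
optimizer = torch.optim.AdamW([
    {"params": model.fusion.parameters(), "lr": 1e-4},
    {"params": model.univl.parameters(), "lr": 1e-5},
])
# ... continue joint fine-tuning ...
```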

CrystalSixone commented 2 years ago

@ArrowLuo Thanks for your kind reply! All the best :)