wilson1yan / VideoGPT

MIT License

vqvae finetuning procedures #8

Closed Goulustis closed 3 years ago

Goulustis commented 3 years ago

Hi, I'm attempting to fine-tune the pretrained UCF VQ-VAE on another dataset, but the reconstructions deteriorate very quickly into something blurry with watery motion (after ~400 training steps at batch size 32).

I was wondering whether there is anything I need to do for fine-tuning to work (e.g. setting codebook._need_init = False)?

Thanks in advance! (sorry to bother you with such minute details :'( )

wilson1yan commented 3 years ago

Do reconstructions go blurry immediately, i.e. after 1 training step? If so, it should be enough to set codebook._need_init = False. Otherwise, I might need to think about what else could be affecting it.
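To illustrate why that flag can matter, here is a toy, pure-Python sketch. It is not VideoGPT's code: ToyCodebook and load_embeddings are hypothetical names, and the sketch assumes that _need_init is a plain attribute set in the constructor (so loading pretrained weights does not reset it) and that the first training-mode forward pass re-initializes the codes from the incoming batch.

```python
import random


class ToyCodebook:
    """Hypothetical stand-in for a VQ codebook with data-dependent init."""

    def __init__(self, n_codes, dim):
        self.embeddings = [[0.0] * dim for _ in range(n_codes)]
        self._need_init = True  # assumption: set here, not stored in checkpoints
        self.training = True

    def load_embeddings(self, embeddings):
        # Mimics loading pretrained weights: values are restored, the flag is not.
        self.embeddings = [list(e) for e in embeddings]

    def forward(self, batch):
        if self._need_init and self.training:
            # Data-dependent init: overwrite every code with a random input vector.
            self.embeddings = [list(random.choice(batch)) for _ in self.embeddings]
            self._need_init = False

        def nearest(v):
            # Index of the closest code by squared Euclidean distance.
            return min(
                range(len(self.embeddings)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, self.embeddings[k])),
            )

        return [nearest(v) for v in batch]


pretrained = [[0.0, 0.0], [1.0, 1.0]]
new_batch = [[5.0, 5.0], [6.0, 6.0]]

# Naive fine-tuning: the first training step throws the pretrained codes away.
naive = ToyCodebook(2, 2)
naive.load_embeddings(pretrained)
naive.forward(new_batch)

# Clearing the flag before training keeps the pretrained codes intact.
safe = ToyCodebook(2, 2)
safe.load_embeddings(pretrained)
safe._need_init = False
codes = safe.forward(new_batch)
```

In this toy, naive.embeddings ends up drawn from new_batch, while safe.embeddings still equals pretrained. If the real codebook behaves the same way, the flag would need to be cleared after loading the weights and before the first training step.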

f3rhoodn commented 2 years ago

Hi, how did you fine-tune the pretrained VQ-VAE model? I want to load a pretrained UCF101 model and fine-tune it on my own data. Am I doing it correctly?

First, I load a pretrained model using the provided function:

model = load_vqvae('ucf101_stride4x4x4')

Then I ask the trainer to continue from the checkpoint (by setting the resume_from_checkpoint flag and providing the path to the model):

trainer = pl.Trainer.from_argparse_args(args, callbacks=callbacks,
                                        max_steps=200000, resume_from_checkpoint="path/to/model//ucf101_stride4x4x4", **kwargs)

trainer.fit(model, data)

Is this right? Is the first part (load_vqvae) necessary, or is setting resume_from_checkpoint alone enough?

Thanks in advance.

Goulustis commented 2 years ago

You should keep the first part, because there is a strange phenomenon where the model doesn't get updated when you use only:

resume_from_checkpoint="path/to/model//ucf101_stride4x4x4"

Just a heads-up: the model appears to be stuck in some local optimum, and fine-tuning didn't really help for me. I'd love to hear whether you see something similar or different.
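One way to check whether the weights are being updated at all is to snapshot the parameters before and after a few training steps and compare them. The sketch below is a minimal, framework-free version: changed_params is a hypothetical helper, and the parameter names and values are made up for illustration. With PyTorch, the snapshots could presumably be built from model.state_dict() by flattening each tensor to a list.

```python
def changed_params(before, after, tol=0.0):
    """Return the names whose values moved by more than tol between snapshots."""
    changed = []
    for name, old in before.items():
        new = after[name]
        if any(abs(a - b) > tol for a, b in zip(old, new)):
            changed.append(name)
    return changed


# Made-up snapshots: parameter name -> flat list of values.
before = {"encoder.w": [0.1, 0.2], "codebook.embeddings": [1.0, 2.0]}
after = {"encoder.w": [0.1, 0.2], "codebook.embeddings": [1.5, 2.0]}

moved = changed_params(before, after)
stuck = [name for name in before if name not in moved]
```

If stuck lists every parameter after a handful of steps, the optimizer is probably not touching the loaded weights; if only some entries are stuck, that narrows down which part of the model is frozen.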

f3rhoodn commented 2 years ago

Thank you for your reply. Yeah, to make sure, I kept them both! I am going to use the dictionary for other purposes rather than generating images, so I cannot tell how it is affected by fine-tuning. Thanks again for your response.