mx-mark / VideoTransformer-pytorch

PyTorch implementation of a collection of scalable Video Transformer benchmarks.

Request code for finetune with self-supervised pretrained weights #11

Closed WHlTE-N0lSE closed 2 years ago

WHlTE-N0lSE commented 2 years ago

I tried to run self-supervised experiments with your code, but ran into a lot of problems during the fine-tuning stage. Can you share your MViT fine-tuning code? Thank you!

mx-mark commented 2 years ago

@WHlTE-N0lSE you can refer to the fine-tune start script in the README: https://github.com/mx-mark/VideoTransformer-pytorch/blob/main/README.md#:~:text=%23%20finetune%20with%20maskfeat,TRAIN_DATA_PATH%20%5C%0A%09%2Dval_data_path%20%24VAL_DATA_PATH

WHlTE-N0lSE commented 2 years ago

Ok! I'll try it.

Enclavet commented 2 years ago

@mx-mark

With the latest code changes, I don't think this combination of hparams works anymore:

```
-objective 'supervised' \
-arch 'timesformer' \
```

I'm getting NotImplementedError exceptions, which appear to be raised by build_finetune_optimizer when arch is anything other than mvit.

Tracing backwards, this depends on `is_pretrain`, which is passed in from model_trainer.py:

```python
is_pretrain = not (self.configs.objective == 'supervised')
```
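
To make the failure path concrete, here is a minimal runnable sketch of the dispatch described above. The function and flag names follow this thread; the bodies are my assumptions, not the repo's verbatim code.

```python
from argparse import Namespace

def build_finetune_optimizer(hparams, model=None):
    # The new fine-tune recipe is only implemented for MViT;
    # every other arch falls through to NotImplementedError.
    if hparams.arch == 'mvit':
        return 'mvit fine-tune optimizer'  # placeholder for the real recipe
    raise NotImplementedError

# model_trainer.py derives the stage from the objective flag:
hparams = Namespace(objective='supervised', arch='timesformer')
is_pretrain = not (hparams.objective == 'supervised')

if not is_pretrain:  # supervised objective -> fine-tune path
    try:
        build_finetune_optimizer(hparams)
    except NotImplementedError:
        print('no fine-tune optimizer recipe for arch:', hparams.arch)
```

So any supervised run with `-arch 'timesformer'` (or anything other than mvit) hits the fine-tune branch and raises.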

mx-mark commented 2 years ago

@Enclavet oh, thanks for reporting the problem. This is a code-migration issue: timesformer and vivit were previously classified under the pretrain stage rather than the fine-tune stage, with a moderate fine-tune recipe, and we have not yet adapted the new fine-tune recipe to these two models.
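
Until the new recipe is ported, one possible stopgap (an assumption on my part, not a committed fix) is to route timesformer/vivit back through the older pretrain-stage optimizer builder even when the objective is supervised:

```python
# Hypothetical workaround sketch: keep MViT on the new fine-tune recipe and
# let timesformer/vivit fall back to the older pretrain-stage recipe.
# build_finetune_optimizer / build_pretrain_optimizer are assumed names for
# the repo's two optimizer builders, following this thread.
def build_optimizer(hparams, model):
    if hparams.objective == 'supervised' and hparams.arch == 'mvit':
        return build_finetune_optimizer(hparams, model)
    # timesformer / vivit (and pretraining in general) use the older recipe.
    return build_pretrain_optimizer(hparams, model)
```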