moabitcoin / ig65m-pytorch

PyTorch 3D video classification models pre-trained on 65 million Instagram videos
MIT License

r2plus1d_34_8_ig65m with 16 frames input #39

Closed: jaypatravali closed this issue 2 years ago

jaypatravali commented 4 years ago

Hi, I'm new to training video models. I've been reading action recognition papers that use newer models such as R3D and R2plus1D with 16-frame inputs. Is there a way to use R2plus1D_34 with the IG65M pre-trained weights and fine-tune it on the Kinetics-400 dataset?

daniel-j-h commented 4 years ago

The pre-trained model weights are available in 8- and 32-frame versions. I don't know whether fine-tuning with 16 frames will work - give it a try and please post back your findings here, thanks! 🤗
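If it helps, loading the 32-frame IG65M weights via torch.hub works as in the README; the Kinetics-400 head swap below is just a sketch of a standard fine-tuning setup, not an official recipe from this repo:

```python
import torch
import torch.nn as nn

# r2plus1d_34_32_ig65m was pre-trained on IG65M with 359 classes
# (per the README's torch.hub usage).
model = torch.hub.load("moabitcoin/ig65m-pytorch",
                       "r2plus1d_34_32_ig65m",
                       num_classes=359, pretrained=True)

# Assumption: swap the final fully connected layer for a fresh 400-way
# head before fine-tuning on Kinetics-400 (the usual head-swap pattern).
model.fc = nn.Linear(model.fc.in_features, 400)
```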

bjuncek commented 4 years ago

@jaypatravali It should work relatively well. Du ran an ablation in the original R(2+1)D paper with models pre-trained on 8/32 frames and evaluated on 32 frames, and the accuracy drop is negligible. See https://arxiv.org/pdf/1711.11248.pdf, Table 3.

jaypatravali commented 4 years ago

@daniel-j-h @bjuncek Thanks for your prompt replies 😄. I'll try training with 16-frame RGB video inputs using the r2plus1d_34_32_ig65m pre-trained weights. I'll let you know how it turns out.
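For anyone following along, this is the sanity check I'm starting from. It assumes the backbone ends in adaptive average pooling (as in torchvision's VideoResNet), so a 16-frame clip should pass through the 32-frame model; whether fine-tuning quality holds up is the open question:

```python
import torch

# Load the 32-frame IG65M pre-trained model (359 classes, per the README).
model = torch.hub.load("moabitcoin/ig65m-pytorch",
                       "r2plus1d_34_32_ig65m",
                       num_classes=359, pretrained=True)
model.eval()

# Dummy clip: (batch, channels, frames, height, width).
# 112x112 is the standard R(2+1)D input resolution; 16 frames instead of 32.
clip = torch.rand(1, 3, 16, 112, 112)

with torch.no_grad():
    logits = model(clip)

print(logits.shape)  # expected: torch.Size([1, 359])
```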

Yueeeeee-1 commented 4 years ago

@jaypatravali Hi, can you share your results? How did it go?

cyy53589 commented 4 years ago

@jaypatravali Hi, it didn't work well for me. How about you? Can you share your results?