innat / VideoMAE

[NeurIPS'22] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602
Apache License 2.0

Methods for reconstructing video frames #4

Open · pangzhiyu opened 6 months ago

pangzhiyu commented 6 months ago

May I ask whether the frame rate of the final reconstructed video is the same as the original video's frame rate before temporal downsampling? If so, what method is used to reconstruct the frames that were discarded during the downsampling stage?

innat commented 6 months ago

The reconstruction approach is shown in this file. Let me know if you have any questions afterward.

pangzhiyu commented 6 months ago

Thank you for your explanation. So it seems the reconstructed video is not restored to the frame rate it had before temporal sampling, right? Only the patches that were masked within each sampled frame are reconstructed.
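To make the distinction in the last comment concrete, here is a minimal, hypothetical Python sketch. It is not code from this repository. It assumes a VideoMAE-style pipeline that samples 16 frames with a temporal stride of 4 and applies tube masking, where the same spatial patches are hidden in every sampled frame. The function names `sample_frame_indices`, `tube_mask` and `reconstruct` and the 4-patch toy grid are illustrative only, and tubelet embedding is omitted for brevity. Because reconstruction only fills the masked patches of the sampled frames with the decoder's predictions, the output has 16 frames rather than the source video's frame count. Skipped frames such as index 1 are never produced.

```python
# Hypothetical sketch (not the repository's code), assuming VideoMAE-style
# temporal sampling plus tube masking. Tubelet embedding is omitted.

def sample_frame_indices(num_source_frames, num_frames=16, stride=4, start=0):
    """Pick `num_frames` frame indices spaced `stride` apart (assumed scheme)."""
    return [min(start + i * stride, num_source_frames - 1) for i in range(num_frames)]


def tube_mask(num_sampled_frames, num_patches, masked_patches):
    """Tube masking: the same spatial patches are masked in every sampled frame."""
    masked = set(masked_patches)
    return [[p in masked for p in range(num_patches)] for _ in range(num_sampled_frames)]


def reconstruct(visible, predicted, mask):
    """Keep visible patches and fill masked ones with the decoder's predictions."""
    return [
        [pred if m else vis for vis, pred, m in zip(v_row, p_row, m_row)]
        for v_row, p_row, m_row in zip(visible, predicted, mask)
    ]


source_frames = 300                            # frames in the original video (toy value)
indices = sample_frame_indices(source_frames)  # only these frames enter the model
num_patches = 4                                # toy spatial grid per frame
mask = tube_mask(len(indices), num_patches, masked_patches=[1, 3])

# Placeholder "pixels": strings tag whether a patch is original or predicted.
visible = [[f"vis{t}_{p}" for p in range(num_patches)] for t in range(len(indices))]
predicted = [[f"pred{t}_{p}" for p in range(num_patches)] for t in range(len(indices))]
out = reconstruct(visible, predicted, mask)

print(len(indices), indices[:3], len(out))  # prints: 16 [0, 4, 8] 16
```

The output clip has one frame per sampled index, so the temporal stride (and therefore the reduced frame rate) carries straight through to the reconstruction. Recovering the skipped frames would require a separate interpolation step, which this sketch does not attempt.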