wilson1yan / VideoGPT

MIT License

How many frames (seconds) are there in each video sample used in the training process? #34

Closed. BinZhu-ece closed this issue 2 years ago

BinZhu-ece commented 2 years ago

How many frames (seconds) are there in each video sample used in the training process? What's the video length in the dataset? Did you use the complete videos directly, or slice them?

wilson1yan commented 2 years ago

All models are trained on 16 frames, sliced from the original videos. The number of seconds those 16 frames represent depends on the original frame rate of the video, which varies for different datasets.

The BAIR robot dataset has shorter videos (I think ~40 frames), while datasets like UCF-101 have much longer videos (30 fps, ~5-10 minutes).
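To make the frames-to-seconds relationship concrete, here is a minimal sketch of the arithmetic. The helper name `clip_duration_seconds` is hypothetical and not part of the VideoGPT codebase; the 30 fps figure is simply the one quoted above.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return how many seconds `num_frames` frames span at `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# A 16-frame training clip taken from a 30 fps video covers roughly half a second.
print(f"{clip_duration_seconds(16, 30):.3f} s")
```

The same 16 frames cover a longer time span for a dataset recorded at a lower frame rate, which is why the clip length in seconds is dataset-dependent.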

BinZhu-ece commented 2 years ago

Thank you very much for your reply! Are the 16 frames sampled evenly from the video, or in some other way?

wilson1yan commented 2 years ago

The frames are sampled evenly with a stride of 1, i.e., for a video with X frames, a random window of 16 consecutive frames is sampled.
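The sampling described above could be sketched roughly as follows. This is an illustrative assumption, not the repository's actual data-loading code: the function name `sample_clip_indices` and its parameters are hypothetical, and it only picks frame indices rather than decoding video.

```python
import random


def sample_clip_indices(num_frames: int, clip_len: int = 16, rng=random) -> list[int]:
    """Pick `clip_len` consecutive frame indices (stride 1) at a random start offset."""
    if num_frames < clip_len:
        raise ValueError("video is shorter than the requested clip length")
    # randint is inclusive on both ends, so the last valid start is num_frames - clip_len.
    start = rng.randint(0, num_frames - clip_len)
    return list(range(start, start + clip_len))


# Example: a 40-frame video (roughly the BAIR length mentioned earlier in the thread).
indices = sample_clip_indices(40)
print(indices[0], indices[-1], len(indices))
```

Every valid start offset is equally likely, so each training step can see a different 16-frame window from the same video.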

mw66 commented 10 months ago

Can you show the code location of this? Thanks.