microsoft / NUWA

A unified 3D Transformer Pipeline for visual synthesis

For T2V, are the 10 frames evenly sampled from the video, or are they the first 10 frames of the video? #17

Open BinZhu-ece opened 2 years ago

BinZhu-ece commented 2 years ago

Thank you for your excellent work! From the paper, I understand that you sample 10 frames from a 2.5 FPS video. How many frames does each video in your dataset contain? And are the 10 frames evenly sampled across the video, or are they simply the first 10 frames?
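To make the two options in the question concrete, here is a minimal sketch of what each sampling strategy would select. The function names and the 25-frame clip (10 seconds at 2.5 FPS) are assumptions for illustration only; they are not taken from the NUWA code or paper.

```python
def first_k_indices(num_frames, k=10):
    """Option A: take the first k frames of the clip."""
    return list(range(min(k, num_frames)))


def evenly_spaced_indices(num_frames, k=10):
    """Option B: take k frames spread evenly from the first to the last frame."""
    if num_frames <= k:
        # Clip is too short to subsample; return every frame.
        return list(range(num_frames))
    if k == 1:
        return [0]
    # Integer arithmetic keeps the result deterministic (floor of the ideal position).
    return [i * (num_frames - 1) // (k - 1) for i in range(k)]


# Hypothetical clip: 10 seconds at 2.5 FPS gives 25 frames.
num_frames = 25
print(first_k_indices(num_frames))        # covers only the first 4 seconds
print(evenly_spaced_indices(num_frames))  # covers the whole clip
```

With the first option, the 10 frames span only the opening 4 seconds of such a clip; with the second, they span its full duration, so the two choices give the model quite different temporal coverage.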