okankop / Efficient-3DCNNs

PyTorch Implementation of "Resource Efficient 3D Convolutional Neural Networks", codes and pretrained models.
MIT License
773 stars 149 forks

temporal dimension annotations #2

Closed Sushant-aggarwal closed 5 years ago

Sushant-aggarwal commented 5 years ago

What do begin_index and end_index signify in the annotations of the Kinetics dataset in the CSV file? From what I have observed, they are neither frame numbers nor times in seconds, so what exactly are they? Also, do you temporally cut the videos? Thank you.

gzoumpourlis commented 5 years ago

Hi @Sushant-aggarwal, the "time_start" and "time_end" columns of the Kinetics .csv files contain time indices in seconds, relative to the original YouTube video. If you use the Kinetics video crawler from the original ActivityNet repository, you'll notice that the crawler trims the videos so that the final downloaded files contain only these specific time segments.
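To make the trimming step concrete, here is a minimal sketch of how one annotation row could be turned into an ffmpeg trim command. It is not the crawler's actual code: the function name `trim_command`, the file paths, and the sample row (including the placeholder video id) are made up for illustration, and the column layout is assumed to be `label,youtube_id,time_start,time_end,split`.

```python
import csv
import io

# Made-up sample in the assumed Kinetics CSV layout; the id is a placeholder.
SAMPLE_CSV = """label,youtube_id,time_start,time_end,split
example_label,XXXXXXXXXXX,12,22,train
"""


def trim_command(row, src_path, dst_path):
    """Build (but do not run) an ffmpeg command that keeps only the
    annotated segment. Hypothetical helper, not part of any crawler."""
    start = int(row["time_start"])  # seconds into the original video
    end = int(row["time_end"])      # seconds into the original video
    if end <= start:
        raise ValueError("time_end must be greater than time_start")
    return [
        "ffmpeg", "-i", src_path,
        "-ss", str(start),          # seek to the segment start
        "-t", str(end - start),     # keep this many seconds
        dst_path,
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
command = trim_command(rows[0], "full_video.mp4", "segment.mp4")
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` once the full video has been downloaded. After trimming, frame indices count from the start of the segment, not from the start of the YouTube video.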

Sushant-aggarwal commented 5 years ago

What is the procedure for the temporal crop of each video in the UCF-101 dataset? Do you randomly take 16 consecutive frames? If not, how do you take 16 frames from one video?

okankop commented 5 years ago

For all training runs, TemporalRandomCropping is applied, which takes 16 consecutive frames starting from a randomly selected position in the video. Please also check out the downsampling option in the temporal_transforms. You can find the implementation of the temporal augmentation in "temporal_transforms.py".
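As a rough illustration of the idea (the authoritative version is the one in "temporal_transforms.py"), a random temporal crop with an optional downsampling factor might look like the sketch below. The function name `temporal_random_crop`, its parameters, and the loop-padding of short videos are assumptions made for this example.

```python
import random


def temporal_random_crop(frame_indices, size=16, downsample=1, rng=random):
    """Return `size` frame indices, one every `downsample` frames, taken
    from a randomly positioned window of the video."""
    if not frame_indices:
        raise ValueError("video has no frames")
    span = size * downsample                      # frames covered by the window
    last_start = max(0, len(frame_indices) - span)
    start = rng.randint(0, last_start)            # inclusive on both ends
    window = list(frame_indices[start:start + span])
    # Assumption: videos shorter than the window are padded by looping.
    i = 0
    while len(window) < span:
        window.append(window[i])
        i += 1
    return window[::downsample]


frames = list(range(1, 101))                      # a made-up 100-frame video
clip = temporal_random_crop(frames)               # 16 consecutive indices
sparse_clip = temporal_random_crop(frames, downsample=2)  # every 2nd frame
```

With `downsample=2` the window spans 32 frames but still yields 16 indices, so the clip covers a longer stretch of the video at the same input size.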

Sushant-aggarwal commented 5 years ago

So, do you select the same random portion of the video for all epochs, or are different random consecutive frames taken from the video in each epoch?

ahmetgunduz commented 5 years ago

@Sushant-aggarwal It is the latter option. The temporal augmentation is applied every time a batch is created. So yes, in each epoch a new randomly selected portion of the video (consecutive frames) is fed to the network.
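The reason the clip changes between epochs is that the random crop runs whenever a sample is fetched, not once up front. A small sketch of that pattern follows; `ClipDataset` is a hypothetical stand-in written for this example, not the repository's dataset class, and the video is a made-up list of frame indices.

```python
import random


class ClipDataset:
    """Hypothetical dataset that draws a fresh random clip on every access."""

    def __init__(self, videos, clip_len=16, seed=None):
        self.videos = videos              # each video is a list of frame indices
        self.clip_len = clip_len
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.videos)

    def __getitem__(self, idx):
        frames = self.videos[idx]
        last_start = max(0, len(frames) - self.clip_len)
        start = self.rng.randint(0, last_start)   # re-drawn on each call
        return frames[start:start + self.clip_len]


dataset = ClipDataset([list(range(100))], seed=0)
# Fetching the same video once per "epoch" gives clips with varying start frames.
starts = [dataset[0][0] for _epoch in range(50)]
print(sorted(set(starts)))
```

Precomputing the crop in `__init__` instead would freeze one clip per video for the whole training run, which is the first option described in the question.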

okankop commented 5 years ago

I am closing this issue as it has been resolved.