yabufarha / ms-tcn

dimensionality of input features #3

Closed djr2015 closed 5 years ago

djr2015 commented 5 years ago

I noticed the dimensionality of your input I3D features for each video is (2048,number of video frames).

I am confused about how the temporal dimension of your inputs can equal the number of frames, since the I3D network is supposed to downsample temporally by a factor of 8. Can you provide more details on how you obtained the I3D features?

yabufarha commented 5 years ago

Hi,

We extracted frame-wise features by defining a temporal window around each frame and passing this window to the I3D network. For more details please check this repository: https://github.com/ahsaniqbal/Kinetics-FeatureExtractor

Best, Yazan
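For readers following along, here is a minimal sketch of the frame-wise windowing idea described above. It is not the code from the linked repository: the function names `window_indices` and `framewise_windows` are hypothetical, and clamping out-of-range indices to the first or last frame is an assumed way to handle the video boundaries. It only shows why the output keeps one entry per frame: each frame gets its own window, and each window would be passed to I3D to produce a single feature vector.

```python
def window_indices(center, num_frames, window=21):
    """Return the frame indices of a temporal window centred on `center`.

    Indices that fall outside the video are clamped to the first or last
    frame. This padding choice is an assumption for illustration only.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a frame")
    half = window // 2
    return [min(max(i, 0), num_frames - 1)
            for i in range(center - half, center + half + 1)]


def framewise_windows(num_frames, window=21):
    """Build one window per frame, so the result has num_frames entries."""
    return [window_indices(t, num_frames, window) for t in range(num_frames)]


# Hypothetical 100-frame video: one 21-frame window per frame.
windows = framewise_windows(num_frames=100, window=21)
print(len(windows), len(windows[0]))
```

Because every frame contributes exactly one window, the temporal length of the extracted features matches the number of frames, regardless of any downsampling inside the network.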

djr2015 commented 5 years ago

Thanks for the pointer. I have proceeded in the same manner and can now obtain per-frame category labels using your code!

So far I have been using that repo's default temporal window of 3 frames, which is considerably shorter than the activities in the dataset I am using. Did you choose a more principled temporal window size based on your datasets?

yabufarha commented 5 years ago

Actually, I used a temporal window of 21 frames. The datasets that I work with are either 15 or 30 fps. The 21-frame window works well for these datasets, but I didn't study the effect of using different window sizes.
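As a rough aid for picking a window for a different dataset, the snippet below converts a window length in frames to seconds at a given frame rate. The helper name `window_seconds` is hypothetical; the values 21, 15, and 30 are the ones mentioned above.

```python
def window_seconds(window_frames, fps):
    """Duration in seconds covered by a window of `window_frames` frames."""
    return window_frames / fps


for fps in (15, 30):
    print(f"{fps} fps: {window_seconds(21, fps):.2f} s")
```

At these frame rates a 21-frame window covers about 1.4 s and 0.7 s of video respectively.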

djr2015 commented 5 years ago

Thanks! I guess their README is inconsistent with the default value of 21 they have in their code.