happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

How are the feature dimensions of the model aligned with the video duration? #5

Closed. xuan301 closed this issue 2 years ago

xuan301 commented 2 years ago

For example, video_validation_0000051 has a duration of 169.79 s at 30 fps. How does that lead to 1269 as the first dimension of its I3D feature array, whose shape is (1269, 2048)?

tzzcl commented 2 years ago

Hi, for the THUMOS14 dataset, we use the features provided by CMCS. The video has around 5094 frames (169.79 s × 30 fps). Each feature is computed from a 16-frame clip and the temporal stride is 4, so we get (5094 - 16) / 4 ≈ 1269 feature points for the video.
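As a quick check of the arithmetic above, here is a minimal Python sketch. The function name `num_feature_points` and its default arguments are hypothetical, chosen only for illustration. It reproduces the formula quoted in this comment and is not code taken from CMCS or from the feature extractor.

```python
def num_feature_points(num_frames: int, clip_len: int = 16, stride: int = 4) -> int:
    """Feature count per the formula in this thread: floor((frames - clip_len) / stride).

    clip_len=16 and stride=4 are the values mentioned in the discussion;
    the function itself is an illustrative assumption, not the extractor's code.
    """
    if num_frames < clip_len:
        return 0  # not enough frames for even one clip
    return (num_frames - clip_len) // stride


duration_s = 169.79  # duration of video_validation_0000051, in seconds
fps = 30
num_frames = round(duration_s * fps)  # about 5094 frames

print(num_frames, num_feature_points(num_frames))
```

With the numbers from the question, this gives 5094 frames and 1269 feature points, matching the first dimension of the (1269, 2048) feature array.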

xuan301 commented 2 years ago

If a video has 20 frames and the temporal stride is 4, shouldn't we get 2 feature points? They would be derived from frames 0-16 and frames 4-20, respectively. But according to (20 - 16) / 4 = 1, we only get 1 feature point.

tzzcl commented 2 years ago

CMCS uses pytorch-i3d-feature-extraction to extract the features. Based on that code, I think we only get 1 feature point in this case.
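To make the difference between the two counts concrete, here is a small Python sketch comparing them. Both function names are hypothetical, and neither is taken from pytorch-i3d-feature-extraction. The first follows the formula used in this thread; the second counts every full 16-frame window, including the one starting at frame 0. The sketch only shows which count is consistent with the observed feature shape; it does not show which window the extractor actually skips.

```python
def count_thread_formula(num_frames: int, clip_len: int = 16, stride: int = 4) -> int:
    """floor((frames - clip_len) / stride), the formula quoted in this thread."""
    return max(0, (num_frames - clip_len) // stride)


def count_all_windows(num_frames: int, clip_len: int = 16, stride: int = 4) -> int:
    """Number of full sliding windows, counting the window that starts at frame 0."""
    if num_frames < clip_len:
        return 0
    return (num_frames - clip_len) // stride + 1


for n in (20, 5094):
    print(n, count_thread_formula(n), count_all_windows(n))
```

For 20 frames the two counts are 1 and 2; for 5094 frames they are 1269 and 1270. The stored feature array has 1269 rows, which agrees with the thread's formula rather than the full sliding-window count.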

xuan301 commented 2 years ago

OK, thanks for your help~

happyharrycn commented 2 years ago

Thanks for resolving this issue, Chenlin. Marking as closed.