happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

How can I use kinetics-i3d to get the video features? #17

Closed: Alextale777 closed this issue 2 years ago

Alextale777 commented 2 years ago

I am trying to use I3D to extract features in .npy format. However, the output is 4D, while the features in your preprocessed dataset are 2D. What should I do next to get the right dimensions?

tzzcl commented 2 years ago

I think you can refer to the https://github.com/Finspire13/pytorch-i3d-feature-extraction repo for details. It adds an average pooling operation that turns the [N, C, T, H, W] feature map into [N, C] features. Each clip thus becomes a single feature vector of shape [C], and all clips in a video (say M clips) form a feature sequence of shape [M, C].
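To make the shapes concrete, here is a minimal NumPy sketch of that pooling step. It is not the code from the linked repo: the function name `pool_clip_features`, the random stand-in data, and the sizes (1024 channels, a 2 x 7 x 7 feature map, 4 clips) are assumptions chosen only for illustration.

```python
import numpy as np


def pool_clip_features(feats: np.ndarray) -> np.ndarray:
    """Average over the T, H, W axes: [N, C, T, H, W] -> [N, C]."""
    if feats.ndim != 5:
        raise ValueError(f"expected a 5D array [N, C, T, H, W], got shape {feats.shape}")
    return feats.mean(axis=(2, 3, 4))


# Random stand-ins for the un-pooled output of 4 clips (batch size N = 1 each).
# The sizes here are illustrative assumptions, not values taken from this thread.
rng = np.random.default_rng(0)
clip_batches = [
    rng.standard_normal((1, 1024, 2, 7, 7)).astype(np.float32) for _ in range(4)
]

# Pool each clip to [1, C], then concatenate along the clip axis to get [M, C].
video_feats = np.concatenate([pool_clip_features(b) for b in clip_batches], axis=0)
print(video_feats.shape)  # (4, 1024)
```

The resulting 2D array can then be written with `np.save`, giving one .npy file per video.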

happyharrycn commented 2 years ago

We use the I3D features after the last average pooling layer, which gives a single feature vector of size C for each clip and thus a sequence of features (T x C) for each video. My bet is that the I3D features you extracted were taken before the average pooling and thus still preserve the spatial dimensions.
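If the saved 4D array really does still carry two spatial axes, averaging over them gives the 2D (T x C) layout. The sketch below assumes the array has one time (clip) axis, one channel axis, and two spatial axes; which axis is which depends on the extractor, so `time_axis` and `channel_axis` are passed in explicitly. The helper name `spatial_pool` and the example shape are hypothetical.

```python
import numpy as np


def spatial_pool(feats: np.ndarray, time_axis: int, channel_axis: int) -> np.ndarray:
    """Average a 4D feature array over its two spatial axes and return (T, C)."""
    if feats.ndim != 4:
        raise ValueError(f"expected a 4D array, got shape {feats.shape}")
    t, c = time_axis % 4, channel_axis % 4
    if t == c:
        raise ValueError("time_axis and channel_axis must be different")
    # The two axes that are neither time nor channel are treated as spatial.
    spatial_axes = tuple(a for a in range(4) if a not in (t, c))
    pooled = feats.mean(axis=spatial_axes)
    # The remaining axes keep their original order; transpose if channel came first.
    return pooled.T if t > c else pooled


# Assumed channel-last layout (T, H, W, C) with made-up sizes.
rng = np.random.default_rng(0)
raw = rng.standard_normal((8, 7, 7, 1024)).astype(np.float32)
seq = spatial_pool(raw, time_axis=0, channel_axis=3)
print(seq.shape)  # (8, 1024)
```

For a channel-first array such as (C, T, H, W), the same call with `time_axis=1, channel_axis=0` also returns a (T, C) array.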

happyharrycn commented 2 years ago

Let me know if there are any further questions. Otherwise, I will mark the issue as resolved.

happyharrycn commented 2 years ago

Marking as resolved.