Closed by Alextale777 2 years ago
You can refer to the https://github.com/Finspire13/pytorch-i3d-feature-extraction repo for details. It adds an average pooling operation that turns the [N, C, T, H, W] feature into an [N, C] feature. Each clip thus becomes a single tensor of shape [C], and all the clips in a video (say there are M of them) form a feature sequence of shape [M, C].
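As a rough illustration of that pooling step, here is a minimal NumPy sketch. The array sizes and variable names are made up for the example and are not taken from the linked repo, which may implement the pooling differently.

```python
import numpy as np

# Hypothetical sizes: N clips, C channels, T frames, H x W spatial grid.
N, C, T, H, W = 4, 1024, 2, 7, 7
features = np.random.rand(N, C, T, H, W).astype(np.float32)

# Global average pooling over the temporal and spatial axes (T, H, W).
pooled = features.mean(axis=(2, 3, 4))

print(pooled.shape)  # (4, 1024): one C-dimensional vector per clip
```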
We use the I3D features after the last average pooling, which gives a single feature vector of size C for each clip and thus a sequence of features (T x C, where T is the number of clips) for each video. My bet is that the I3D features you extracted were taken before the average pooling and thus still carry the spatial dimensions.
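If the per-clip features were indeed saved before pooling, something like the sketch below could collapse them afterwards. It assumes each clip's 4D array is laid out as [C, T, H, W]; that layout is a guess, so check your extractor's output and change `channel_axis` if the channels sit elsewhere. The helper names `pool_clip` and `video_sequence` are hypothetical.

```python
import numpy as np

def pool_clip(feat, channel_axis=0):
    """Average one clip's feature map over every axis except the channel axis."""
    feat = np.asarray(feat)
    keep = channel_axis % feat.ndim  # normalise negative axis indices
    other_axes = tuple(i for i in range(feat.ndim) if i != keep)
    return feat.mean(axis=other_axes)

def video_sequence(clip_feats, channel_axis=0):
    """Stack the pooled clip vectors into a 2D (num_clips x C) array."""
    return np.stack([pool_clip(f, channel_axis) for f in clip_feats], axis=0)

# Five made-up clips, each assumed to be [C, T, H, W] = [1024, 2, 7, 7].
clips = [np.random.rand(1024, 2, 7, 7) for _ in range(5)]
seq = video_sequence(clips)
print(seq.shape)  # (5, 1024)
```

The resulting 2D array can then be written out with `np.save` like any other `.npy` file.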
Let me know if there are any further questions. Otherwise, I will mark the issue as resolved.
Mark as resolved.
I am trying to use I3D to extract features in .npy format. However, the shape of the output is 4D, while the shape of your preprocessed dataset is 2D. What should I do next to get the right dimensions?