Closed yuanzhedong closed 5 years ago
Hi, thank you for the great work! I'm having trouble downloading the data folder from MEGA; it seems I need to pay for it. I'm wondering if you did any preprocessing. Will it still work if I download the dataset from the official website? Thanks!
Hi, you should be able to download the data folder, but you need to download the free MEGA desktop app first and use it to download the data.
Hi @yabufarha, thank you for your reply! Do you have the code to run the I3D feature extraction given a video file? I'd like to try it on my own dataset but I'm stuck on extracting the features. Thank you!
Hi, We used the following repository to extract I3D features: https://github.com/ahsaniqbal/Kinetics-FeatureExtractor
@yabufarha I tried that feature extractor and it works! Thank you! Another question: do you use both RGB features and optical flow features? Given a video of length T, I can get (T/16, 1024) RGB I3D features and (T/16, 1024) optical flow features. Did you concatenate the two into a (T/16, 2048) tensor as the input to MS-TCN?
We used both RGB and optical flow. With the default settings, a video of length T yields an output array of size (T, 2048), with the RGB features and the flow features concatenated. In the MS-TCN code, there is a parameter where you set the dimension of the input features. The input to MS-TCN should be of shape (bz, features_dim, T).
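For example, the concatenation and reshaping could look like this (a minimal sketch in NumPy; the random arrays are placeholders for real extracted features):

```python
import numpy as np

T = 300  # placeholder video length
rgb_feat = np.random.randn(T, 1024).astype(np.float32)   # RGB I3D features
flow_feat = np.random.randn(T, 1024).astype(np.float32)  # optical-flow I3D features

# Concatenate along the feature dimension -> (T, 2048)
features = np.concatenate([rgb_feat, flow_feat], axis=1)

# MS-TCN expects (bz, features_dim, T): transpose and add a batch dimension.
model_input = features.T[np.newaxis, ...]  # shape (1, 2048, T)
print(model_input.shape)
```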
For the I3D extractor, how do you get an embedding for each frame? It seems I have to feed 16 frames into the I3D extractor to get one data point, e.g. an input of shape (1, 16, 224, 224, 3) gives an output of shape (1, 1, 1, 1024).
For each frame we pass a video segment centered at that frame to the I3D model. But the code in the referenced repository already does that. You only need to provide the list of videos.
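Roughly, the per-frame windowing works like this (a sketch only; the edge padding is an assumption, and the referenced repository may handle boundaries differently):

```python
import numpy as np

def frame_windows(frames, window=21):
    """Yield a segment of `window` frames centered at each frame.

    `frames` is assumed to have shape (T, 224, 224, 3); the video is
    padded at the boundaries by repeating the edge frames.
    """
    half = window // 2
    padded = np.concatenate([frames[:1].repeat(half, axis=0),
                             frames,
                             frames[-1:].repeat(half, axis=0)], axis=0)
    for t in range(len(frames)):
        yield padded[t:t + window]  # segment centered at frame t
```

Each yielded segment is then passed through I3D to produce that frame's feature vector.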
Got it, thank you so much for the help!
Hi @yabufarha, if you have time, could you help me locate where the video segments are generated in that reference repo? To me it looks like it just passes in all of the video's frames and saves the embeddings. Here's the code that gets all frames for one video: https://github.com/ahsaniqbal/Kinetics-FeatureExtractor/blob/4c50003a1684517106d8f66afbfd588ebae28241/extractor.py#L28 And here's the code that passes the whole video into I3D: https://github.com/ahsaniqbal/Kinetics-FeatureExtractor/blob/4c50003a1684517106d8f66afbfd588ebae28241/extractor.py#L134
Also, how long is the video segment you use to extract each per-frame embedding?
Actually, we used the default setting of 21 frames. Regarding the segment generation code, unfortunately I didn't look into the details. But if you just want to extract features, this should not be relevant.
I see, thanks! The reason I'm asking is that we want to compute the optical flow with cv2.calcOpticalFlowFarneback from OpenCV; basically, we want to rewrite the feature extractor in Python. The feature extractor code you shared is very helpful, but we are struggling with how to feed the data into the I3D model. For example, if we feed RGB input with batch_size = 1 and 21 frames, the input dim is (1, 21, 224, 224, 3), but the output dim I get is (2, 1, 1, 1024). That means for each frame in the original video we get a 2 x 1024 RGB feature instead of 1 x 1024. It turns out the maximum temporal window size that yields a single output step is 16: if we feed (1, 16, 224, 224, 3), we get a (1, 1, 1, 1024) feature as expected. Not sure which part we're missing, but your responses have been really helpful, thank you so much!
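For reference, computing Farneback flow between consecutive frames with OpenCV might look like this (a sketch; "video.mp4" is a placeholder path, and note that the pretrained I3D flow stream typically expects flow values truncated to [-20, 20] and rescaled to [-1, 1], which would need an extra step):

```python
import cv2

cap = cv2.VideoCapture("video.mp4")  # placeholder input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

flows = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Returns an (H, W, 2) array of per-pixel (dx, dy) displacements.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    flows.append(flow)
    prev_gray = gray
cap.release()
```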
You are right about the output dimension. Nevertheless, the code follows the I3D paper: an average pooling is applied over the temporal dimension of the output. That's how you always get a 1 x 1024 feature vector for each modality, even for a larger temporal window: by averaging over the temporal dim.
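In other words, the (2, 1, 1, 1024) output collapses to a single vector like this (a sketch with a placeholder array):

```python
import numpy as np

# Placeholder for the I3D output of a 21-frame segment: two temporal steps.
out = np.random.randn(2, 1, 1, 1024).astype(np.float32)

# Average-pool over the temporal dimension to get one 1024-d feature vector.
feature = out.mean(axis=0).reshape(1024)  # shape: (1024,)
```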
Average pooling over the temporal dim makes a lot of sense, thank you!!