piergiaj / pytorch-i3d

Apache License 2.0

Performance of your ported weights on Kinetics400 validation set (data preprocessing) #41

Closed bryanyzhu closed 5 years ago

bryanyzhu commented 5 years ago

Hi @piergiaj , thank you for this great repo. I have used your code to extract I3D features before, and it works pretty well. Recently I wanted to train it from scratch on Kinetics400/700 and try to reproduce the reported performance. The first step was to evaluate your model on the Kinetics400 validation set to see what the performance is. However, the accuracy is very low.

Then I tried to find the reason. The situation is: if I use your .pth model with the demo .npy file, I get the correct CricketShot prediction.

Top classes and probabilities
[0.99999666] [25.856636] playing cricket
[1.3353539e-06] [12.330325] playing kickball
[4.5531763e-07] [11.254369] catching or throwing baseball
[3.1434402e-07] [10.883862] shooting goal (soccer)
[1.9243485e-07] [10.393131] catching or throwing softball

But if I use (imgx/255.)*2 - 1 to preprocess the same video (v_CricketShot_g04_c01) myself and build the input from that, I can't get the correct prediction. The top label I get is actually robot dancing. I also tried several other videos, but none of them gave the correct prediction.

Top classes and probabilities
[0.57130724] [10.700808] robot dancing
[0.09980194] [8.956068] pumping fist
[0.04412121] [8.139821] dancing gangnam style
[0.03195825] [7.817311] dancing macarena
[0.01954201] [7.325447] using remote controller (not gaming)

I think your model is fine, so the problem should be on the data side: either my decoded frames are not the same as yours, or the image preprocessing is trickier than it looks. Have you encountered this before? Did you test your model on the Kinetics400 validation set? Thank you very much, and I look forward to your reply.
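For reference, the normalization I used is just the scaling above applied per frame; a minimal sketch (frame decoding is assumed to happen separately, e.g. with ffmpeg, and the dummy clip shape here is only illustrative):

```python
import numpy as np

def normalize_frames(frames):
    """Scale uint8 frames in [0, 255] to float32 in [-1, 1],
    i.e. (img / 255.) * 2 - 1 as described above."""
    return (frames.astype(np.float32) / 255.0) * 2.0 - 1.0

# Example: a dummy clip of 16 RGB frames at 224x224
clip = np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8)
x = normalize_frames(clip)
print(x.min() >= -1.0 and x.max() <= 1.0)  # True: values lie in [-1, 1]
```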

piergiaj commented 5 years ago

I have tested this model on the Kinetics-400 validation set and got results very similar to those reported in the paper. I'm guessing your input preprocessing is slightly different from what this model was trained on. That is, make sure you're sampling at 25 fps, resizing the height of each video to 256, saving the frames as JPEGs, and then taking a center crop.
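The resize/crop/scale steps could be sketched as below. This is only an illustration under assumptions: the 224 crop size is assumed from the usual I3D input resolution, and the nearest-neighbor resize stands in for the real interpolation; the 25 fps sampling and JPEG round-trip are assumed to happen before this function.

```python
import numpy as np

def preprocess_frame(img, short_side=256, crop=224):
    """Resize an HxWx3 uint8 frame so height == short_side, center-crop,
    and scale to [-1, 1]. Nearest-neighbor resize is used here only to
    keep the sketch dependency-free; a real pipeline would use bilinear
    interpolation (e.g. via OpenCV or PIL)."""
    h, w, _ = img.shape
    new_w = int(round(w * short_side / h))
    # nearest-neighbor index maps for the resize
    ys = (np.arange(short_side) * h / short_side).astype(int)
    xs = (np.arange(new_w) * w / new_w).astype(int)
    resized = img[ys][:, xs]
    # center crop
    top = (short_side - crop) // 2
    left = (new_w - crop) // 2
    cropped = resized[top:top + crop, left:left + crop]
    # scale to [-1, 1] as in the issue
    return (cropped.astype(np.float32) / 255.0) * 2.0 - 1.0

frame = np.zeros((480, 640, 3), dtype=np.uint8)
out = preprocess_frame(frame)
print(out.shape)  # (224, 224, 3)
```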

You can compare your preprocessed video against the provided numpy version until you find the right settings.
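One way to do that comparison is to load both arrays and look at the largest per-pixel discrepancy; a near-zero value means the decoding/resize/crop settings match. The file names and array shapes below are hypothetical placeholders, not the repo's actual paths:

```python
import numpy as np

def max_abs_diff(mine, ref):
    """Largest per-pixel discrepancy between two preprocessed clips.
    Frame counts are aligned by truncating to the shorter clip."""
    t = min(mine.shape[0], ref.shape[0])
    return float(np.abs(mine[:t] - ref[:t]).max())

# Hypothetical usage with the demo .npy shipped in the repo:
# mine = np.load("my_preprocessed_clip.npy")
# ref = np.load("provided_demo_clip.npy")
# print(max_abs_diff(mine, ref))

# Self-contained demo with dummy data:
a = np.full((4, 224, 224, 3), 0.5, dtype=np.float32)
b = np.full((4, 224, 224, 3), 0.5, dtype=np.float32)
print(max_abs_diff(a, b))  # 0.0
```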

bryanyzhu commented 5 years ago

Thank you for your suggestions, I will try them.