tobyperrett / trx

Temporal-Relational CrossTransformers (CVPR 2021)
Performance on Kinetics only reaches 78.5% #1

Closed (ShiyeLi closed this issue 3 years ago)

ShiyeLi commented 3 years ago

I used your code as suggested (replacing the data loader with my own, using random sampling and random flips as augmentation, with an image size of 3x112x112), but 5-way 5-shot accuracy only reaches 78.5% after 10000 meta-training iterations, and 76.9% after 50000. That's far from 85.9%.

Are there any other important steps in your data loading?

P.S. The loss function in the code seems to be wrong (it grows to NaN as training proceeds), so I replaced it with a standard cross-entropy loss.

tobyperrett commented 3 years ago

Hi. First of all, it looks like you're using a different image size. For Kinetics, I scaled all videos to a height of 256 while maintaining the aspect ratio, and used random crops of 224x224. To subsample frames, select a random start point and a random end point close to the start and end of the video, and linearly interpolate frames between them (we used 8 frames per video). Remember that any transform has to be applied identically to all frames of a video: you can't just use PyTorch's random crop on each frame separately, as there won't be crop consistency between frames (and thus the tuple matching won't work).

The loss works fine for me. I've never had any NaNs or anything like that. It might be worth checking that you're using a ResNet-50 backbone with trans_linear_in_dim=2048, trans_linear_out_dim=1152, averaging the loss every 16 batches before backprop, SGD, lr=0.001, and 5 queries per class. Hope that helps!
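For anyone rewriting the data loader, here is a minimal sketch of the two sampling steps described above: picking 8 frame indices between a random start and a random end, and drawing one crop box per video that is reused for every frame. It is not the repository's code. The function names, the `margin` parameter (how far from the video boundaries the start and end may fall), and the nested-list frame representation are all assumptions made for illustration.

```python
import random


def sample_frame_indices(num_frames, seq_len=8, margin=0.1, rng=random):
    """Return seq_len frame indices spread linearly between a random start
    near the beginning of the video and a random end near its end.

    `margin` is a hypothetical knob: start and end are each drawn from the
    first/last `margin` fraction of the video. Keep it below 0.5.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    max_offset = int(margin * (num_frames - 1))
    start = rng.randint(0, max_offset)
    end = (num_frames - 1) - rng.randint(0, max_offset)
    if seq_len == 1:
        return [(start + end) // 2]
    step = (end - start) / (seq_len - 1)
    # Linear interpolation between start and end, rounded to valid indices.
    return [int(round(start + i * step)) for i in range(seq_len)]


def sample_crop_box(height, width, crop=224, rng=random):
    """Draw ONE (top, left) crop origin for a whole video."""
    if height < crop or width < crop:
        raise ValueError("frame is smaller than the crop size")
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)
    return top, left


def crop_video(frames, top, left, crop=224):
    """Apply the same crop box to every frame.

    Each frame is assumed to be a list of rows (H x W); with tensors the
    equivalent would be a single slice over the spatial dimensions.
    """
    return [
        [row[left:left + crop] for row in frame[top:top + crop]]
        for frame in frames
    ]
```

The point of the design is that the randomness is drawn once per video (`sample_crop_box`) and then applied deterministically to each frame (`crop_video`), so all frames share the same spatial window. The same pattern applies to flips: decide once per video whether to flip, then flip every frame or none.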

Shunli-Wang commented 2 years ago

Hello, it seems I've run into the same problem on the Kinetics-100 dataset. My retraining results are slightly lower than the result in the original paper (85.9):

84.4+/-0.3
83.9+/-0.3
84.0+/-0.3
84.4+/-0.3

I followed all the settings in this repo. Could you please provide the pre-trained model (the one that reaches 85.9)? Thank you very much!