kaiyuyue / cgnl-network.pytorch

Compact Generalized Non-local Network (NIPS 2018)
https://arxiv.org/abs/1810.13125
MIT License

Accuracy only ~70% on ucf101. #6

Closed by bupt-wcm 5 years ago

bupt-wcm commented 5 years ago

Thanks for the great work. I followed the training strategy in the paper to train an I3DResNet50 on UCF101, initialized from the ImageNet-pretrained model. For training, I sample 64 consecutive frames and then drop frames evenly; for testing, I sample 30x32 frames. The I3DResNet is converted from the C2D model described in the Non-local Network paper. However, I only get about 70% accuracy. Could you provide the script for the video classification task, or give some suggestions? Thank you.
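For reference, the training-time sampling described above can be sketched roughly as follows. This is only a minimal sketch, assuming "drop evenly" means keeping every second frame of the 64-frame window (which gives 32 frames). The function name `sample_training_indices`, the stride of 2, and the wrap-around for videos shorter than the window are assumptions for illustration, not code from this repository.

```python
import random


def sample_training_indices(num_frames, window=64, stride=2, rng=random):
    """Return frame indices for one training clip.

    Assumption: pick a random window of `window` consecutive frames,
    then keep every `stride`-th frame (64 frames -> 32 with stride 2).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Latest start position at which the whole window still fits.
    max_start = max(num_frames - window, 0)
    start = rng.randint(0, max_start)  # inclusive on both ends
    # Consecutive indices; loop back to the beginning if the video is short.
    consecutive = [(start + i) % num_frames for i in range(window)]
    # Drop frames evenly by keeping every `stride`-th index.
    return consecutive[::stride]
```

Calling `sample_training_indices(300)` returns 32 indices spaced two frames apart, starting at a random offset.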

kaiyuyue commented 5 years ago

Hi @WangCMing, please check out the tips in the previous issue.

Assuming your training details are the same as those in Non-local Network, please double-check your inference method. Fully convolutional inference is very important when benchmarking video recognition models.
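As a rough illustration of the temporal part of that kind of inference, here is a minimal sketch that spreads clips evenly over a video and averages the per-clip softmax scores. The helper names, the default of 10 clips, and the 64-frame window are assumptions for illustration only; the spatial part (running the network over the whole rescaled frame instead of a single crop) is not shown, and the per-clip logits are assumed to come from your own model.

```python
import math


def clip_starts(num_frames, num_clips=10, window=64):
    """Evenly spaced start indices for `num_clips` windows over the video."""
    last_start = max(num_frames - window, 0)
    if num_clips == 1:
        return [last_start // 2]
    return [round(i * last_start / (num_clips - 1)) for i in range(num_clips)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def video_prediction(per_clip_logits):
    """Average per-clip softmax scores; return (predicted class, scores)."""
    probs = [softmax(logits) for logits in per_clip_logits]
    num_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    best = max(range(num_classes), key=avg.__getitem__)
    return best, avg
```

The video-level label is then the arg-max of the averaged scores rather than the prediction of any single clip, which is usually what reported video accuracies are based on.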

I recently noticed a related work (Compact-Global-Descriptor) that reports results on UCF101 using ResNet50 + CGNL Module; I hope it helps.