Closed by bupt-wcm 5 years ago
Hi @WangCMing , please check out the tips in the previous issue.
Assuming your training details match those in the Non-local Network paper, please double-check your inference method. Fully convolutional inference is very important when benchmarking video recognition models.
I recently noticed a related work (Compact-Global-Descriptor) that reports results on UCF101 using ResNet50 + CGNL module. I hope it helps.
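For reference, here is a rough sketch of the temporal side of that inference protocol as I understand it from the Non-local Network paper: take several clips spread evenly over the whole video, compute softmax scores for each view separately, and average them. The function names (`clip_starts`, `average_views`) and the defaults (64-frame span, 10 clips) are assumptions for illustration and are not taken from this repository. The spatial part (running the network fully convolutionally on frames whose shorter side is rescaled to 256) is not shown.

```python
import math


def clip_starts(num_frames, clip_span=64, num_clips=10):
    """Start indices of `num_clips` windows spread evenly over the video.

    Hypothetical helper: `clip_span` is the number of raw frames each
    clip covers before any temporal subsampling.
    """
    last_start = max(num_frames - clip_span, 0)
    if num_clips == 1:
        return [last_start // 2]
    return [round(i * last_start / (num_clips - 1)) for i in range(num_clips)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_views(view_logits):
    """Average per-view softmax scores.

    `view_logits` holds one list of class logits per view (clip or crop).
    Returns (predicted class index, averaged scores).
    """
    probs = [softmax(v) for v in view_logits]
    num_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    pred = max(range(num_classes), key=avg.__getitem__)
    return pred, avg
```

In practice you would run the model once per view, collect the logits, and pass them to `average_views`. Averaging probabilities rather than taking a majority vote over per-clip predictions is the part that people most often get wrong.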
Thanks for the great work. I followed the training strategy in the paper to train an I3DResNet50 on UCF101, starting from the ImageNet-pretrained model. For training, I sample 64 consecutive frames and then drop frames evenly; for testing, I sample 30x32 frames. The I3DResNet is converted from the C2D model described in the Non-local Network paper. However, I only get about 70% accuracy. Could you provide the script for the video classification task or give some suggestions? Thank you.
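To make the training-time sampling concrete, this is roughly what I mean by "64 consecutive frames, then drop evenly". It is a minimal sketch, assuming that "drop evenly" means keeping every second frame (64 to 32 frames) and that videos shorter than 64 frames are looped. Both are my assumptions, and `sample_train_indices` is a hypothetical name, not a function from this repository.

```python
import random


def sample_train_indices(num_frames, span=64, stride=2, rng=random):
    """Pick a random window of `span` consecutive frames and keep every
    `stride`-th frame, giving span // stride indices.

    Assumption: if the video has fewer than `span` frames, indices wrap
    around so the clip loops instead of failing.
    """
    start = rng.randint(0, max(num_frames - span, 0))
    return [(start + i) % num_frames for i in range(0, span, stride)]


# Example: a 200-frame video yields 32 frame indices spaced 2 apart.
indices = sample_train_indices(200, rng=random.Random(0))
```

If the test-time clips use a different span or stride from this, the model sees a different effective frame rate at inference, which could be one source of the accuracy gap.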