thaolmk54 / hcrn-videoqa

Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Apache License 2.0

Motion model information #13

Horizon2333 closed this issue 3 years ago

Horizon2333 commented 3 years ago

Hi,

First, let me say how much I appreciate your work. Your code is very elegant. Unlike many other repositories, you provide the code to extract visual and text features, which makes it much easier for me to apply your method to my custom datasets and tasks.

I'm now changing the feature extraction method to improve performance on my tasks. However, I don't know where the ResNeXt-101 motion model comes from, which dataset it was pre-trained on, or what its accuracy is, so I cannot compare it with other models directly. Trying them one by one would be very time-consuming. Could you please share some information about it?

Thanks a lot!

thaolmk54 commented 3 years ago

Hi,

Thank you for your interest in our work.

I used the implementation of ResNeXt-101 from https://github.com/kenshohara/video-classification-3d-cnn-pytorch. The ResNeXt-101 model is trained on the Kinetics dataset and reaches 65.1% top-1 and 85.7% top-5 accuracy. Please refer to the original paper for more details: https://arxiv.org/abs/1711.09577.
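
For anyone comparing these numbers against other backbones, here is a minimal sketch of how top-1 and top-5 accuracy are usually computed. It is not code from this repository or from the linked one. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, and a real evaluation would run over the model's class scores on the Kinetics validation clips.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    `scores[i][c]` is the (hypothetical) model score for class c on sample i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy data: 4 samples, 3 classes (made up for illustration only).
scores = [
    [0.1, 0.7, 0.2],  # true class 1: ranked first
    [0.5, 0.3, 0.2],  # true class 1: ranked second
    [0.2, 0.3, 0.5],  # true class 0: ranked last
    [0.6, 0.1, 0.3],  # true class 0: ranked first
]
labels = [1, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

With 400 Kinetics classes you would call the same function with `k=1` and `k=5` to get figures comparable to the ones quoted above.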

Horizon2333 commented 3 years ago

Thanks a lot! That’s really helpful. Best wishes!