Closed nqx12348 closed 1 year ago
Hi, thanks for your interest and question.
If I remember correctly, I do not apply any postprocessing; I directly use the features extracted by the HERO authors.
There are many possible reasons. Do you strictly follow the steps of the HERO video feature extractor (including feature normalization and how the two streams are concatenated)? My suggestion is to probe the extracted features of a random video and check whether your features match the released ones.
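A quick way to run that probe is to load your features and the released features for the same video and compare them numerically. This is only a sketch: the function names and the assumption that both feature sets can be loaded as `(num_clips, dim)` NumPy arrays are mine, not part of the HERO or CONQUER codebases.

```python
import numpy as np


def compare_features(mine: np.ndarray, released: np.ndarray) -> dict:
    """Compare two (num_clips, dim) feature matrices for the same video."""
    if mine.shape != released.shape:
        # A shape mismatch already points to a different clip_len or
        # a different concatenation order.
        return {"shape_match": False}
    # Per-clip cosine similarity plus the largest element-wise gap.
    dot = (mine * released).sum(axis=1)
    denom = np.linalg.norm(mine, axis=1) * np.linalg.norm(released, axis=1) + 1e-8
    cosine = dot / denom
    return {
        "shape_match": True,
        "mean_cosine": float(cosine.mean()),
        "max_abs_diff": float(np.abs(mine - released).max()),
    }
```

If the shapes match but the mean cosine similarity is noticeably below 1.0, the difference most likely comes from normalization or from the checkpoint/preprocessing used during extraction.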
Hi, thanks for your awesome work! I have recently been working on the VCMR task with this codebase. I downloaded tvr_feature_release.tar.gz following readme.md and it worked well: I got a VCMR result (R@1, IoU=0.7) of 7.6. However, I have trouble reproducing the metrics with features extracted by myself. I extracted SlowFast + ResNet features of the TVQA raw videos using the code in HERO_Video_Feature_Extractor, concatenated them to get a D=4352 visual feature, and trained CONQUER on it, but I can only reach a VCMR R@1 of 6.6. I used the SlowFast checkpoint downloaded Here and the ResNet-152 checkpoint from torchvision, with clip_len=3/2. I carefully examined the feature-extraction procedure and found no mistakes. So I wonder: do you apply any postprocessing to the features extracted by HERO? Or can you suggest any possible reasons for this gap? Thanks!
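For reference, the D=4352 figure is consistent with concatenating a 2304-d SlowFast feature with a 2048-d ResNet-152 feature per clip. Below is a minimal sketch of that fusion step, with per-stream L2 normalization before concatenation; the function name and the normalization choice are my assumptions for illustration, not necessarily what the HERO extractor does, so check its actual code for the exact order of operations.

```python
import numpy as np


def fuse_clip_features(slowfast: np.ndarray, resnet: np.ndarray) -> np.ndarray:
    """L2-normalize each stream per clip, then concatenate along the feature axis.

    slowfast: (num_clips, 2304), resnet: (num_clips, 2048) -> (num_clips, 4352)
    """
    sf = slowfast / (np.linalg.norm(slowfast, axis=1, keepdims=True) + 1e-8)
    rn = resnet / (np.linalg.norm(resnet, axis=1, keepdims=True) + 1e-8)
    return np.concatenate([sf, rn], axis=1)  # 2304 + 2048 = 4352
```

Whether each stream is normalized before or after concatenation (or at all) changes the relative scale of the two streams, which can easily account for a ~1-point metric gap.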