ikuinen / CMIN_moment_retrieval

Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos

Could you share your extracted features? #4

Closed: Xun-Yang closed this issue 4 years ago

Xun-Yang commented 4 years ago

Hi Zhijie

Could you share your extracted features for the two datasets with me (maybe via Google Drive or Baidu Drive)?

ikuinen commented 4 years ago

You can download the features of the ActivityNet dataset here.

duskybomb commented 4 years ago

@Xun-Yang You can download the C3D features for TACoS from here.

yangwf1 commented 4 years ago

@duskybomb Thanks for sharing. When running the code on the TACoS dataset, I got the error "s23-d34.npy" not found. I looked into the dataset and found the naming format confusing: the files have names like "s23-d34.avi_1117_1181.npy" and "s23-d34.avi_1123_1379.npy". Any suggestions on how to process this dataset?
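One way to get an overview of such a folder is to group the clip files by video name. The sketch below assumes the pattern "<video>.avi_<number>_<number>.npy" and treats the two numbers as the start and end frame of a clip. That reading is a guess, not something confirmed in this thread, although the differences in the two example names (64 and 256) do match interval lengths in the folder name "Interval64_128_256_512_overlap0.8_c3d_fc6". The function name `group_clip_files` is hypothetical and not part of the repository.

```python
import re
from collections import defaultdict

# Assumed naming pattern: "<video>.avi_<start>_<end>.npy".
# Interpreting the two numbers as start/end frames is an assumption.
CLIP_NAME = re.compile(r"^(?P<video>.+)\.avi_(?P<start>\d+)_(?P<end>\d+)\.npy$")


def group_clip_files(filenames):
    """Group clip-level feature files by video id (hypothetical helper)."""
    groups = defaultdict(list)
    for name in filenames:
        match = CLIP_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        clip = (int(match["start"]), int(match["end"]), name)
        groups[match["video"]].append(clip)
    # Sort each video's clips by (start, end) for a stable order.
    return {video: sorted(clips) for video, clips in groups.items()}


if __name__ == "__main__":
    names = ["s23-d34.avi_1123_1379.npy", "s23-d34.avi_1117_1181.npy"]
    for video, clips in group_clip_files(names).items():
        print(video, clips)
```

This only lists which clips exist for each video. It does not produce the single per-video "s23-d34.npy" file the code looks for, and how to build that file from these clips is not stated here.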

wangwen39 commented 4 years ago

The provided C3D features of TACoS ("Interval64_128_256_512_overlap0.8_c3d_fc6") do not seem to match the paper's description: "we define continuous 16 frames as a unit and each unit overlaps 8 frames with adjacent units". Could you please provide the extracted TACoS features used in your paper?
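For reference, the unit layout quoted from the paper (16-frame units, 8 frames shared with each neighbour) can be sketched as below. This only illustrates the quoted sentence and is not the authors' extraction code. The function name `unit_boundaries` and the choice to drop a trailing partial unit are assumptions.

```python
def unit_boundaries(num_frames, unit_len=16, overlap=8):
    """Return [start, end) frame ranges for overlapping units.

    Assumes a stride of unit_len - overlap and drops any trailing
    partial unit; the paper's handling of the tail is not stated here.
    """
    stride = unit_len - overlap
    last_start = num_frames - unit_len
    return [(s, s + unit_len) for s in range(0, last_start + 1, stride)]


if __name__ == "__main__":
    # A 40-frame video yields units starting at frames 0, 8, 16 and 24.
    print(unit_boundaries(40))
```

Under this layout a video gives roughly one feature per 8 frames, which differs from the multi-interval naming of the provided folder.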

wangwen39 commented 4 years ago

I am also confused about the C3D features of ActivityNet in your code: feats = load_feature(self.feature_path, vid, dataset='ActivityNet'); fps = feats.shape[0] / duration. As far as I understand, feats.shape[0] is the number of features in one video, but the units overlap by 50% (as described in your paper). Is there something wrong with my understanding? Thanks for your reply.

ikuinen commented 4 years ago

The provided C3D features of TACoS ("Interval64_128_256_512_overlap0.8_c3d_fc6") do not seem to match the paper's description: "we define continuous 16 frames as a unit and each unit overlaps 8 frames with adjacent units". Could you please provide the extracted TACoS features used in your paper?

Try this one.

ikuinen commented 4 years ago

I am also confused about the C3D features of ActivityNet in your code: feats = load_feature(self.feature_path, vid, dataset='ActivityNet'); fps = feats.shape[0] / duration. As far as I understand, feats.shape[0] is the number of features in one video, but the units overlap by 50% (as described in your paper). Is there something wrong with my understanding? Thanks for your reply.

The representation of each frame comes from a clip context.
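One reading of this reply is that the feature sequence is treated as evenly spaced over the video, so feats.shape[0] / duration acts as a "features per second" rate rather than the raw video frame rate. The sketch below shows how a moment given in seconds would map to feature indices under that assumption. The function name `moment_to_feature_span` and the clamping are illustrative and not taken from the repository.

```python
def moment_to_feature_span(num_feats, duration, start_sec, end_sec):
    """Map a [start_sec, end_sec] moment to feature indices.

    Assumes the num_feats features are evenly spaced over `duration`
    seconds, mirroring fps = feats.shape[0] / duration from the thread.
    """
    rate = num_feats / duration  # features per second, not video fps
    start_idx = min(int(start_sec * rate), num_feats - 1)
    end_idx = min(int(end_sec * rate), num_feats - 1)
    return start_idx, end_idx


if __name__ == "__main__":
    # 200 features over a 100-second video: 2 features per second.
    print(moment_to_feature_span(200, 100.0, 10.0, 25.5))
```

With this mapping the overlap between units does not change the arithmetic: whatever the stride used at extraction time, only the ratio of feature count to duration matters.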