houzhijian / CONE

[2023 ACL] CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding

mad_clip_text_extractor.py #2

Closed Tanveer81 closed 1 year ago

Tanveer81 commented 1 year ago

The original MAD data contains MAD_train.json, but in 'mad_clip_text_extractor.py' you load train.jsonl. Are they the same file?

Tanveer81 commented 1 year ago

Do I need to run 'mad_clip_text_extractor.py'? The text features are already available with the provided dataset.

houzhijian commented 1 year ago

Hi, Tanveer81,

Thanks for your interest in this work.

Regarding the textual features for the MAD dataset: I didn't use the textual features from the MAD authors. Instead, I extracted the textual CLS and token-sequence features myself via 'mad_clip_text_extractor.py' in this repo. I recommend downloading the MAD data file (https://drive.google.com/file/d/1DYX_rXn0mjiAx36sdsF3W14D--3XnMQB/view) used by this repo; it contains the textual features I extracted as well as the adopted jsonl files. You are also welcome to read the 'mad_clip_text_extractor.py' code to learn how to extract the features yourself.
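For reference, here is a minimal sketch of that kind of extraction, assuming the official openai/CLIP package: it pulls out both the projected CLS (end-of-text) embedding and the per-token sequence features for a single query. The batching and saved file layout in 'mad_clip_text_extractor.py' may differ, so treat this only as an illustration.

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

query = "he opens the door"                  # hypothetical MAD-style query
tokens = clip.tokenize([query]).to(device)   # (1, 77) token ids

with torch.no_grad():
    # Re-run CLIP's text encoder step by step so we can keep the
    # intermediate per-token features (clip's encode_text only returns CLS).
    x = model.token_embedding(tokens).type(model.dtype)   # (1, 77, d)
    x = x + model.positional_embedding.type(model.dtype)
    x = x.permute(1, 0, 2)                                # LND for the transformer
    x = model.transformer(x)
    x = x.permute(1, 0, 2)
    x = model.ln_final(x).type(model.dtype)               # (1, 77, d) token features

    # CLIP's "CLS" text feature is the activation at the end-of-text (EOT)
    # token, projected into the joint embedding space.
    eot = tokens.argmax(dim=-1)                           # EOT has the largest id
    cls_feat = x[torch.arange(x.shape[0]), eot] @ model.text_projection

    # Token-sequence features: drop the SOT token and stop before EOT.
    token_feats = x[0, 1:eot.item()]
```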

Regarding the train.jsonl file for the MAD dataset: I did some data pre-processing. Please refer to https://github.com/houzhijian/CONE/blob/main/data/README.md for details.

Please feel free to let me know if I missed anything.

Tanveer81 commented 1 year ago

Does the Ego4D-NLQ dataset you provide have all the pre-processing done already? I do not have to run any pre-processing, right?

houzhijian commented 1 year ago

Hi, Tanveer81,

Yes. The provided Ego4D-NLQ zip file contains all the pre-processed data. You do not need to run any pre-processing.

Regarding the MAD dataset, you only need to get the visual features from the MAD authors and use the "./feature_extraction/misc/convert_h5_to_lmdb.py" code to convert the h5 file to an lmdb file.
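For illustration, a minimal sketch of what such an h5-to-lmdb conversion does, assuming h5py and lmdb are installed; the file names and key layout here are hypothetical, and the repo's convert_h5_to_lmdb.py remains the authoritative version.

```python
import io

import h5py
import lmdb
import numpy as np

H5_PATH = "mad_visual_features.h5"       # hypothetical: features from the MAD authors
LMDB_PATH = "mad_visual_features_lmdb"   # hypothetical output directory

env = lmdb.open(LMDB_PATH, map_size=1 << 40)  # generous map size (1 TB, sparse)
with h5py.File(H5_PATH, "r") as h5, env.begin(write=True) as txn:
    for movie_id in h5.keys():
        # Each dataset is assumed to be a (num_frames, feat_dim) float array.
        feats = np.asarray(h5[movie_id], dtype=np.float32)
        buf = io.BytesIO()
        np.save(buf, feats)               # serialize the array with its shape
        txn.put(movie_id.encode("utf-8"), buf.getvalue())
env.close()
```

Reading a feature back is then a matter of `np.load(io.BytesIO(txn.get(key)))` inside a read transaction.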