Preprocessing code for VIOLIN

Worm4047 commented 3 years ago

Hi, in the repo, create_txtdb.sh is provided to create the txt DB for TVR. Could you please provide the script you used to create the text DB for VIOLIN? Thanks

linjieli222 commented 3 years ago

Hi there,

Thanks for your interest in this project. I have added the prepro function https://github.com/linjieli222/HERO/blob/e6448345249aed036c38e360a17cd00f4675639e/scripts/prepro_query.py#L96

To run prepro successfully and create the txt_db, you need to prepare three inputs.

First, you will need to split the released VIOLIN annotations into train/val/test and save each split as a jsonl file, similar to the TVQA and TVR annotations. Below is an example of an entry in the resulting jsonl files:
```
 {"vid_name": "BWMFLJwEVyQ_clip_000_040", "desc_id": "BWMFLJwEVyQ_clip_000_040-0-0",
  "desc": "The vampire grabbed the woman in the fur coat and bit her on the neck.", "label": true}
```
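A minimal sketch of this split step in Python. It assumes (not confirmed in this thread) that the raw VIOLIN annotation has already been loaded as a dict keyed by clip id, where each entry carries a `split` name and a `statement` list of `[true_statement, false_statement]` pairs; check the raw format and adjust the field names. The `desc_id` scheme `<vid_name>-<pair index>-<0 for true, 1 for false>` is a guess consistent with the example entry above, and `split_violin` and the `violin_<split>.jsonl` file names are hypothetical.

```python
import json
import os
from collections import defaultdict


def split_violin(raw, out_dir):
    """Write one jsonl file per split from VIOLIN-style annotations.

    ASSUMED input layout (verify against the released annotation):
    {clip_id: {"split": "train", "statement": [[true_desc, false_desc], ...]}}
    """
    per_split = defaultdict(list)
    for vid_name, ann in raw.items():
        for pair_idx, pair in enumerate(ann["statement"]):
            for stmt_idx, desc in enumerate(pair):
                per_split[ann["split"]].append({
                    "vid_name": vid_name,
                    # Hypothetical id scheme: <vid>-<pair index>-<0 true / 1 false>
                    "desc_id": f"{vid_name}-{pair_idx}-{stmt_idx}",
                    "desc": desc,
                    # Assumption: the first statement of each pair is the true one.
                    "label": stmt_idx == 0,
                })
    os.makedirs(out_dir, exist_ok=True)
    for split, entries in per_split.items():
        # One JSON object per line, like the tvqa/tvr annotation files.
        with open(os.path.join(out_dir, f"violin_{split}.jsonl"), "w") as f:
            for entry in entries:
                f.write(json.dumps(entry) + "\n")
    return dict(per_split)
```

For example, `split_violin(raw, "violin_txt")` writes `violin_train.jsonl`, `violin_val.jsonl` and `violin_test.jsonl` (whichever splits occur in `raw`) and returns the entries grouped by split for inspection.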

The VIOLIN subtitles also need to be formatted into a similar jsonl file:

```
{"vid_name": "gt3ntYidpvs_clip_000_040",
 "sub": [{"text": "one board one minute home free okay make", "start": 0.12, "end": 11.629},
         {"text": "it quick", "start": 11.639000000000001, "end": 21.97},
         {"text": "you ready yeah wait what are you doing", "start": 21.98, "end": 26.23},
         {"text": "show me superiority the senator dead may", "start": 26.240000000000002, "end": 28.990000000000002},
         {"text": "drive them back sure sound like a call", "start": 29.0, "end": 40.0}]}
```
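A sketch of the subtitle step, assuming each clip's subtitles are already available as a list of `(text, start, end)` tuples with times in seconds. How those tuples are pulled out of the raw VIOLIN release is not covered here, and `format_subtitles` / `write_sub_jsonl` are hypothetical helper names.

```python
import json


def format_subtitles(vid_name, segments):
    """Build one subtitle record in the jsonl layout shown above.

    `segments` is an ASSUMED list of (text, start_sec, end_sec) tuples.
    """
    sub = [
        {"text": text.strip(), "start": float(start), "end": float(end)}
        # Order segments by start time so the "sub" list is chronological.
        for text, start, end in sorted(segments, key=lambda s: s[1])
    ]
    return {"vid_name": vid_name, "sub": sub}


def write_sub_jsonl(path, clips):
    """Write one line per clip; `clips` maps vid_name -> list of segments."""
    with open(path, "w") as f:
        for vid_name, segments in clips.items():
            f.write(json.dumps(format_subtitles(vid_name, segments)) + "\n")
```

Sorting by start time is a defensive choice; if the raw subtitles are already ordered it changes nothing.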

You also need a vid2nframes.json file; I believe the id2nframe.json in the VIOLIN video_db can be used directly for this. An example entry in the file:
```
{"dh_s02e23_clip_1451_1476": 17, ...}
```
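If the video_db's id2nframe.json is reused as vid2nframes.json, it may be worth checking that every vid_name in the annotation jsonl files has a frame count before running prepro. A sketch; the paths and the helper name `build_vid2nframes` are hypothetical.

```python
import json


def build_vid2nframes(id2nframe_path, out_path, ann_paths):
    """Copy id2nframe.json to out_path and return vid_names lacking a frame count."""
    with open(id2nframe_path) as f:
        vid2nframes = json.load(f)  # e.g. {"dh_s02e23_clip_1451_1476": 17, ...}
    missing = set()
    for ann_path in ann_paths:
        with open(ann_path) as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines in the jsonl
                vid = json.loads(line)["vid_name"]
                if vid not in vid2nframes:
                    missing.add(vid)
    with open(out_path, "w") as f:
        json.dump(vid2nframes, f)
    return sorted(missing)
```

An empty return value means every annotated clip has a frame count.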

I believe steps 1-3 can be done easily with a little work. More details about the raw VIOLIN annotations are provided here, which may help you with formatting.

Let me know if you have any additional questions.

Thanks.

Worm4047 commented 3 years ago

Hi, I was able to create the text DB, but I get an error when running the training code:
```
File "/src/data/data.py", line 59, in __init__
    f'id2nframe.json', "r"))
FileNotFoundError: [Errno 2] No such file or directory: '/video/violin/id2nframe.json'
```

It seems that the downloaded video_db is missing this file.

linjieli222 commented 3 years ago

There is an id2nframe.json in the downloaded video_db (the image above shows the output from extracting violin.tar). Most likely, the extracted files were stored under /video/VIDEO_DB/violin due to a mistake in the download script.
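One way to check where the archive was actually extracted is to search the video directory for id2nframe.json. A sketch using `pathlib`; the root `/video` comes from the error message above, and `find_video_db_dirs` is a hypothetical helper.

```python
from pathlib import Path


def find_video_db_dirs(root, marker="id2nframe.json"):
    """Return every directory under `root` that contains `marker`, sorted."""
    # rglob walks `root` recursively and yields each matching file path.
    return sorted(str(p.parent) for p in Path(root).rglob(marker))
```

Calling `find_video_db_dirs("/video")` lists the candidate directories. If the only hit is a nested path such as /video/VIDEO_DB/violin, re-downloading with the fixed script (or pointing the config at that directory) is likely to resolve the error.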

Did you pull the latest code? We have fixed the decompress command in the download script.

Worm4047 commented 3 years ago

Okay, let me check and get back to you. Thanks

Worm4047 commented 3 years ago

I was able to find the file after downloading the DB again. Thanks

linjieli222 / HERO

Preprocessing code for VIOLIN #10