google / storybench

Apache License 2.0

Questions about video downloading & data annotations #5

Open Wuziyi616 opened 1 week ago

Wuziyi616 commented 1 week ago

Hi, thanks for this great work. In the README I see instructions for downloading the training data. However, where can I download the validation and test data? In the JSON files I only see video paths formatted like storybench/xxx, which are not links. Could you provide links for downloading these videos?

EDIT: OK, I guess I can download them from the original video datasets. Hopefully the video paths match the original video names.

e-bug commented 1 week ago

Hi, yes! The videos are from the original datasets (UVO, Oops and DiDeMo).

Please let me know if you have any issues downloading them, and feel free to share your steps here (for future users).

Wuziyi616 commented 1 week ago

Thanks for your reply. It was actually easier than I thought: I just went to their official websites and selected the videos listed in the annotation JSON files. The only issue I ran into was with the DiDeMo dataset: it's a very old dataset, so its website no longer works. Nevertheless, I found this issue reply and followed it to successfully download the raw videos.

More specifically:

I'll leave this issue open in case I run into any problems when using the dataset, if that sounds good.
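For future users, here is a minimal sketch of the filtering step above: collecting the unique video names referenced in one of the StoryBench JSON files, so you know which originals to fetch from the UVO/Oops/DiDeMo sites. The `npz_video` key is an assumption taken from the task-file dumps later in this thread; adjust it for files that store the path under a different key.

```python
import json
from pathlib import Path

def referenced_video_names(annotation_path, video_key="npz_video"):
    """Collect the unique base video names referenced in a StoryBench JSON file.

    NOTE: `video_key` is an assumption -- the task files shown in this thread
    store paths like 'storybench/.../-FbSzomWtWw.npy' under 'npz_video';
    other files may use a different key.
    """
    with open(annotation_path) as f:
        entries = json.load(f)
    # Drop directories and the .npy/.mp4 extension, e.g. '-FbSzomWtWw'
    return sorted({Path(e[video_key]).stem for e in entries})
```

With the resulting names you can match videos on the original dataset pages instead of downloading everything.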

e-bug commented 1 week ago

Great! You can also download the DiDeMo videos from a Google Drive linked in their repo's README file.

Wuziyi616 commented 1 week ago

> Great! You can also download the DiDeMo videos from a Google Drive linked in their repo's README file.

I checked that link first, but I think they only stored the 13 missing videos in the Drive, not all of them? Maybe I'm missing something.

Edit: oh interesting, it turns out this is a different GitHub repo, and the one I checked was the release version, which has a different Google Drive link. Anyway, I guess using a script is easier than downloading lots of videos from a Drive folder with gdown, since gdown can fetch at most 50 files per folder.
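A hedged sketch of the per-file workaround: the 50-file cap applies to gdown's `--folder` mode, so issuing one `gdown` call per file ID sidesteps it. The list of Drive file IDs is hypothetical here and would have to be collected separately (e.g. from the folder's web listing).

```python
def gdown_download_commands(file_ids, out_dir="videos"):
    """Build one gdown command per Google Drive file ID.

    Sketch only: assumes gdown is installed and `file_ids` was gathered
    separately (e.g. from the Drive folder's web listing). Downloading
    file-by-file avoids the 50-files-per-folder cap of `gdown --folder`.
    """
    return [
        f"gdown 'https://drive.google.com/uc?id={fid}' -O {out_dir}/{fid}.mp4 --continue"
        for fid in file_ids
    ]
```

The generated commands can be written to a shell script or run with `subprocess`; `--continue` lets interrupted downloads resume.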

Wuziyi616 commented 1 week ago

Again, thank you for your prompt reply. I have some questions about the task splits and annotations, and I hope you can kindly give me some hints:

  1. What's the difference between a story and a segment? IIUC, a story may contain multiple segments: segments are the different actions/steps in a story. For example, in a story set in a swimming pool, the segments could be 1) people stand in the pool, 2) someone starts to swim left, 3) he reaches the edge and swims back.
  2. Yet, on Oops and UVO, a video might contain more than one story. For example, the UVO test set has 1565 videos but 2613 stories. Am I correct that two stories from one video may overlap? For example, in data/uvo-test.json, I found two stories both from the video -FbSzomWtWw.mp4, and their (start_times, end_times) overlap: one is [0, 10.005333], the other is [0.705416, 4.366505]. Their sentence_parts also seem to describe the same story. My guess is that there were 2 annotators per video, and both annotations were kept in the dataset.
  3. I did some checks on data/tasks/uvo-test/story_cont.json and found that some stories have only one segment, e.g. the first two entries posted below, which are the ones mentioned in question 2. I'm a bit confused because, from my understanding, story continuation means we are given some initial frames of the first segment plus the captions of more than one segment. But since the two examples below each have only one text, shouldn't they be excluded from the story continuation task? IMO they are just action execution. This actually happens a lot: 1623 out of 2613 stories have only one segment. Should I exclude them from evaluation?
  4. Is the only difference between story continuation and story generation the absence of initial frames? I checked their JSON files, and everything else, such as video_name, texts and exact_frames_per_prompt, seems to be the same.
```python
>>> pprint(story_cont[0])
{'background': None,
 'comment': 'UVO_dense_val_100_0',
 'durations': None,
 'exact_frames_per_prompt': [76],
 'indices_to_select': None,
 'npz_gt_video_end_frame': None,
 'npz_gt_video_start_frame': 0,
 'npz_video': 'storybench/npy_96x160pix_8fps/uvo-test/videos/-FbSzomWtWw.npy',
 'npz_video_end_frame': 4,
 'npz_video_start_frame': 0,
 'skip_frames_after_generation': 4,
 'storybench_mode': 'story_cont',
 'texts': ['A man wearing a white t-shirt is sitting behind the table, eating a burger and enjoying it while giving a thumbs up when it tastes good while a person whose hand is visible is holding a fork and picking up the food.']}
>>> pprint(story_cont[1])
{'background': None,
 'comment': 'UVO_dense_val_100_1',
 'durations': None,
 'exact_frames_per_prompt': [25],
 'indices_to_select': None,
 'npz_gt_video_end_frame': None,
 'npz_gt_video_start_frame': 0,
 'npz_video': 'storybench/npy_96x160pix_8fps/uvo-test/videos/-FbSzomWtWw.npy',
 'npz_video_end_frame': 10,
 'npz_video_start_frame': 0,
 'skip_frames_after_generation': 10,
 'storybench_mode': 'story_cont',
 'texts': ['A person, whose hand is visible, is holding a fork and picking some food with it while a man wearing a white t-shirt is sitting and eating food and looking at it.']}
```
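The single-segment count from question 3 can be reproduced with a few lines, assuming each task entry keeps its per-segment captions under `texts` as in the dumps above:

```python
import json

def split_by_segment_count(task_path):
    """Partition task entries by whether they have more than one caption.

    Sketch for the filtering discussed in question 3, assuming each entry
    stores its per-segment captions in a 'texts' list (as in the dumps above).
    """
    with open(task_path) as f:
        stories = json.load(f)
    single = [s for s in stories if len(s["texts"]) == 1]
    multi = [s for s in stories if len(s["texts"]) > 1]
    return single, multi
```

Running it on data/tasks/uvo-test/story_cont.json should reproduce the 1623-of-2613 single-segment figure mentioned above.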
Wuziyi616 commented 1 week ago

Sorry for so many questions, but a few more after processing the data in detail:

  1. I noticed some differences between the JSON files for the same story on UVO, for example between data/uvo-test.json and data/tasks/uvo-test/story_gen.json. Sometimes the background descriptions don't match, and sometimes the segment start/end timestamps don't match; see the attached example. Should I always use the one under tasks? On DiDeMo everything matches, though, and I haven't checked Oops.
```python
>>> annotations[322]['background_description']
'In the background, people are speaking. There is a brown surface, brown walls, brown kettles, a white dish, white bowls, brown bowls, a golden object, a bottle, some wooden objects and other miscellaneous items.'
>>> story_gen[322]['background']
'In the background, there are clay teapot, ceramic jar and the cups, and the table.'
```
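A quick way to list all such disagreements, assuming the two files are parallel lists keyed as in the snippet above (`background_description` in the raw annotations, `background` in the task file):

```python
import json

def mismatched_backgrounds(annotation_path, task_path):
    """Report indices where the two files disagree on the background text.

    Sketch only: assumes the files are parallel lists over the same stories,
    with the raw annotations using 'background_description' and the task file
    using 'background' (the keys shown above).
    """
    with open(annotation_path) as f:
        annotations = json.load(f)
    with open(task_path) as f:
        tasks = json.load(f)
    return [
        i
        for i, (a, t) in enumerate(zip(annotations, tasks))
        if a.get("background_description") != t.get("background")
    ]
```

The same comparison can be repeated for the start/end timestamps to quantify the second kind of mismatch.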
Wuziyi616 commented 6 days ago

@e-bug a gentle reminder of the above questions. Thank you!