swiss-ai / ml-4m

4M: Massively Multimodal Masked Modeling (NeurIPS 2023 Spotlight)
Apache License 2.0

Transform from v2d format into video_rgb format and save in `video_rgb/` directory #10

Open kdu4108 opened 4 months ago

kdu4108 commented 4 months ago

Goal: given v2d format of

 ├── 00000.tar
 │     ├── 00000.mp4
 │     ├── 00000.txt
 │     ├── 00000.json
 │     ├── 00001.mp4
 │     ├── 00001.txt
 │     ├── 00001.json
 │     ├── ...
 │     ├── 10000.mp4
 │     ├── 10000.txt
 │     └── 10000.json
 ├── 00001.tar
 │     ├── 10001.mp4
 │     ├── 10001.txt
 │     ├── 10001.json
 │     └── ...
 ...

produce a video_rgb/ modality data folder of the following format:

root/video_rgb/shard-00000.tar
 │     ├── 00000.mp4 # this corresponds to one video
 │     ├── 00001.mp4
 │     └── ...
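The repacking step can be sketched with the standard-library `tarfile` module. This is a minimal illustration of copying only the `.mp4` members of a v2d shard into a `video_rgb` shard; the function name and signature are hypothetical, not the actual script from this repo.

```python
import tarfile
from pathlib import Path


def repack_shard(v2d_tar: str, out_tar: str) -> int:
    """Copy only the .mp4 members of a v2d shard into a video_rgb shard.

    Hypothetical helper sketching the repacking step; drops the .txt/.json
    sidecar files and keeps the video files under their original keys.
    Returns the number of videos copied.
    """
    Path(out_tar).parent.mkdir(parents=True, exist_ok=True)
    n_copied = 0
    with tarfile.open(v2d_tar, "r") as src, tarfile.open(out_tar, "w") as dst:
        for member in src.getmembers():
            if member.name.endswith(".mp4"):
                # addfile() copies the member header and its data verbatim
                dst.addfile(member, src.extractfile(member))
                n_copied += 1
    return n_copied
```

In practice this would run once per input shard, writing `shard-{i:05d}.tar` files into `root/video_rgb/`.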

Option 1: This should mostly just involve extracting the mp4/video files from the video2dataset format and moving them into the right directory paths.

Option 2: We can use v2d at this point to also normalize the videos (e.g., trimming them to the same number of frames).

We choose Option 2: by the time something lands in a modality folder, it should already have gone through the last preprocessing step before pseudolabeling for aligned data.
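For the normalization in Option 2, the core of it can be expressed as an ffmpeg invocation. The sketch below only builds the command; the fps and frame-count values are illustrative defaults, not the parameters v2d or this repo actually uses.

```python
def normalize_cmd(in_path: str, out_path: str, fps: int = 8, num_frames: int = 64) -> list:
    """Build an ffmpeg command that resamples a video to a fixed frame rate
    and caps it at a fixed number of frames.

    Hypothetical helper: the flags are standard ffmpeg, but the defaults
    (fps=8, num_frames=64) are illustrative assumptions.
    """
    return [
        "ffmpeg", "-y",
        "-i", str(in_path),
        "-vf", f"fps={fps}",            # resample to a fixed frame rate
        "-frames:v", str(num_frames),   # cap the number of output frames
        str(out_path),
    ]
```

A driver script would run this (e.g. via `subprocess.run`) on each extracted mp4 before repacking it into the output shard.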

Child issue of #3.

kdu4108 commented 3 months ago

Finished by https://github.com/swiss-ai/ml-4m/pull/17

kdu4108 commented 3 months ago

One thing we overlooked: we actually want a directory layout of the form

root/video_rgb/train/*.tar
root/video_rgb/val/*.tar
root/video_rgb/test/*.tar

So we should modify the raw-to-video_rgb script to perform this train/val/test split as well.
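The split itself can be a small post-processing pass over the finished shards. A minimal sketch, assuming a shard-level (not video-level) split and hypothetical fraction/seed parameters:

```python
import random
import shutil
from pathlib import Path


def split_shards(video_rgb_dir: str, val_frac: float = 0.05,
                 test_frac: float = 0.05, seed: int = 0) -> dict:
    """Move *.tar shards in video_rgb_dir into train/, val/, test/ subdirs.

    Hypothetical helper: splits at shard granularity with a seeded shuffle
    so the assignment is reproducible. Returns shard counts per split.
    """
    shards = sorted(Path(video_rgb_dir).glob("*.tar"))
    rng = random.Random(seed)
    rng.shuffle(shards)
    n_val = int(len(shards) * val_frac)
    n_test = int(len(shards) * test_frac)
    assignment = {
        "val": shards[:n_val],
        "test": shards[n_val:n_val + n_test],
        "train": shards[n_val + n_test:],
    }
    for split, files in assignment.items():
        dest = Path(video_rgb_dir) / split
        dest.mkdir(exist_ok=True)
        for f in files:
            shutil.move(str(f), str(dest / f.name))
    return {k: len(v) for k, v in assignment.items()}
```

Splitting at the shard level keeps each tar intact; if videos from one source must not straddle splits, the split would instead have to happen before sharding.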