robot-learning-freiburg / MM-DistillNet

PyTorch code for training MM-DistillNet for multimodal knowledge distillation. http://rl.uni-freiburg.de/research/multimodal-distill
GNU General Public License v3.0

Question about dataset structure #8

Closed by drydenwiebe 3 years ago

drydenwiebe commented 3 years ago

Hello.

Thank you so much for this dataset, it is very large and well thought out!

I have a question about the structure of the dataset. The audio files are in the form: audio/audio__.mp3

When I untar the audio directories, the files mostly look like audio/audio_.mp3, but sometimes they are of the form audio/audio.mp3, where there is another number after the timestamp.

For example, in /drive_day_2020_04_14_15_56_26/audio there are audio_0_1586873154_433877998_1.mp3 and audio_0_1586873154_433877998_4.mp3, and when I diff them, they seem to be the same file.

Why is this the case? Can I just ignore all but one when processing the audio?

Thanks!

franchuterivera commented 3 years ago

Hello, thanks for your interest in the project.

To answer your question, you can just use the _0.mp3.

Why?

The format is audio_<mic_number>_<timestamp>.mp3. We consider 1586873154_433877998_1 the timestamp because it represents the instant at which all of the modalities are aligned. Looking closer, this timestamp follows the format <seconds>_<nano_seconds>_<sequence_number>.

The sequence_number is a product of our alignment and recording technique. Our microphone cannot go to this granularity, but other modalities can.
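For anyone processing the audio, here is a minimal sketch of how the naming scheme above could be parsed and de-duplicated. It is not code from the repository: the regular expression, `AudioName`, `parse_audio_name`, and `pick_one_per_timestamp` are hypothetical names, and it assumes that the sequence number may be absent and that keeping the lowest sequence number per timestamp (normally `_0`) is what you want.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, based on the format described above:
# audio_<mic_number>_<seconds>_<nano_seconds>[_<sequence_number>].mp3
AUDIO_NAME = re.compile(r"^audio_(\d+)_(\d+)_(\d+)(?:_(\d+))?\.mp3$")


class AudioName(NamedTuple):
    mic_number: int
    seconds: int
    nano_seconds: int
    sequence_number: Optional[int]  # None when the file name has no suffix


def parse_audio_name(filename: str) -> Optional[AudioName]:
    """Split an audio file name into its fields, or return None if it does not match."""
    match = AUDIO_NAME.match(filename)
    if match is None:
        return None
    mic, sec, nsec, seq = match.groups()
    return AudioName(int(mic), int(sec), int(nsec), None if seq is None else int(seq))


def pick_one_per_timestamp(filenames):
    """Keep one file per (mic, seconds, nano_seconds).

    Assumption: files without a sequence number win, otherwise the lowest
    sequence number is kept.
    """
    best = {}
    for name in filenames:
        parsed = parse_audio_name(name)
        if parsed is None:
            continue  # skip anything that is not an audio file
        key = parsed[:3]
        rank = -1 if parsed.sequence_number is None else parsed.sequence_number
        if key not in best or rank < best[key][0]:
            best[key] = (rank, name)
    return sorted(name for _, name in best.values())
```

Passing the contents of one `audio` directory (for example from `os.listdir`) to `pick_one_per_timestamp` would then return a single file name per microphone and timestamp.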

drydenwiebe commented 3 years ago

Thank you for the response!

That makes sense.

hxixixh commented 3 years ago

Thanks for the explanation; it makes a lot more sense now. But I'm still confused about what the sequence number is. For example, I get this sequence of frames from the dataset:

fl_rgb_1590957096_772363597_0.jpg
fl_rgb_1590957096_772363597_1.jpg 
fl_rgb_1590957096_772363597_2.jpg
fl_rgb_1590957096_772363597_4.jpg
fl_rgb_1590957096_806343405_0.jpg
fl_rgb_1590957096_806343405_1.jpg
fl_rgb_1590957096_806343405_2.jpg
fl_rgb_1590957096_806343405_3.jpg
fl_rgb_1590957096_806343405_4.jpg
fl_rgb_1590957096_840317597_0.jpg

Every single frame is different, and I suppose one unit of the sequence number corresponds to 1/5 nanoseconds. However, if I generate a video from the frames, they don't seem to be consecutive: fl_rgb_1590957096_772363597_4.jpg appears to be a later frame than fl_rgb_1590957096_806343405_0.jpg.
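To make the pattern in this listing easier to inspect, here is a small sketch that groups the frames by their `<seconds>_<nano_seconds>` part and measures the gap between consecutive timestamps. The helper names `group_by_timestamp` and `timestamp_gaps_ns` are hypothetical, and the regular expression assumes every frame name ends in `_<seconds>_<nano_seconds>_<sequence_number>.jpg`. It only summarizes the file names; it does not settle how the sequence number orders frames in time.

```python
import re
from collections import defaultdict

# Assumed pattern: <camera>_<seconds>_<nano_seconds>_<sequence_number>.jpg
FRAME_NAME = re.compile(r"^(?P<camera>.+)_(?P<sec>\d+)_(?P<nsec>\d+)_(?P<seq>\d+)\.jpg$")


def group_by_timestamp(filenames):
    """Map (seconds, nano_seconds) to the sorted sequence numbers seen for it."""
    groups = defaultdict(list)
    for name in filenames:
        match = FRAME_NAME.match(name.strip())
        if match:
            key = (int(match["sec"]), int(match["nsec"]))
            groups[key].append(int(match["seq"]))
    # Sort by timestamp so the result iterates in chronological order.
    return {key: sorted(seqs) for key, seqs in sorted(groups.items())}


def timestamp_gaps_ns(groups):
    """Nanoseconds between consecutive (seconds, nano_seconds) keys."""
    stamps = [sec * 1_000_000_000 + nsec for sec, nsec in groups]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


frames = [
    "fl_rgb_1590957096_772363597_0.jpg",
    "fl_rgb_1590957096_772363597_1.jpg",
    "fl_rgb_1590957096_772363597_2.jpg",
    "fl_rgb_1590957096_772363597_4.jpg",
    "fl_rgb_1590957096_806343405_0.jpg",
    "fl_rgb_1590957096_806343405_1.jpg",
    "fl_rgb_1590957096_806343405_2.jpg",
    "fl_rgb_1590957096_806343405_3.jpg",
    "fl_rgb_1590957096_806343405_4.jpg",
    "fl_rgb_1590957096_840317597_0.jpg",
]
groups = group_by_timestamp(frames)
gaps = timestamp_gaps_ns(groups)
```

For the ten files listed, the timestamps are about 34 ms apart (33979808 ns and 33974192 ns), and the first timestamp has sequence numbers 0, 1, 2 and 4 in this listing, with no `_3` file.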