tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

[data request] Holistic Video Understanding Dataset (HVU) #2175

Open bhack opened 4 years ago

bhack commented 4 years ago

Folks who would also like to see this dataset in tensorflow/datasets, please thumbs-up so the developers can know which requests to prioritize.

And if you'd like to contribute the dataset (thank you!), see our guide to adding a dataset.

/cc @alidiba67 @MohsenFayyaz89

bhack commented 4 years ago

@MohsenFayyaz89 Do you plan to contribute your dataset?

MohsenFayyaz89 commented 4 years ago

Would like to but unfortunately, I'm quite busy.

bhack commented 4 years ago

Too bad, it would have been useful for those who want to use your dataset but are too busy to create the adapter classes :stuck_out_tongue_winking_eye:

bhack commented 4 years ago

/cc @raviddoss do you know anyone internal who might be interested in taking charge of this?

Naman-Bhrgv commented 4 years ago

@bhack I would like to contribute to this issue. Please assign it to me. Also, since I am new to this repo, please guide me through the process.

bhack commented 4 years ago

@Naman-Bhrgv Please start with the standard guide: https://www.github.com/tensorflow/datasets/tree/master/docs%2Fadd_dataset.md

Naman-Bhrgv commented 4 years ago

@bhack Thanks!

NikhilBartwal commented 4 years ago

@Naman-Bhrgv Are you currently working on this? @bhack I would like to take this up. Could you provide me with the current status?

Naman-Bhrgv commented 4 years ago

@NikhilBartwal I am not currently working on this. If you want, you can take up this issue.

bhack commented 4 years ago

Ok. @NikhilBartwal do you want to prepare a PR?

NikhilBartwal commented 4 years ago

@bhack I would very much like to. As I'm new to this, could you guide me through the process? The dataset contains only YouTube video IDs rather than the videos themselves, so I'm not sure how to prepare the dataset in TFDS after downloading the video IDs. Could you help me with that? Thanks!

bhack commented 4 years ago

For a video dataset, you can take a look at the existing ones here: https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/video

NikhilBartwal commented 4 years ago

@bhack Thanks for the help! I will look into some examples and start working on it.

NikhilBartwal commented 4 years ago

Hey @bhack @ChanchalKumarMaji @Conchylicultor, I was trying out this script https://github.com/holistic-video-understanding/HVU-Downloader/blob/master/HVU_download.py and found that, whether I ran the script directly from the terminal or integrated its code into my notebook, there were always some files that were not downloaded properly (roughly 10% of them, each around 252 bytes). The script uses joblib.Parallel for parallelisation. What do you think could be causing that?
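
One way to quantify the failed downloads described above is to scan the output directory for suspiciously small files. The following is a minimal sketch and is not part of the HVU downloader: the function name `find_truncated_files` and the 1 KiB threshold are assumptions chosen for illustration, based only on the roughly 252-byte size reported in this comment.

```python
from pathlib import Path


def find_truncated_files(root, min_bytes=1024):
    """Return files under `root` smaller than `min_bytes`, sorted by path.

    The threshold is an assumption: the failed downloads reported in this
    thread were about 252 bytes, far smaller than any real video clip.
    """
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_size < min_bytes
    )
```

The returned paths could then be deleted and retried one at a time, which may help show whether the failures are tied to the parallel execution or to the individual videos.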

bhack commented 4 years ago

@NikhilBartwal Have you tried to open a ticket at https://github.com/holistic-video-understanding/HVU-Downloader/?

NikhilBartwal commented 4 years ago

@bhack I have just opened one at https://github.com/holistic-video-understanding/HVU-Dataset/issues/3

NikhilBartwal commented 4 years ago

@bhack Hey, I had one question: what do you think would be the most efficient way of decoding a video to a numpy array inside a script?

bhack commented 4 years ago

numpy or Tensor? Have you seen https://www.tensorflow.org/datasets/api_docs/python/tfds/features/Video

https://www.tensorflow.org/io/api_docs/python/tfio/experimental/ffmpeg/decode_video

NikhilBartwal commented 4 years ago

@bhack Guess it was a typo :( I will have a look at it. Thanks !

NikhilBartwal commented 4 years ago

@bhack Well, I tried it, and it looks like tensorflow-io only supports FFmpeg on Ubuntu 14.04, 16.04, and 18.04, as mentioned here. Do you have any other ideas?

bhack commented 4 years ago

I think TensorFlow IO is what the other video datasets in this repo are using. /cc @yongtang

NikhilBartwal commented 4 years ago

@bhack I checked the video datasets and unfortunately, I couldn't find them using tfio. Could you give a link to it?

yongtang commented 4 years ago

> Well, I tried it, and it looks like tensorflow-io only supports FFmpeg on Ubuntu 14.04, 16.04, and 18.04, as mentioned here. Do you have any other ideas?

@NikhilBartwal Do you have a specific platform (such as Ubuntu 20.04?) you want to use?

bhack commented 4 years ago

@Naman-Bhrgv I assumed they were reusing the TF-IO API, but it seems they are driving an ffmpeg subprocess instead: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/core/features/video_feature.py#L116
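
For a decoder that does not depend on tensorflow-io, the subprocess approach mentioned here can be sketched roughly as follows. This is an illustrative sketch, not the code in `video_feature.py`: the helper names `build_ffmpeg_command` and `extract_frames`, the PNG frame pattern, and the optional `fps` argument are all assumptions, and it requires an `ffmpeg` binary on the `PATH`.

```python
import os
import subprocess


def build_ffmpeg_command(video_path, out_dir, fps=None):
    """Build an ffmpeg command that writes each frame as a numbered PNG."""
    cmd = ["ffmpeg", "-y", "-i", video_path]
    if fps is not None:
        # Resample to a fixed frame rate before writing frames.
        cmd += ["-vf", f"fps={fps}"]
    cmd.append(os.path.join(out_dir, "%06d.png"))
    return cmd


def extract_frames(video_path, out_dir, fps=None):
    """Run ffmpeg and return the sorted list of extracted frame paths."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(
        build_ffmpeg_command(video_path, out_dir, fps),
        check=True,            # raise CalledProcessError if ffmpeg fails
        capture_output=True,   # keep ffmpeg's log out of the console
    )
    return sorted(
        os.path.join(out_dir, name)
        for name in os.listdir(out_dir)
        if name.endswith(".png")
    )
```

Since the only platform dependency is the ffmpeg executable itself, a sketch like this should run wherever ffmpeg is installed; the extracted PNG files can then be loaded into arrays with any image library.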

NikhilBartwal commented 4 years ago

@yongtang I am working on a video dataset for TFDS, so I was hoping for a platform-independent way of decoding a video file.

NikhilBartwal commented 4 years ago

@bhack I will have a look at it. Thanks!

NikhilBartwal commented 4 years ago

@bhack @Conchylicultor Could you review the PR?

bhack commented 4 years ago

@NikhilBartwal Thank you for the PR. I am on vacation. It would be nice if a TFDS maintainer could do a first pass in the meantime.

NikhilBartwal commented 4 years ago

@bhack I didn't know that. Sorry for the disturbance :(