[GSoC] Support more datasets format for visualization with tfds.show_examples()

Conchylicultor commented 4 years ago

Task: Currently tfds.show_examples() only works for supervised image datasets. It would be good to extend the heuristic to more dataset types, such as:

object detection
2-images side by side (ex: div2k, cityscapes)
video (generated as gifs) ?
texts ?
audio ?

Instructions:

Update show_examples in visualization.py to support new dataset types of your choice.
Experiment on Colab to iterate on the visualization, and link the Colab to the PR so the reviewer can see the result.
Update the visualization_test.py with dummy data to test the visualization of the new dataset.
Only add a single dataset type at a time

As multiple data types exist, multiple people can work on this issue at the same time.

dhirensr commented 4 years ago

@Conchylicultor: working on translate type datasets.

Edit: I have made the changes but not yet sent a PR. Could you please review show_examples for translate type datasets in Colab: https://colab.research.google.com/drive/1LDXsE2tAxbn8qnhqpxUnXVpzltpSiYts. I am just printing the texts, as they can't really be visualized in a figure like images.

@Conchylicultor: Please review PR https://github.com/tensorflow/datasets/pull/1547

vinayvr11 commented 4 years ago

Hello, I would like to contribute to this issue. This is my first time, so could you please guide me? So I only need to make a visualisation of these formats by implementing some updates in visualization_test.py?

vinayvr11 commented 4 years ago

@Conchylicultor: I am working on the coco object detection dataset, but it was too big to load in my Colab. Can I use a dataset from outside TensorFlow Datasets, since the TensorFlow object detection datasets are too big?

Conchylicultor commented 4 years ago

@vinayvr11 Why don't you have a look at our catalog, which displays the download size? For instance voc/2007 is about 1GB: https://www.tensorflow.org/datasets/catalog/voc

VaranRohila commented 4 years ago

Hi, I was writing code to support object detection datasets, but I got stuck on the values of the BBox feature. They are all between 0 and 1, and I checked in the source that this is intended. I am just confused about what scale I should use when plotting the bounding boxes.

VaranRohila commented 4 years ago

I figured it out. Here is the notebook link. @Conchylicultor Please review the changes.
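For readers with the same question, here is a minimal sketch of one way to scale such boxes. It assumes each box is ordered (ymin, xmin, ymax, xmax) with values normalized to [0, 1] relative to the image size; the helper name bbox_to_pixels is hypothetical and is not taken from the notebook or from TFDS.

```python
from typing import Sequence, Tuple


def bbox_to_pixels(
    bbox: Sequence[float], image_height: int, image_width: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized box to pixel units (hypothetical helper).

    Assumption: `bbox` is ordered (ymin, xmin, ymax, xmax), each in [0, 1].
    Returns (left, top, width, height) in pixels.
    """
    ymin, xmin, ymax, xmax = bbox
    # Horizontal values scale with the image width, vertical ones with the height.
    left = xmin * image_width
    top = ymin * image_height
    width = (xmax - xmin) * image_width
    height = (ymax - ymin) * image_height
    return left, top, width, height
```

The returned values can then be drawn over the image, for example with matplotlib.patches.Rectangle((left, top), width, height, fill=False).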

Conchylicultor commented 4 years ago

@VaranRohila nice, this looks great. Could you send a PR so I can see and review the code ?

Edit: Oops, I missed the one you sent. Thank you!

VaranRohila commented 4 years ago

I just did! Link

Conchylicultor commented 4 years ago

Just saw this. Thank you!

harshitadd commented 4 years ago

Hi, is anyone currently working on audio data visualization (e.g. groove)? If not, I would be grateful for any guidance on what the visualization should include: should the output be a few random audio samples, or a visual representation of the audio dataset's diversity (classes, frequency, range of audio, etc.)?

Edit: both ljspeech and librispeech seem to be inaccessible owing to still being in development phase

Conchylicultor commented 4 years ago

@harshitadd Thank you for looking into this! I don't think anyone is working on audio yet.

For the output, I think both image and audio representations could be helpful, but you can start with anything. IPython.display.Audio might be helpful to display audio. Also have a look at my comment in https://github.com/tensorflow/datasets/pull/1639#discussion_r391848285 to try to factorise this new feature into independent classes.
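As a rough illustration of what "independent classes" could look like, here is a minimal sketch with one class per dataset type, each deciding for itself whether it applies. Every name here (Visualizer, match, show, pick_visualizer) is hypothetical, and the features dict mapping feature names to a kind string is a simplified stand-in for real dataset metadata; this is not the design from the linked review comment.

```python
from typing import Any, Dict, List, Optional


class Visualizer:
    """Hypothetical base class: one subclass per dataset type."""

    def match(self, features: Dict[str, str]) -> bool:
        """Return True if this visualizer can handle the given features."""
        raise NotImplementedError

    def show(self, example: Dict[str, Any]) -> str:
        """Render one example (here: just return a description)."""
        raise NotImplementedError


class ImageVisualizer(Visualizer):
    def match(self, features: Dict[str, str]) -> bool:
        return "image" in features.values()

    def show(self, example: Dict[str, Any]) -> str:
        return "image plot"


class AudioVisualizer(Visualizer):
    def match(self, features: Dict[str, str]) -> bool:
        return "audio" in features.values()

    def show(self, example: Dict[str, Any]) -> str:
        return "audio player"


# Order matters: the first matching visualizer wins.
VISUALIZERS: List[Visualizer] = [ImageVisualizer(), AudioVisualizer()]


def pick_visualizer(
    features: Dict[str, str], visualizers: List[Visualizer]
) -> Optional[Visualizer]:
    """Return the first visualizer whose match() accepts the features."""
    for visualizer in visualizers:
        if visualizer.match(features):
            return visualizer
    return None
```

With this shape, supporting a new dataset type means adding one class and one list entry, which fits the "one dataset type at a time" instruction.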

both ljspeech and librispeech seem to be inaccessible owing to still being in development phase

What do you mean by "development phase"? The dataset statistics are available on our website, which seems to indicate that the data were generated successfully: https://www.tensorflow.org/datasets/catalog/librispeech. Are you using the latest TFDS version? If there is an issue with those datasets, please report a bug.

harshitadd commented 4 years ago

@Conchylicultor Thank you for your prompt reply. With respect to the latter comment: I am running tfds version 2.0.0, and passing librispeech to tfds.load() gives the error "Dataset librispeech is under active development and is not available yet". 'ljspeech' simply returns "dataset not found". Kindly advise. If there is no other way to load them, I will report it as a bug.

As for the former comment: I shall try to include a Colab test file link for code review as soon as possible.

Thanks!

Conchylicultor commented 4 years ago

Did you try with tfds 2.1.0 or tfds-nightly ?

harshitadd commented 4 years ago

Did you try with tfds 2.1.0 or tfds-nightly ?

Thanks a lot. Both ljspeech and librispeech work with tfds-nightly (not with tfds 2.1.0 though).

Here is the first draft that encapsulates the visual and audio displays of the passed dataset. Kindly review the generated output format. If this is what is required, I shall further optimize the code and submit it for review.

A few questions/notes regarding this output:

The code fails when displaying the IPython objects for ljspeech (although the audio numpy representations appear to be valid).
Also, due to storage constraints, none of the remaining datasets (nsynth, librispeech, speech_command) could be loaded for testing. Is there an alternative way I could load and test them on Colab itself?
Due to the audio embeddings in the output, the Colab file is becoming quite heavy (ljspeech's output is not included in the file for this reason). Can you point me towards any more efficient output formats that I could look into?

Thanks!

Conchylicultor commented 4 years ago

@harshitadd, thank you for the quick implementation. This looks nice. Can you send a PR so I can comment on the code ?

Not sure how to help without more info (stacktrace, code,...)
If you have space on your local disk, you could install them locally and test in a local Jupyter notebook. Alternatively, there is GCS, which may be more complicated to set up: https://www.tensorflow.org/datasets/gcs
For the output being heavy, would it make sense to truncate the audio to the first 20 sec (at least for the audio file)? For example, add a kwarg in tfds.show_examples(). You can assume some fixed sample rate if it is not known.
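The truncation idea in the last point can be sketched as follows. This is a minimal illustration only: the function name truncate_audio, the max_seconds parameter, and the 16000 Hz default sample rate are all assumptions, not part of tfds.show_examples().

```python
def truncate_audio(samples, sample_rate=16000, max_seconds=20.0):
    """Keep only the first `max_seconds` of a 1-D sequence of audio samples.

    Hypothetical helper. `sample_rate` is in samples per second; the default
    of 16000 is an assumed fallback for when the real rate is unknown.
    Works on lists and on 1-D arrays that support slicing.
    """
    max_samples = int(sample_rate * max_seconds)
    # Slicing past the end is safe: shorter clips are returned unchanged.
    return samples[:max_samples]
```

The shortened clip can then be passed to IPython.display.Audio(clip, rate=sample_rate), which keeps the embedded audio in the notebook output small.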

Eshan-Agarwal commented 4 years ago

@Conchylicultor I am working on showing examples for video datasets.

harshitadd commented 4 years ago

@harshitadd, thank you for the quick implementation. This looks nice. Can you send a PR so I can comment on the code ?

Thanks a lot for the input! I have made a PR for this: https://github.com/tensorflow/datasets/pull/1683. Truncating the audio to a certain length definitely makes the file size quite manageable, so I have added a basic implementation of it.

Edit: I couldn't fix the ljspeech bug. I would be very grateful if you could point out where the code is flawed in that respect too.

tensorflow / datasets

[GSoC] Support more datasets format for visualization with tfds.show_examples() #1528