tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

Add `tfds list` CLI command #3116

Open vijayphoenix opened 3 years ago

vijayphoenix commented 3 years ago

Adding community datasets will sharply increase the number of available datasets, so it will become difficult to browse through the repos or the catalog.

It would be great if we could add a new CLI command to list all the datasets present in TFDS. This would allow users to easily search for a dataset using tools like grep.

Possible Usage

tfds list # List all datasets
tfds list --type audio # List all audio datasets in TFDS
tfds list --namespace huggingface # List all huggingface community datasets
tfds list --search sun # Lists datasets like sun397, lsun (alternative is to use grep)

Implementation details

• Use/modify the tfds.list_builders API.
• Add a new file list.py in the scripts/cli folder.
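To make the proposal concrete, here is a minimal sketch of how the `--namespace` and `--search` filters could behave. It is not the actual TFDS implementation. The `SAMPLE_DATASETS` list, the `namespace:name` naming convention, and the helper names `filter_datasets`, `build_parser`, and `main` are all assumptions made for illustration. A real command would presumably get its names from `tfds.list_builders()`. The `--type` flag is left out because it would need per-dataset metadata that this sketch does not model.

```python
import argparse
from typing import List, Optional, Sequence

# Hypothetical sample of dataset names. A real command would presumably
# obtain these from tfds.list_builders(). Community datasets are assumed
# here to be written as "namespace:name".
SAMPLE_DATASETS = [
    "lsun",
    "mnist",
    "sun397",
    "huggingface:squad",
    "huggingface:glue",
]


def filter_datasets(
    names: Sequence[str],
    namespace: Optional[str] = None,
    search: Optional[str] = None,
) -> List[str]:
    """Return the sorted names that match the optional filters."""
    result = []
    for name in names:
        # "huggingface:squad" -> ("huggingface", ":", "squad")
        # "mnist"             -> ("", "", "mnist")
        ns, _, short_name = name.rpartition(":")
        if namespace is not None and ns != namespace:
            continue
        # Plain substring match on the name without its namespace.
        if search is not None and search not in short_name:
            continue
        result.append(name)
    return sorted(result)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a `list` subcommand with two filters."""
    parser = argparse.ArgumentParser(prog="tfds")
    subparsers = parser.add_subparsers(dest="command", required=True)
    list_parser = subparsers.add_parser("list", help="List available datasets.")
    list_parser.add_argument("--namespace", help="Only list this community namespace.")
    list_parser.add_argument("--search", help="Only list names containing this text.")
    return parser


def main(
    argv: Optional[Sequence[str]] = None,
    names: Sequence[str] = SAMPLE_DATASETS,
) -> List[str]:
    """Parse the arguments, print one matching name per line, and return them."""
    args = build_parser().parse_args(argv)
    matches = filter_datasets(names, namespace=args.namespace, search=args.search)
    for name in matches:
        print(name)
    return matches
```

Calling `main(["list", "--search", "sun"])` prints `lsun` and `sun397`, mirroring the `--search sun` example above. Printing one name per line keeps the output easy to pipe into grep, which is the alternative the proposal mentions.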

jatin-code777 commented 3 years ago

Hi, I have started work on this

NikhilBartwal commented 3 years ago

Since @jatin-code777 has already started working on the --type and --exclude_community parts, I will be sending a PR for the --search and --namespace flags soon.

Srikeshram commented 3 years ago

I am working on this issue. Will you please assign it to me?

NikhilBartwal commented 3 years ago

Hey @Srikeshram, @jatin-code777 is already working on this issue, and taking it up would only lead to duplicated work. You can take up other bugs and issues that are still unresolved in TFDS. Thank you for your efforts!

pnmartinez commented 2 years ago

Leaving my cheers here for this feature request!

It'd be a game-changer to be able to do "dataset version control" dynamically and more easily.