tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

Community datasets not working on local env #3106

Closed NikhilBartwal closed 3 years ago

NikhilBartwal commented 3 years ago

For this year's GSoC, TFDS is going forward with community datasets. I would like to contribute to TFDS as part of this year's GSoC, and I have some questions about the project.

  1. The community datasets index currently lists only huggingface as a registered namespace. https://github.com/tensorflow/datasets/blob/fcd51122d639b6406648912f27ed9a8446986a84/tensorflow_datasets/community-datasets.toml#L4 However, calling tfds.load('huggingface: any_dataset') results in a Dataset not found error. So, are the community datasets not operational yet?

  2. Also, I was wondering whether the community dataset scripts will be shipped as part of TFDS, or downloaded at runtime when the user loads a dataset?

@Conchylicultor @vijayphoenix It would be great if the project could be discussed in a bit more detail. Thanks!

Conchylicultor commented 3 years ago

1) Have you tried with tfds-nightly? You need to replace any_dataset with a valid dataset name.
2) The community datasets are installed at runtime. They are not shipped with TFDS.
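As an illustration of point 2, here is a minimal sketch of an install-at-runtime pattern: a dataset script is fetched the first time it is requested and reused from a local cache afterwards. This is not TFDS's actual implementation. The function name, the `fetch` callback, and the cache layout are all assumptions made for the example.

```python
import pathlib
import tempfile


def download_or_reuse_cache(name, cache_dir, fetch):
    """Return the cached script for `name`, calling `fetch(name)` only on a cache miss.

    Hypothetical illustration: `fetch` stands in for whatever downloads the
    dataset script, and must return the script contents as a string.
    """
    cache_dir = pathlib.Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Replace the namespace separator so the file name is also valid on Windows.
    target = cache_dir / (name.replace(":", "__") + ".py")
    if not target.exists():
        # Cache miss: fetch the script once and store it locally.
        target.write_text(fetch(name), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Stand-in fetcher; a real one would download the script over the network.
    def fake_fetch(name):
        return "# dataset script for " + name

    with tempfile.TemporaryDirectory() as tmp:
        print(download_or_reuse_cache("huggingface:art", tmp, fake_fetch))
```

With this pattern, a second call for the same name does no network work, which is why nothing needs to be shipped with the library itself.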

NikhilBartwal commented 3 years ago

@Conchylicultor Community datasets work with tfds-nightly on Colab. However, the same call does not work in my local environment. Both environments report TFDS version 4.2.0+nightly.


```
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow_datasets as tfds
2021-03-16 17:52:41.327784: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
>>> tfds.__version__
'4.2.0+nightly'
>>> ds = tfds.load('huggingface:art')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:\tfds\datasets\tensorflow_datasets\core\load.py", line 325, in load
    dbuilder = builder(name, data_dir=data_dir, try_gcs=try_gcs, **builder_kwargs)
  File "F:\tfds\datasets\tensorflow_datasets\core\load.py", line 175, in builder
    raise not_found_error
  File "F:\tfds\datasets\tensorflow_datasets\core\load.py", line 155, in builder
    cls = builder_cls(str(name))
  File "F:\tfds\datasets\tensorflow_datasets\core\load.py", line 101, in builder_cls
    _reraise_with_list_builders(e, name=ds_name)  # pytype: disable=bad-return-type
  File "F:\tfds\datasets\tensorflow_datasets\core\load.py", line 457, in _reraise_with_list_builders
    raise py_utils.reraise(e, suffix=error_string)
  File "F:\tfds\datasets\tensorflow_datasets\core\load.py", line 91, in builder_cls
    return community.community_register.builder_cls(ds_name)
  File "F:\tfds\datasets\tensorflow_datasets\core\community\register_package.py", line 243, in builder_cls
    installed_dataset = _download_or_reuse_cache(
  File "F:\tfds\datasets\tensorflow_datasets\core\community\register_package.py", line 299, in _download_or_reuse_cache
    raise registered.DatasetNotFoundError(
tensorflow_datasets.core.registered.DatasetNotFoundError: Could not find dataset huggingface:art: Dataset not found among the 0 datasets of the community index.
Available datasets:
  - abstract_reasoning - accentdb - aeslc - aflw2k3d - ag_news_subset - ai2_arc - ai2_arc_with_ir - amazon_us_reviews - anli - arc
  - bair_robot_pushing_small - bccd - beans - big_patent - bigearthnet - billsum - binarized_mnist - binary_alpha_digits - blimp - bool_q
  - c4 - caltech101 - caltech_birds2010 - caltech_birds2011 - cars196 - cassava - cats_vs_dogs - celeb_a - celeb_a_hq - cfq - cherry_blossoms - chexpert - cifar10 - cifar100 - cifar10_1 - cifar10_corrupted - citrus_leaves - cityscapes - civil_comments - clevr - clic - clinc_oos - cmaterdb - cnn_dailymail - coco - coco_captions - coil100 - colorectal_histology - colorectal_histology_large - common_voice - coqa - cos_e - cosmos_qa - covid19sum - crema_d - curated_breast_imaging_ddsm - cycle_gan
  - dart - davis - deep_weeds - definite_pronoun_resolution - dementiabank - diabetic_retinopathy_detection - div2k - dmlab - downsampled_imagenet - drop - dsprites - dtd - duke_ultrasound
  - e2e_cleaned - efron_morris75 - emnist - eraser_multi_rc - esnli - eurosat
  - fashion_mnist - flic - flores - food101 - forest_fires - fuss
  - gap - geirhos_conflict_stimuli - gem - genomics_ood - german_credit_numeric - gigaword - glue - goemotions - gpt3 - gref - groove - gtzan - gtzan_music_speech
  - hellaswag - higgs - horses_or_humans - howell
  - i_naturalist2017 - imagenet2012 - imagenet2012_corrupted - imagenet2012_real - imagenet2012_subset - imagenet_a - imagenet_r - imagenet_resized - imagenet_v2 - imagenette - imagewang - imdb_reviews - irc_disentanglement - iris
  - kitti - kmnist
  - lambada - lfw - librispeech - librispeech_lm - libritts - ljspeech - lm1b - lost_and_found - lsun - lvis
  - malaria - math_dataset - mctaco - mlqa - mnist - mnist_corrupted - movie_lens - movie_rationales - movielens - moving_mnist - multi_news - multi_nli - multi_nli_mismatch
  - natural_questions - natural_questions_open - newsroom - nsynth - nyu_depth_v2
  - ogbg_molpcba - omniglot - open_images_challenge2019_detection - open_images_v4 - openbookqa - opinion_abstracts - opinosis - opus - oxford_flowers102 - oxford_iiit_pet
  - para_crawl - patch_camelyon - paws_wiki - paws_x_wiki - pet_finder - pg19 - piqa - places365_small - plant_leaves - plant_village - plantae_k
  - qa4mre - qasc - quac - quickdraw_bitmap
  - race - radon - reddit - reddit_disentanglement - reddit_tifu - resisc45 - robonet - rock_paper_scissors - rock_you
  - s3o4d - salient_span_wikipedia - samsum - savee - scan - scene_parse150 - scicite - scientific_papers - sentiment140 - shapes3d - siscore - smallnorb - snli - so2sat - speech_commands - spoken_digit - squad - stanford_dogs - stanford_online_products - star_cfq - starcraft_video - stl10 - story_cloze - sun397 - super_glue - svhn_cropped
  - tao - ted_hrlr_translate - ted_multi_translate - tedlium - tf_flowers - the300w_lp - tiny_shakespeare - titanic - trec - trivia_qa - tydi_qa
  - uc_merced - ucf101
  - vctk - vgg_face2 - visual_domain_decathlon - voc - voxceleb - voxforge
  - waymo_open_dataset - web_nlg - web_questions - wider_face - wiki40b - wiki_bio - wiki_table_questions - wiki_table_text - wikiann - wikihow - wikipedia - wikipedia_toxicity_subtypes - wine_quality - winogrande - wmt13_translate - wmt14_translate - wmt15_translate - wmt16_translate - wmt17_translate - wmt18_translate - wmt19_translate - wmt_t2t_translate - wmt_translate - wordnet - wsc273
  - xnli - xquad - xsum - xtreme_pawsx - xtreme_xnli
  - yelp_polarity_reviews - yes_no - youtube_vis
Check that:
  - if dataset was added recently, it may only be available in `tfds-nightly`
  - the dataset name is spelled correctly
  - dataset class defines all base class abstract methods
  - the module defining the dataset class is imported
```

Conchylicultor commented 3 years ago

Make sure to update your tfds-nightly with pip install --upgrade tfds-nightly

NikhilBartwal commented 3 years ago

@Conchylicultor I'm currently using tfds-nightly==4.2.0.dev202103160106 on my local machine, and community datasets do not seem to work with it. I have tried uninstalling TFDS completely and then reinstalling it using pip install tfds-nightly, but that didn't work. I have also tried installing TFDS from a freshly cloned repo. Do you have any idea why this could be happening?
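When a reinstall appears to have no effect, one thing worth checking is which copy of the package Python actually imports. The sketch below uses only the standard library. `describe_install` is a hypothetical helper name, and the distribution names passed to it are assumptions about what might be installed. It reports the file the module resolves to, plus the version of each pip distribution, or None when that distribution is absent.

```python
import importlib.metadata
import importlib.util


def describe_install(module_name, dist_names):
    """Report where `module_name` would be imported from and which of the
    given pip distributions are installed (hypothetical helper)."""
    # find_spec returns None for a top-level module that cannot be found.
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    versions = {}
    for dist in dist_names:
        try:
            versions[dist] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            versions[dist] = None  # this distribution is not installed
    return {"location": location, "versions": versions}


if __name__ == "__main__":
    print(describe_install("tensorflow_datasets", ["tfds-nightly", "tensorflow-datasets"]))
```

In the traceback above, the files are under F:\tfds\datasets\tensorflow_datasets, so comparing the reported location against the pip-installed version may help tell a source checkout apart from the nightly wheel. Whether that is related to the failure here is not established in this thread.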

NikhilBartwal commented 3 years ago

@Conchylicultor I think that I have pinpointed the issue. Since the original question has been answered, I think I should open a new issue so as to avoid further confusion for others.