tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

[Stanford_dogs] No examples yielded #1781

Closed Flowhill closed 4 years ago

Flowhill commented 4 years ago

Short description

Calling tfds.load("stanford_dogs") raises AssertionError("No examples were yielded.").

Environment information

Reproduction instructions

import tensorflow_datasets as tfds
tfds.load("stanford_dogs")

Link to logs

Downloading and preparing dataset stanford_dogs/0.2.0 (download: 778.12 MiB, generated: Unknown size, total: 778.12 MiB) to C:\Users\Flowhill\tensorflow_datasets\stanford_dogs\0.2.0...
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Extraction completed...: 0 file [00:00, ? file/s]
Shuffling and writing examples to C:\Users\Flowhill\tensorflow_datasets\stanford_dogs\0.2.0.incomplete37A45O\stanford_dogs-train.tfrecord
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\registered.py", line 305, in load
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 340, in download_and_prepare
    download_config=download_config)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1078, in _download_and_prepare
    max_examples_per_split=download_config.max_examples_per_split,
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 931, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1106, in _prepare_split
    shard_lengths, total_size = writer.finalize()
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\tfrecords_writer.py", line 211, in finalize
    self._shuffler.bucket_lengths, self._path)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\tfrecords_writer.py", line 88, in _get_shard_specs
    shard_boundaries = _get_shard_boundaries(num_examples, num_shards)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\tfrecords_writer.py", line 107, in _get_shard_boundaries
    raise AssertionError("No examples were yielded.")
AssertionError: No examples were yielded.

Expected behavior

Stanford Dogs was already downloaded, so the empty download progress bars are expected. Running the same call with tfds.load("mnist") produces no errors.

Additional context A similar problem happened to a user using the PlantVillage and The300wLpTest datasets here.

Eshan-Agarwal commented 4 years ago

@Flowhill It works fine for me; please recheck. I think there is an issue with your system. Refer to this colab notebook.

Edit: I also checked locally and stanford_dogs runs fine. The plant_village dataset also runs fine in colab. Try reinstalling TFDS: pip uninstall tensorflow_datasets

Flowhill commented 4 years ago

It does indeed seem to work in colab. I reinstalled TFDS (pip uninstall tensorflow_datasets, then pip install tensorflow_datasets) and tried plant_village. It throws the same error. This was done in a clean conda environment with tensorflow and tensorflow-gpu installed, and tensorflow_datasets installed via pip (as the Anaconda Cloud has an outdated version of it).

Dl Completed...: 100%|████████████████████████████████████████████████████████████████| 1/1 [05:19<00:00, 319.00s/ url]
Shuffling and writing examples to C:\Users\Flowhill\tensorflow_datasets\plant_village\1.0.0.incomplete16HE9U\plant_village-train.tfrecord
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\registered.py", line 305, in load
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 340, in download_and_prepare
    download_config=download_config)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1078, in _download_and_prepare
    max_examples_per_split=download_config.max_examples_per_split,
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 931, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1106, in _prepare_split
    shard_lengths, total_size = writer.finalize()
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\tfrecords_writer.py", line 211, in finalize
    self._shuffler.bucket_lengths, self._path)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\tfrecords_writer.py", line 88, in _get_shard_specs
    shard_boundaries = _get_shard_boundaries(num_examples, num_shards)
  File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\tfrecords_writer.py", line 107, in _get_shard_boundaries
    raise AssertionError("No examples were yielded.")
AssertionError: No examples were yielded.

Edit: I did notice that the colab uses python version 3.6.9 and I use 3.6.10, might that cause issues?

Edit2: Tried both Python 3.7.7 and 3.6.9, no difference.

Eshan-Agarwal commented 4 years ago

I am not sure it helps, but I think the problem is with tf.io.gfile.glob() here. Can you please replace it with glob.iglob(), after adding import glob, and try again?

Flowhill commented 4 years ago

Could you clarify this a bit? I'm not exactly sure what you mean.

Eshan-Agarwal commented 4 years ago

There is a problem with tf.io.gfile.glob(): it does not match some patterns. The TF team has already fixed it. An alternative is to use the standard glob module as I described above. We can't use that in TFDS because we want GCS support, but in your case you can try it so we can find where the error occurs.

Are you able to extract the data? Please check and let me know. You can find the extracted data here: C:\Users\eshan\tensorflow_datasets\downloads\extracted\

Flowhill commented 4 years ago

I am able to extract the data:

Directory of C:\Users\Flowhill\tensorflow_datasets\downloads\extracted

01/04/2020  14:29    <DIR>          .
01/04/2020  14:29    <DIR>          ..
01/04/2020  14:28    <DIR>          TAR.vision.stanfor.edu_aditya8_ImageNe_annotatZi7Tb3B75rGVwv2jktjhU-g68oztyOITSmSfGY3vMP0.tar
01/04/2020  14:28    <DIR>          TAR.vision.stanfor.edu_aditya8_ImageNe_listsNLR8rNmpi10VDghPJNKTkcCExVJyKV7GXIVlG8NfTWw.tar
               0 File(s)              0 bytes
               4 Dir(s)  11.172.319.232 bytes free

I'll try your glob fix now.

Eshan-Agarwal commented 4 years ago

So is it working ?

Flowhill commented 4 years ago

OK, so I found the file plant_village.py in ~\anaconda3\envs\tf-gpu\Lib\site-packages\tensorflow_datasets\image and replaced the line

for fpath in tf.io.gfile.glob(glob_path):

with

for fpath in tf.io.gfile.iglob(glob_path):

This is what you wanted me to do, right? If not, could you repeat what you want me to do? I did not see a line that said import glob.

Eshan-Agarwal commented 4 years ago

Just add import glob at the top of the file, then replace tf.io.gfile.glob() with glob.iglob().

Flowhill commented 4 years ago

Yes it works! Thank you very much, you are a blessing!

Let me summarize. My problem occurred in a fresh Anaconda environment created with:

conda create --name <name> tensorflow tensorflow-gpu
pip install tensorflow-datasets

Error:

raise AssertionError("No examples were yielded.")
AssertionError: No examples were yielded.

It occurred with the following code:

import tensorflow_datasets as tfds
tfds.load("stanford_dogs")

or

tfds.load("plant_village")

The solution for plant_village is to navigate to ~\anaconda3\envs\name\Lib\site-packages\tensorflow_datasets\image\plant_village.py, add import glob at the top, and replace the line

for fpath in tf.io.gfile.glob(glob_path):

with

for fpath in glob.iglob(glob_path):
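
A minimal, self-contained sketch of the local workaround described above, using only the standard library. The helper name iter_local_files and the temporary files are illustrative assumptions, not part of TFDS. Unlike tf.io.gfile.glob, glob.iglob only sees the local filesystem.

```python
import glob
import os
import tempfile


def iter_local_files(glob_path):
    """Lazily yield local paths matching glob_path (hypothetical helper)."""
    # glob.iglob returns an iterator and only works on local filesystems,
    # so this is a debugging aid rather than a drop-in for tf.io.gfile.glob.
    for fpath in glob.iglob(glob_path):
        yield fpath


# Demo on throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.jpg", "b.jpg", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()
    # glob.escape guards against special characters in the temp dir name.
    pattern = os.path.join(glob.escape(tmp_dir), "*.jpg")
    matched = sorted(os.path.basename(p) for p in iter_local_files(pattern))

print(matched)
```

If the pattern matches nothing, the loop body never runs, which is one way a generator can end up yielding no examples.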

The solution for stanford_dogs is to navigate to ~\anaconda3\envs\name\Lib\site-packages\tensorflow_datasets\image\stanford_dogs.py

The exact lines I replaced are as follows. Replace

_NAME_RE = re.compile(r"([\w-]*/)*([\w]*.jpg)$")

with

_NAME_RE = re.compile(r"([\w-]*\\)*([\w]*.jpg)$")

and replace

if not res or (fname.split("/")[-1] not in file_names):

with

if not res or (fname.split("\\")[-1] not in file_names):

Note that the following line in def parse_mat_file(file_name): might need to be replaced later on and simply hasn't thrown an error yet:

element.split("/")[-1] for element in parsed_mat_arr["file_list"]

by

element.split("\\")[-1] for element in parsed_mat_arr["file_list"]

What did not need to be replaced is the following, also in def parse_mat_file(file_name):

element.split("/")[-2].lower()  # Extract path/label/img.jpg

Replacing that throws an error.
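
One possible explanation for that last observation, sketched under the assumption that the entries stored in the .mat file list use forward slashes on every platform. The sample entry below is illustrative and was not read from the dataset. Splitting such an entry on a backslash leaves a single element, so indexing [-2] fails.

```python
# Illustrative entry in label/img.jpg form; assumed, not read from the .mat file.
entry = "n02085620-Chihuahua/n02085620_5927.jpg"

# Splitting on "/" separates the label directory from the image name.
label = entry.split("/")[-2].lower()
print(label)

# Splitting on "\\" finds no separator, so the list has one element
# and [-2] is out of range.
try:
    entry.split("\\")[-2]
    raised = False
except IndexError:
    raised = True
print(raised)
```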

Eshan-Agarwal commented 4 years ago

@Flowhill Thank you for showing the results. Actually, it's not good to use glob.iglob(); we have to use tf.io.gfile to provide GCS support. Have you tried reinstalling TF? They fixed this issue, but the pip package is not updated yet, so you can simply run pip install tensorflow==2.2.0rc2 and leave tf.io.gfile unchanged. It's fine if you are able to work with glob.iglob(), though, as future versions of TFDS should not have this error.

For stanford_dogs the error is in _generate_examples, so there is nothing wrong with tf.io.gfile. I think the problem is with _NAME_RE.match, which is why it gives AssertionError: No examples were yielded. Maybe the pattern does not match the file names correctly.

Flowhill commented 4 years ago

Updating tensorflow using pip install tensorflow==2.2.0rc2 solves the tf.io.gfile problem of the PlantVillage dataset. The original stanford_dogs problem is still there. I've tried to check whether something goes wrong with _NAME_RE.match(fname) by adding a print statement printing both _NAME_RE and fname, but it seems to output these properly. I'm going to debug a bit more to find out when it throws the error.

EDIT: It does seem to be a problem with match. Adding a little counter for when the names were and weren't matched yielded the following result before throwing the error:

count true = 0
count false = 20580

_NAME_RE outputs: re.compile('([\\w-]*/)*([\\w]*.jpg)$')

fname outputs seemingly correct paths such as Images\n02108915-French_bulldog\n02108915_9899.jpg, Images\n02108915-French_bulldog\n02108915_971.jpg, or Images\n02113978-Mexican_hairless\n02113978_124.jpg
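
The counter experiment can be reproduced in isolation. The sketch below copies the pattern and the three sample paths quoted above; count_matches is a hypothetical helper, and raw strings are used so that \n in the Windows paths is not read as a newline.

```python
import re

# Pattern as printed in the comment above.
_NAME_RE = re.compile(r"([\w-]*/)*([\w]*.jpg)$")

# Sample paths from the comment above; raw strings keep "\n" as backslash + n.
windows_names = [
    r"Images\n02108915-French_bulldog\n02108915_9899.jpg",
    r"Images\n02108915-French_bulldog\n02108915_971.jpg",
    r"Images\n02113978-Mexican_hairless\n02113978_124.jpg",
]


def count_matches(names):
    """Return (matched, unmatched) counts for _NAME_RE.match (hypothetical helper)."""
    matched = sum(1 for name in names if _NAME_RE.match(name))
    return matched, len(names) - matched


print(count_matches(windows_names))

# The same paths written with forward slashes, for comparison.
posix_names = [name.replace("\\", "/") for name in windows_names]
print(count_matches(posix_names))
```

The pattern only accepts "/" between path components, so the backslash-separated names are all rejected while their forward-slash equivalents are accepted.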

Eshan-Agarwal commented 4 years ago

Yes, you are right. The reason is that Windows uses backslashes in paths while other systems use forward slashes. Replacing

_NAME_RE = re.compile(r"([\w-]*/)*([\w]*.jpg)$")

with

_NAME_RE = re.compile(r"([\w-]*\\)*([\w]*.jpg)$")

and

if not res or (fname.split("/")[-1] not in file_names):

with

if not res or (fname.split("\\")[-1] not in file_names):

runs fine on Windows, but it's not a good solution.

Flowhill commented 4 years ago

I got it working!

The exact lines I replaced are as follows. Replace

_NAME_RE = re.compile(r"([\w-]*/)*([\w]*.jpg)$")

with

_NAME_RE = re.compile(r"([\w-]*\\)*([\w]*.jpg)$")

and replace

if not res or (fname.split("/")[-1] not in file_names):

with

if not res or (fname.split("\\")[-1] not in file_names):

Note that the following line in def parse_mat_file(file_name): might need to be replaced later on and simply hasn't thrown an error yet:

element.split("/")[-1] for element in parsed_mat_arr["file_list"]

by

element.split("\\")[-1] for element in parsed_mat_arr["file_list"]

What did not need to be replaced is the following, also in def parse_mat_file(file_name):

element.split("/")[-2].lower()  # Extract path/label/img.jpg

Replacing that throws an error.

My final setup is as follows: Environment information

The dataset not working with windows is really something that should be fixed.

Eshan-Agarwal commented 4 years ago

@Flowhill Please try the changes in the above PR and check whether they work for you. They work for me.

Edit: I tried it in Windows, Linux and colab

Flowhill commented 4 years ago

Done!

ketan-lambat commented 3 years ago

I am getting AssertionError: No examples were yielded. for a custom TFDS dataset.

My output for tfds build my_dataset.py:

tfds.core.DatasetInfo(
    name='my_dataset',
    full_name='my_dataset/1.0.0',
    description="""

    """,
    homepage='https://www.tensorflow.org/datasets/catalog/my_dataset',
    data_path='/root/tensorflow_datasets/my_dataset/1.0.0',
    download_size=Unknown size,
    dataset_size=2.71 MiB,
    features=FeaturesDict({
        'image': Image(shape=(None, None, 3), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    }),
    supervised_keys=('image', 'label'),
    splits={
        'testA': <SplitInfo num_examples=24, num_shards=1>,
        'testB': <SplitInfo num_examples=24, num_shards=1>,
        'trainA': <SplitInfo num_examples=24, num_shards=1>,
        'trainB': <SplitInfo num_examples=24, num_shards=1>,
    },
    citation="""""",
)

This is what I had in colab:

!pip install -q tfds-nightly

import tensorflow_datasets as tfds
import my_dataset

ds = tfds.load('my_dataset')

And then this output

Downloading and preparing dataset my_dataset/1.0.0 (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/my_dataset/1.0.0...
Generating splits...: 0%
0/4 [00:00<?, ? splits/s]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-16-0f858bbd233c> in <module>()
----> 1 ds = tfds.load('my_dataset')

8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/tfrecords_writer.py in _get_shard_boundaries(num_examples, number_of_shards)
    116 ) -> List[int]:
    117   if num_examples == 0:
--> 118     raise AssertionError("No examples were yielded.")
    119   if num_examples < number_of_shards:
    120     raise AssertionError("num_examples ({}) < number_of_shards ({})".format(

AssertionError: No examples were yielded.

Please help on how to solve this.
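
The assertion in the traceback above fires when num_examples is 0, that is, when a split's generator produced nothing. A quick way to check this outside TFDS is to count what the generator yields before building. The sketch below is generic Python: generate_examples and its directory layout are hypothetical stand-ins for a custom builder's example generator, not TFDS code.

```python
import os
import tempfile


def generate_examples(image_dir):
    """Yield (key, example) pairs for .jpg files in image_dir (hypothetical)."""
    for name in sorted(os.listdir(image_dir)):
        if name.endswith(".jpg"):
            yield name, {"image": os.path.join(image_dir, name), "label": 0}


def count_examples(generator):
    """Count the items a generator yields."""
    return sum(1 for _ in generator)


with tempfile.TemporaryDirectory() as tmp_dir:
    # A file with an unexpected extension is skipped, so nothing is yielded.
    open(os.path.join(tmp_dir, "img_0.png"), "w").close()
    empty_count = count_examples(generate_examples(tmp_dir))

    # Once a matching file exists, the generator yields one example.
    open(os.path.join(tmp_dir, "img_1.jpg"), "w").close()
    found_count = count_examples(generate_examples(tmp_dir))

print(empty_count, found_count)
```

A count of zero for any split would point at the path or filter logic in that split's generator, as it did for the glob pattern and the regular expression earlier in this thread.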