Closed Flowhill closed 4 years ago
@Flowhill It's working fine, please recheck; I think there is some issue with your system. Refer to this colab notebook.
Edit: Also check locally; stanford_dogs runs fine, and it also runs fine for the plant_village dataset in colab. Try to reinstall TFDS: pip uninstall tensorflow_datasets
It does indeed seem to work via colab.
I reinstalled tfds (pip uninstall tensorflow_datasets, then pip install tensorflow_datasets) and tried plant_village. It throws the same error.
This was done in a clean conda environment, with tensorflow and tensorflow-gpu installed via conda and tensorflow_datasets installed via pip (as the anaconda cloud has an outdated version of it).
Dl Completed...: 100%|████████████████████████████████████████████████████████████████| 1/1 [05:19<00:00, 319.00s/ url]
Shuffling and writing examples to C:\Users\Flowhill\tensorflow_datasets\plant_village\1.0.0.incomplete16HE9U\plant_village-train.tfrecord
Traceback (most recent call last):
File "
Edit: I did notice that the colab uses Python version 3.6.9 and I use 3.6.10; might that cause issues?
Edit2: Tried both Python 3.7.7 as well as 3.6.9, no difference.
I am not sure it helps, but I think the problem is with tf.io.gfile.glob() here. Can you please replace it with glob.iglob() (after import glob) and try again?
Could you clarify this a bit? I'm not exactly sure what you mean.
There is a problem with tf.io.gfile.glob(): it's not matching some patterns. The problem was solved by the TF team. An alternative solution is to use glob() as I described above, but we can't use it in TFDS because we want support for GCS. In your case you can try it, so we can catch where you got the error.
Are you able to extract the data? Please check and let me know; you can find the extracted data here: C:\Users\eshan\tensorflow_datasets\downloads\extracted\
I am able to extract the data:
Directory of C:\Users\Flowhill\tensorflow_datasets\downloads\extracted
01/04/2020 14:29
I'll try your glob fix now.
So is it working ?
Ok so I found the file in ~\anaconda3\envs\tf-gpu\Lib\site-packages\tensorflow_datasets\image called plant_village.py and replaced the line
for fpath in tf.io.gfile.glob(glob_path):
to
for fpath in tf.io.gfile.iglob(glob_path):
This is what you wanted me to do right? If not could you repeat what you want me to do? I did not see a line that said import glob.
Just do this: import glob at the top, then replace tf.io.gfile.glob with glob.iglob().
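Put together, the suggested change is a two-line edit to plant_village.py. A sketch of the patched loop follows; the glob_path value here is a placeholder for illustration, the real pattern is constructed by the dataset code:

```python
import glob  # added at the top of plant_village.py

# Placeholder pattern for illustration; the real glob_path is built
# by the builder from the extracted download directory.
glob_path = "plant_village_extracted/*/*.jpg"

# was: for fpath in tf.io.gfile.glob(glob_path):
for fpath in glob.iglob(glob_path):
    print(fpath)
```

glob.iglob returns a lazy iterator, so it behaves like tf.io.gfile.glob for local paths, but it cannot read from GCS buckets.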
Yes it works! Thank you very much, you are a blessing!
Let me summarize:
My problem was using a fresh anaconda environment created by:
conda create --name <name> tensorflow tensorflow-gpu
and
pip install tensorflow-datasets
Error:
raise AssertionError("No examples were yielded.")
AssertionError: No examples were yielded.
Occurred with the following code:
import tensorflow_datasets as tfds
tfds.load("stanford_dogs")
or
tfds.load("plant_village")
The solution for plant_village is to navigate to ~\anaconda3\envs\name\Lib\site-packages\tensorflow_datasets\image\plant_village.py
add import glob
at the top and replace line
for fpath in tf.io.gfile.glob(glob_path):
by
for fpath in glob.iglob(glob_path):
The solution for stanford_dogs is to navigate to ~\anaconda3\envs\name\Lib\site-packages\tensorflow_datasets\image\stanford_dogs.py
The exact lines I replaced are as follows:
replace _NAME_RE = re.compile(r"([\w-]*/)*([\w]*.jpg)$")
with _NAME_RE = re.compile(r"([\w-]*\\)*([\w]*.jpg)$")
replace if not res or (fname.split("/")[-1] not in file_names):
with if not res or (fname.split("\\")[-1] not in file_names):
Note that the following line in def parse_mat_file(file_name): might also need to be replaced later on; it simply hasn't thrown an error yet:
element.split("/")[-1] for element in parsed_mat_arr["file_list"]
by
element.split("\\")[-1] for element in parsed_mat_arr["file_list"]
What did not need to be replaced is the following line in def parse_mat_file(file_name):
element.split("/")[-2].lower() # Extract path/label/img.jpg
Replacing that throws an error.
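The two regex patterns behave differently on Windows-style paths; a quick standalone check with re (using one of the paths reported later in this thread) shows why the original pattern yields no examples:

```python
import re

# Original pattern from stanford_dogs.py (expects forward slashes)
posix_re = re.compile(r"([\w-]*/)*([\w]*.jpg)$")
# Windows variant from the fix above (expects backslashes)
win_re = re.compile(r"([\w-]*\\)*([\w]*.jpg)$")

# A Windows-style archive member name, as seen on the reporter's machine
fname = r"Images\n02108915-French_bulldog\n02108915_9899.jpg"

print(posix_re.match(fname))  # None -> every file is skipped, no examples yielded
print(win_re.match(fname))    # a match object -> the file is accepted
```

Because posix_re.match returns None for every file, _generate_examples yields nothing and the writer raises "No examples were yielded."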
@Flowhill Thank you for showing results. Actually it's not good to use glob.iglob; we have to use tf.io.gfile to provide GCS support.
Did you try to reinstall TF? They fixed this issue, but the pip package is not updated yet, so you can simply use pip install tensorflow==2.2.0rc2 and not change tf.io.gfile. But it's fine if you are able to work with glob.iglob(), as in a future updated version we will not have this error in TFDS.
For stanford_dogs the error is in _generate_examples, so there is nothing wrong with tf.io.gfile, but I think the problem is with _NAME_RE.match; that's why it gives AssertionError: No examples were yielded. Maybe it does not match the given pattern correctly.
Updating tensorflow using pip install tensorflow==2.2.0rc2 solves the tf.io.gfile problem of the PlantVillage dataset.
The original stanford_dogs problem is still there. I've tried to check whether something goes wrong with _NAME_RE.match(fname) by adding a print statement printing both _NAME_RE and fname, but it seems to output these properly. I'm going to debug a bit more to find out when it throws the error.
EDIT: It does seem to be a problem with match. Adding a little counter for when the names were and weren't matched yielded the following result before throwing the error:
count true = 0
count false = 20580
_NAME_RE outputs: re.compile('([\\w-]*/)*([\\w]*.jpg)$')
fname outputs seemingly correct paths such as: Images\n02108915-French_bulldog\n02108915_9899.jpg, Images\n02108915-French_bulldog\n02108915_971.jpg, or Images\n02113978-Mexican_hairless\n02113978_124.jpg
Yes, you are right, and the reason is that Windows uses backslashes for paths while other systems use forward slashes. So replacing
_NAME_RE = re.compile(r"([\w-]*/)*([\w]*.jpg)$")
with
_NAME_RE = re.compile(r"([\w-]*\\)*([\w]*.jpg)$")
and
if not res or (fname.split("/")[-1] not in file_names):
with
if not res or (fname.split("\\")[-1] not in file_names):
runs fine for Windows, but it's not a good solution.
I got it working!
The exact lines I replaced are as follows:
replace _NAME_RE = re.compile(r"([\w-]*/)*([\w]*.jpg)$")
with _NAME_RE = re.compile(r"([\w-]*\\)*([\w]*.jpg)$")
replace if not res or (fname.split("/")[-1] not in file_names):
with if not res or (fname.split("\\")[-1] not in file_names):
Note that the following line in def parse_mat_file(file_name): might also need to be replaced later on; it simply hasn't thrown an error yet:
element.split("/")[-1] for element in parsed_mat_arr["file_list"]
by
element.split("\\")[-1] for element in parsed_mat_arr["file_list"]
What did not need to be replaced is the following line in def parse_mat_file(file_name):
element.split("/")[-2].lower() # Extract path/label/img.jpg
Replacing that throws an error.
My final setup is as follows:
Environment information
tensorflow-datasets/tfds-nightly version: tensorflow-datasets==2.1.0
tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: via 'pip freeze | findstr tensorflow':
tensorflow==2.2.0rc2
tensorflow-datasets==2.1.0
tensorflow-estimator==2.2.0rc0
tensorflow-metadata==0.21.1
The dataset not working with Windows is really something that should be fixed.
@Flowhill Please use these changes in above PR, if it works for you, it works for me
Edit: I tried it in Windows, Linux and colab
Done!
Getting AssertionError: No examples were yielded.
for a custom tfds dataset.
My output for tfds build my_dataset.py:
tfds.core.DatasetInfo(
name='my_dataset',
full_name='my_dataset/1.0.0',
description="""
""",
homepage='https://www.tensorflow.org/datasets/catalog/my_dataset',
data_path='/root/tensorflow_datasets/my_dataset/1.0.0',
download_size=Unknown size,
dataset_size=2.71 MiB,
features=FeaturesDict({
'image': Image(shape=(None, None, 3), dtype=tf.uint8),
'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
}),
supervised_keys=('image', 'label'),
splits={
'testA': <SplitInfo num_examples=24, num_shards=1>,
'testB': <SplitInfo num_examples=24, num_shards=1>,
'trainA': <SplitInfo num_examples=24, num_shards=1>,
'trainB': <SplitInfo num_examples=24, num_shards=1>,
},
citation="""""",
)
This is what I had in colab
!pip install -q tfds-nightly
import tensorflow_datasets as tfds
import my_dataset
ds = tfds.load('my_dataset')
And then this output
Downloading and preparing dataset my_dataset/1.0.0 (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/my_dataset/1.0.0...
Generating splits...: 0%
0/4 [00:00<?, ? splits/s]
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-16-0f858bbd233c> in <module>()
----> 1 ds = tfds.load('my_dataset')
8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/tfrecords_writer.py in _get_shard_boundaries(num_examples, number_of_shards)
116 ) -> List[int]:
117 if num_examples == 0:
--> 118 raise AssertionError("No examples were yielded.")
119 if num_examples < number_of_shards:
120 raise AssertionError("num_examples ({}) < number_of_shards ({})".format(
AssertionError: No examples were yielded.
Please help on how to solve this.
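For a custom dataset, this error almost always means _generate_examples produced zero (key, example) pairs, e.g. because the path or glob pattern matched no files. A minimal TFDS-free sketch of the pattern (generate_examples and the directory layout here are hypothetical, for illustration only):

```python
import pathlib
import tempfile

def generate_examples(images_dir):
    # Mimics a TFDS _generate_examples: yield unique (key, example) pairs.
    # If the glob matches nothing, nothing is yielded and TFDS raises
    # AssertionError("No examples were yielded.") at the writing stage.
    for path in sorted(pathlib.Path(images_dir).glob("*.jpg")):
        yield path.name, {"image": str(path), "label": 0}

# Sanity check against a throwaway directory
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "img_0.jpg").touch()

examples = list(generate_examples(tmp))
print(len(examples))  # 1 -> at least one example was yielded

empty = list(generate_examples(tmp / "missing"))
print(len(empty))     # 0 -> this is the situation that triggers the error
```

A quick way to debug a real builder is to call list(builder._generate_examples(...)) with the resolved paths and check that it is non-empty before running tfds build.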
Short description
When performing the following snippet of code:
tfds.load("stanford_dogs")
the error AssertionError("No examples were yielded.") is thrown.
Environment information
tensorflow-datasets/tfds-nightly version: tensorflow-datasets==2.1.0
tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: via 'pip freeze | findstr tensorflow':
tensorflow==2.1.0
tensorflow-datasets==2.1.0
tensorflow-estimator==2.1.0
tensorflow-metadata==0.21.1
Reproduction instructions
Link to logs
Downloading and preparing dataset stanford_dogs/0.2.0 (download: 778.12 MiB, generated: Unknown size, total: 778.12 MiB) to C:\Users\Flowhill\tensorflow_datasets\stanford_dogs\0.2.0...
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Extraction completed...: 0 file [00:00, ? file/s]
Shuffling and writing examples to C:\Users\Flowhill\tensorflow_datasets\stanford_dogs\0.2.0.incomplete37A45O\stanford_dogs-train.tfrecord
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\registered.py", line 305, in load
dbuilder.download_and_prepare(**download_and_prepare_kwargs)
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 340, in download_and_prepare
download_config=download_config)
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1078, in _download_and_prepare
max_examples_per_split=download_config.max_examples_per_split,
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 931, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1106, in _prepare_split
shard_lengths, total_size = writer.finalize()
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\tfrecords_writer.py", line 211, in finalize
self._shuffler.bucket_lengths, self._path)
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\tfrecords_writer.py", line 88, in _get_shard_specs
shard_boundaries = _get_shard_boundaries(num_examples, num_shards)
File "C:\Users\Flowhill\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_datasets\core\tfrecords_writer.py", line 107, in _get_shard_boundaries
raise AssertionError("No examples were yielded.")
AssertionError: No examples were yielded.
Expected behavior
Stanford dogs was already downloaded, so no download bars there, as expected. Running this with tfds.load("mnist") produces no errors.
Additional context
A similar problem happened to a user using the PlantVillage and The300wLpTest datasets here.