keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Apache License 2.0

image_dataset_from_directory uses wrong directory when labels is list #69

Open guberti opened 2 years ago

guberti commented 2 years ago

Describe the problem.

The docs for image_dataset_from_directory say the following about the directory argument:

Directory where the data is located. If labels is "inferred", it should contain subdirectories, 
each containing images for a class. Otherwise, the directory structure is ignored.

This means that when labels is a list/tuple, we should ignore the directory structure (this makes sense, as the directory structure would only be used to generate labels).

Describe the current behavior.

However, this is not what happens. See the following snippet from dataset_utils.py:

  if labels is None:
    # in the no-label case, index from the parent directory down.
    subdirs = ['']
    class_names = subdirs
  else:
    subdirs = []
    for subdir in sorted(tf.io.gfile.listdir(directory)):

The subdirectory structure is only ignored when labels is None, rather than whenever labels != 'inferred'. As a result, when labels is a list/tuple, the code still looks for class subdirectories even though none exist, and image_dataset_from_directory fails.
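
To make the failure mode concrete, here is a minimal standard-library sketch of that branch. `subdirs_current`, `count_files`, and `make_flat_dir` are hypothetical stand-ins, not the real Keras helpers. The `isdir` filter is an assumption about the part of the loop that the excerpt cuts off, and the sketch lists only one level instead of walking each subdirectory.

```python
import os
import tempfile


def subdirs_current(directory, labels):
    """Hypothetical stand-in mirroring the branch quoted above."""
    if labels is None:
        # No-label case: index from the parent directory down.
        return [""]
    # Every other case, including an explicit list of labels,
    # looks for class subdirectories (assumed isdir filter).
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )


def count_files(directory, subdirs):
    """Count regular files found directly inside each listed subdirectory."""
    total = 0
    for subdir in subdirs:
        folder = os.path.join(directory, subdir)
        total += sum(
            os.path.isfile(os.path.join(folder, name))
            for name in os.listdir(folder)
        )
    return total


def make_flat_dir(num_files):
    """Create a temporary directory holding files but no subdirectories."""
    directory = tempfile.mkdtemp()
    for i in range(num_files):
        with open(os.path.join(directory, f"img_{i}.png"), "wb") as handle:
            handle.write(b"placeholder")  # contents do not matter for listing
    return directory


flat = make_flat_dir(3)
# Unlabeled case: the parent directory itself is indexed.
print(count_files(flat, subdirs_current(flat, labels=None)))
# Explicit labels: no subdirectories exist, so nothing is indexed.
print(count_files(flat, subdirs_current(flat, labels=[0, 1, 0])))
```

With a flat directory of three files, the sketch finds all three when labels is None and none when labels is a list, which matches the mismatch described above.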

Describe the expected behavior.

We should ignore the subdirectory structure if labels is anything other than "inferred" (i.e. make the code match what the documentation says should happen). This should be a one-line change, and I'd be happy to make a PR.
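
A rough sketch of what that condition could look like, using a hypothetical stand-in (`subdirs_proposed`) and not the real function in dataset_utils.py. The `isinstance` check is an extra assumption, added so that array-like labels are never compared elementwise against a string.

```python
import os
import tempfile


def subdirs_proposed(directory, labels):
    """Hypothetical stand-in: only scan for class folders when labels are inferred."""
    infer_from_dirs = isinstance(labels, str) and labels == "inferred"
    if not infer_from_dirs:
        # None, list, or tuple: ignore the directory structure entirely.
        return [""]
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )


with tempfile.TemporaryDirectory() as directory:
    os.mkdir(os.path.join(directory, "cats"))
    os.mkdir(os.path.join(directory, "dogs"))
    print(subdirs_proposed(directory, "inferred"))  # class folders are scanned
    print(subdirs_proposed(directory, [0, 1]))      # structure is ignored
    print(subdirs_proposed(directory, None))        # structure is ignored
```

Only the "inferred" case returns the class folders; explicit labels and None both fall back to indexing from the parent directory.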

However, the existence of this issue suggests the use case where labels is a list/tuple is not unit tested, so it would probably be good to write a test. Would love a suggestion from someone more familiar with the codebase about how best to do this.
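
One possible shape for such a test, offered as a sketch and not as the project's own test style: build a flat directory of tiny PNGs with the standard library, pass an explicit labels list, and check that the labels come back in order. `write_png`, `make_flat_image_dir`, and the test class name are hypothetical. The test assumes `tf.keras.utils.image_dataset_from_directory` is importable, skips itself when TensorFlow is missing, and would be expected to fail on versions affected by this issue.

```python
import os
import shutil
import struct
import tempfile
import unittest
import zlib


def write_png(path, width=2, height=2):
    """Write a tiny all-black 8-bit RGB PNG using only the standard library."""

    def chunk(tag, data):
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    # IHDR: width, height, bit depth 8, color type 2 (RGB), no interlacing.
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline is a filter byte (0) followed by width * 3 zero bytes.
    raw = (b"\x00" + b"\x00" * (3 * width)) * height
    with open(path, "wb") as handle:
        handle.write(
            b"\x89PNG\r\n\x1a\n"
            + chunk(b"IHDR", header)
            + chunk(b"IDAT", zlib.compress(raw))
            + chunk(b"IEND", b"")
        )


def make_flat_image_dir(num_images):
    """Create a temporary directory of PNGs with no class subdirectories."""
    directory = tempfile.mkdtemp()
    for i in range(num_images):
        write_png(os.path.join(directory, f"img_{i}.png"))
    return directory


class ExplicitLabelsFlatDirectoryTest(unittest.TestCase):
    def test_labels_list_with_flat_directory(self):
        try:
            import tensorflow as tf
        except ImportError:
            self.skipTest("TensorFlow is not installed")

        labels = [0, 1, 1, 0]
        directory = make_flat_image_dir(len(labels))
        self.addCleanup(shutil.rmtree, directory, ignore_errors=True)

        dataset = tf.keras.utils.image_dataset_from_directory(
            directory,
            labels=labels,
            label_mode="int",
            image_size=(2, 2),
            batch_size=len(labels),
            shuffle=False,  # keep file order so labels can be compared directly
        )
        _, batch_labels = next(iter(dataset))
        self.assertEqual(batch_labels.numpy().tolist(), labels)
```

Saved as a module, this can be run with `python -m unittest`. Generating the PNG bytes by hand keeps the fixture free of image-library dependencies; the real test suite may already have a helper for this.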

campellcl commented 1 year ago

Any update on this a year later? It's not clear how to proceed with unbalanced binary classification via TensorFlow datasets per the official tutorial, if Keras fails to understand there are no sub-directories when the labels are explicitly provided.