keras-team / keras-preprocessing

Utilities for working with image data, text data, and sequence data.

TypeError: endswith first arg must be bytes or a tuple of bytes, not str #307

Closed alexdauenhauer closed 3 years ago

alexdauenhauer commented 3 years ago

As far as I can tell, I am following the guidelines from the TensorFlow documentation for creating a tf.data.Dataset from a generator.

import os
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_dir = os.path.join(base_dir, 'train')
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

test_datagen = ImageDataGenerator(rescale=1 / 255)

train_dataset = tf.data.Dataset.from_generator(
    train_datagen.flow_from_directory, args=[train_dir],
    output_types=(tf.float32, tf.float32),
    output_shapes=([None, 256, 256, 3], [None, 1]))

validation_dataset = tf.data.Dataset.from_generator(
    test_datagen.flow_from_directory, args=[validation_dir],
    output_types=(tf.float32, tf.float32),
    output_shapes=([None, 256, 256, 3], [None, 1]))

# defining the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(
        16, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    # Only 1 output neuron. It will contain a value from 0-1, where 0
    # corresponds to one class ('cats') and 1 to the other ('dogs')
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.summary()

model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(
    train_dataset,
    steps_per_epoch=62,  # 2000 images = batch_size * steps
    epochs=100,
    validation_data=validation_dataset,
    validation_steps=31,  # 1000 images = batch_size * steps
    use_multiprocessing=True,
    workers=os.cpu_count()
)

When I call model.fit, I get this error:

InvalidArgumentError:  TypeError: endswith first arg must be bytes or a tuple of bytes, not str
Traceback (most recent call last):

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 669, in get_iterator
    return self._iterators[iterator_id]

KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 243, in __call__
    ret = func(*args)

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 309, in wrapper
    return func(*args, **kwargs)

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 785, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 671, in get_iterator
    iterator = iter(self._generator(*self._args.pop(iterator_id)))

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/site-packages/keras_preprocessing/image/image_data_generator.py", line 540, in flow_from_directory
    interpolation=interpolation

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/site-packages/keras_preprocessing/image/directory_iterator.py", line 126, in __init__
    classes, filenames = res.get()

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/site-packages/keras_preprocessing/image/utils.py", line 216, in _list_valid_filenames_in_directory
    for root, fname in valid_files:

  File "/usr/lfs/v0/anaconda3/envs/learning/lib/python3.7/site-packages/keras_preprocessing/image/utils.py", line 172, in _iter_valid_files
    if fname.lower().endswith('.tiff'):

TypeError: endswith first arg must be bytes or a tuple of bytes, not str

     [[{{node PyFunc}}]]
     [[IteratorGetNext]] [Op:__inference_train_function_1695]

Function call stack:
train_function

Looking online, many people say that upgrading to TF 2.1 fixed this, but I am already running 2.2.

Current Environment

keras-preprocessing       1.1.0
python                    3.7.9
tensorflow                2.2.0
tensorflow-base           2.2.0
tensorflow-estimator      2.2.0
tensorflow-gpu            2.2.0

It seems that the filename is being interpreted as bytes instead of a string, so endswith expects a bytes argument but receives the str '.tiff'?
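
For reference, here is a minimal sketch of what I think is happening (assuming strings passed via args are converted to tf.string tensors and handed back to the generator as their raw values): any string passed through args arrives inside the generator as bytes, which would explain why fname.lower().endswith('.tiff') fails.

import tensorflow as tf

def probe(path):
    # Hypothetical probe: prints the type the generator actually receives.
    # Expected output is a bytes-like value, e.g. b'/data/train'
    print(type(path), path)
    yield [0.0]

ds = tf.data.Dataset.from_generator(
    probe, args=['/data/train'],
    output_types=tf.float32, output_shapes=[1])

next(iter(ds))  # runs the generator once and prints the arg type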

alexdauenhauer commented 3 years ago
train_dataset = tf.data.Dataset.from_generator(
    lambda: train_datagen.flow_from_directory(train_dir),
    output_types=(tf.float32, tf.float32),
    output_shapes=([None, 256, 256, 3], [None, 2]))

Switching to a lambda function solved it, but it seems like I shouldn't have to do that.
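
Another workaround that seems to work (a sketch, assuming the only problem is the bytes-encoded directory argument; the make_flow wrapper name is just for illustration): keep args but decode the path back to str before calling flow_from_directory.

def make_flow(directory):
    # Hypothetical wrapper: args arrive as bytes, so decode back to str
    # before handing the path to flow_from_directory.
    if isinstance(directory, bytes):
        directory = directory.decode('utf-8')
    return train_datagen.flow_from_directory(directory)

train_dataset = tf.data.Dataset.from_generator(
    make_flow, args=[train_dir],
    output_types=(tf.float32, tf.float32),
    output_shapes=([None, 256, 256, 3], [None, 2]))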