keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

RandomTranslation applied after RandomRotation causes a TypeError #18642

Closed NuraliMedeu closed 1 year ago

NuraliMedeu commented 1 year ago

Code to reproduce the problem:

images_train, images_val = keras.utils.image_dataset_from_directory(
    directory=train_dir,
    batch_size=BATCH_SIZE,
    image_size=(IMAGE_DIM, IMAGE_DIM),
    shuffle=True,
    seed=1,
    validation_split=0.2,
    subset="both"
)

# Rotation factor = 40 deg / 360 deg = x rad / 2pi rad ≈ 0.11
rotate = keras.layers.RandomRotation(factor=0.11, fill_mode="nearest")
shift = keras.layers.RandomTranslation(
    width_factor=0.2,
    height_factor=0.2,
    fill_mode="nearest"
)
shear = keras_cv.layers.RandomShear(
    # Shear factor = cotangent(90 deg - 0.2 deg) = cot(89.8 deg) ≈ 0.0035
    x_factor=0.0035,
    fill_mode="nearest"
)
zoom = keras.layers.RandomZoom(
    height_factor=0.2,
    fill_mode="nearest"
)
flip = keras.layers.RandomFlip(mode="horizontal")
brighten = keras.layers.RandomBrightness(factor=0.2)
normalize = keras.layers.Rescaling(1.0/255)

# TODO This ugly stacking of the layer functions inside the lambda expression is
# necessary because the Keras CV shearing layer cannot be directly added to a Keras
# Sequential model. The order of the transformations is the same as in the
# source code of the deprecated ImageDataGenerator.
images_train = images_train.map(
    lambda image, label: (normalize(brighten(flip(zoom(shear(
        shift(
            rotate(
                image
            )
        )
    ))))), label),
    num_parallel_calls=AUTOTUNE
)
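The factor values quoted in the comments of the snippet above can be checked with a few lines of standard-library Python. This is only a sanity check of the arithmetic; the variable names are illustrative and do not come from Keras or KerasCV.

```python
import math

# Rotation: RandomRotation takes a fraction of a full turn (2*pi radians),
# so a 40 degree limit becomes 40 / 360.
rotation_degrees = 40.0
rotation_factor = rotation_degrees / 360.0

# Shear: the comment converts a 0.2 degree shear angle via the cotangent
# of its complement, cot(90 deg - 0.2 deg), which equals tan(0.2 deg).
shear_degrees = 0.2
shear_factor = 1.0 / math.tan(math.radians(90.0 - shear_degrees))

print(round(rotation_factor, 2))  # 0.11
print(round(shear_factor, 4))     # 0.0035
```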

The Python interpreter complains about the line height_translate = height_translate * height in random_translation.py.

It outputs:

TypeError: Exception encountered when calling RandomTranslation.call().

    Input 'y' of 'Mul' Op has type int32 that does not match type float32 of argument 'x'.

    Arguments received by RandomTranslation.call():
      • inputs=tf.Tensor(shape=(None, None, None, 3), dtype=float32)
      • training=True

After analysing random_translation.py, I concluded that x here is height_translate, the factor by which the input image is shifted vertically, so it has to be a float. That makes y the height, the vertical dimension of the input image, which cannot be anything other than an int.

If so, then why does the Mul Op, represented here by *, expect the operands to be of the same type?

I tried casting the image pixel values to float32 and int32, but that didn't change anything. I also tried looking into random_rotation.py, but didn't find anything unusual there.

Interestingly, if I reverse the order of rotate and shift, this issue disappears, although the operand types essentially don't change.

What is the true cause of this problem?

fchollet commented 1 year ago

Thanks for the report. Do you have a standalone, runnable Colab or code snippet to reproduce the problem?

NuraliMedeu commented 1 year ago

Sorry for the delay. Yes, here is the standalone code snippet:

!pip install keras-cv tensorflow --upgrade

import keras_core as keras
import keras_cv
import logging
import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf

logger = tf.get_logger()
logger.setLevel(logging.ERROR)

BATCH_SIZE = 100
IMAGE_DIM = 150
AUTOTUNE = tf.data.AUTOTUNE

zip_dir = keras.utils.get_file(
    "cats_and_dogs_filtered.zip",
    origin="https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip",
    extract=True
)
base_dir = os.path.join(os.path.dirname(zip_dir), "cats_and_dogs_filtered")
train_dir = os.path.join(base_dir, "train")
validation_dir = os.path.join(base_dir, "validation")

# Plot images in a grid of 1 row and 5 columns
def plot_images(images_arr):
    fig, axes = plt.subplots(1, 5, figsize=(20, 20))
    axes = axes.flatten()
    for img, ax in zip(images_arr, axes):
        ax.imshow(img)
    plt.tight_layout()
    plt.show()

images_train = keras.utils.image_dataset_from_directory(
    directory=train_dir,
    batch_size=BATCH_SIZE,
    image_size=(IMAGE_DIM, IMAGE_DIM),
    shuffle=True
)

images_val = keras.utils.image_dataset_from_directory(
    directory=validation_dir,
    batch_size=BATCH_SIZE,
    image_size=(IMAGE_DIM, IMAGE_DIM),
    shuffle=False
)

# Rotation factor = 40 deg / 360 deg = x rad / 2pi rad ≈ 0.11
rotate = keras.layers.RandomRotation(factor=0.11, fill_mode="nearest")
shift = keras.layers.RandomTranslation(
    width_factor=0.2,
    height_factor=0.2,
    fill_mode="nearest"
)
shear = keras_cv.layers.RandomShear(
    # Shear factor = cotangent(90 deg - 0.2 deg) = cot(89.8 deg) ≈ 0.0035
    x_factor=0.0035,
    fill_mode="nearest"
)
zoom = keras.layers.RandomZoom(
    height_factor=0.2,
    fill_mode="nearest"
)
flip = keras.layers.RandomFlip(mode="horizontal")
brighten = keras.layers.RandomBrightness(factor=0.2)
normalize = keras.layers.Rescaling(1.0/255)

# This ugly stacking of the layer functions inside the lambda expression is
# necessary because the Keras CV shearing layer cannot be directly added to
# a Keras Sequential model. The order of the transformations is the same as
# in the source code of the deprecated ImageDataGenerator.
images_train = images_train.map(
    lambda image, label: (normalize(brighten(flip(zoom(shear(
        # TODO Fix the TypeError here
        shift(
            rotate(
                image
            )
        )
    ))))), label),
    num_parallel_calls=AUTOTUNE
)
images_train = images_train.cache().prefetch(buffer_size=AUTOTUNE)
images_val = images_val.cache().prefetch(buffer_size=AUTOTUNE)

images_augmented_sample, _ = next(iter(images_train))
plot_images(images_augmented_sample)
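
The nested calls inside the lambda can be flattened with a small helper that applies callables left to right. The sketch below is one possible way to do that; `chain` is a hypothetical helper, not part of Keras, and the three functions are stand-ins for the layer objects so that the example runs without TensorFlow.

```python
from functools import reduce


def chain(*steps):
    """Return a function that applies `steps` to its input, left to right."""
    def apply(x):
        return reduce(lambda acc, step: step(acc), steps, x)
    return apply


# Stand-ins for the augmentation layers: each one records its own name
# so that the order of application is visible in the result.
def rotate(x):
    return x + ["rotate"]


def shift(x):
    return x + ["shift"]


def shear(x):
    return x + ["shear"]


augment = chain(rotate, shift, shear)
print(augment([]))  # ['rotate', 'shift', 'shear']
```

Assuming the real layers are used as plain callables, as they are in the snippet, the map call could then read `lambda image, label: (augment(image), label)`. This only changes readability; it does not affect the TypeError.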

NuraliMedeu commented 1 year ago

I have found the problem with my code. Apparently I need to use tf.keras instead of keras directly. As a result, I also don't need to import keras_core as keras. Nonetheless, I am not closing this issue for now because I would still like to know why this solution works.

sachinprasadhs commented 1 year ago

Here is the working Gist with legacy tf.keras: https://gist.github.com/sachinprasadhs/f6a33d6455dc8cf3a0a85831a655e4ad.

With Keras 3 (Keras Core), it fails with the same error you reported.

fchollet commented 1 year ago

This is now fixed at HEAD. The fix was to use the Keras op multiply in place of * (which otherwise defaults to tensor.__mul__ from TF). The Keras op can handle mismatched dtypes, unlike the TF one.
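
The difference between a strict `*` operator and a dtype-promoting multiply op can be illustrated with a toy model in plain Python. This is a sketch only: `ToyTensor`, `result_dtype`, `cast`, and `multiply` are hypothetical names invented for this example, the promotion rule is deliberately simplified, and none of it is the actual TensorFlow or Keras implementation.

```python
from dataclasses import dataclass


# Toy stand-in for a tensor: one scalar value plus a dtype label.
# Hypothetical class for illustration only; not TensorFlow or Keras code.
@dataclass(frozen=True)
class ToyTensor:
    value: float
    dtype: str  # "int32" or "float32"

    def __mul__(self, other):
        # Strict operator: refuses mixed dtypes, mirroring the reported error.
        if self.dtype != other.dtype:
            raise TypeError(
                f"Input 'y' has type {other.dtype} that does not match "
                f"type {self.dtype} of argument 'x'."
            )
        return ToyTensor(self.value * other.value, self.dtype)


def result_dtype(a, b):
    # Simplified promotion rule (an assumption of this sketch):
    # any float32 operand makes the result float32.
    return "float32" if "float32" in (a.dtype, b.dtype) else "int32"


def cast(t, dtype):
    # Convert the stored value to match the requested dtype label.
    value = float(t.value) if dtype == "float32" else int(t.value)
    return ToyTensor(value, dtype)


def multiply(a, b):
    # Promoting op: cast both operands to a common dtype, then multiply.
    dtype = result_dtype(a, b)
    return cast(a, dtype) * cast(b, dtype)


height_translate = ToyTensor(0.2, "float32")  # random shift factor
height = ToyTensor(150, "int32")              # image height in pixels

try:
    height_translate * height
except TypeError as err:
    print("strict * failed:", err)

# The promoting op succeeds and yields a float32 result.
print(multiply(height_translate, height))
```

In real TensorFlow code, a comparable manual workaround would be to cast one operand explicitly with `tf.cast` before multiplying, so that both sides share a dtype.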