keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Model fails to train with Linux and Keras 3.3.2 #19623

Closed: jonbry closed this issue 6 months ago

jonbry commented 6 months ago

The following code from Deep Learning with Python, Second Edition fails to train when using Keras 3.3.2 and TensorFlow 2.16.1 on a Linux machine (Ubuntu 20.04):

```python
import keras
from keras import layers

import pathlib
from keras.utils import image_dataset_from_directory

new_base_dir = pathlib.Path("cats_vs_dogs_small")

# Labels are inferred from the class subdirectories inside each split.
train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)

# Random augmentation layers; they only transform inputs during training.
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)

# Scale pixel values to [0, 1], then apply a plain convolution before the residual blocks.
x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False)(x)

# Residual blocks: (BatchNorm -> ReLU -> SeparableConv2D) twice, then downsampling.
for size in [32, 64, 128, 256, 512]:
    residual = x

    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False)(x)

    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

    # A strided 1x1 conv projects the residual to the same shape as x.
    residual = layers.Conv2D(
        size, 1, strides=2, padding="same", use_bias=False)(residual)
    x = layers.add([x, residual])

# Classifier head: a single sigmoid unit for the binary (cat vs. dog) output.
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

history = model.fit(
    train_dataset,
    epochs=100,
    validation_data=validation_dataset)
```

The accuracy hovers around 50% throughout all 100 epochs:

[Plot: mini_xception_keras3_linux]

The same result reproduces on other Linux machines, on both GPU and CPU, and also with the JAX backend.
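For context on why ~50% signals a failure: on a balanced two-class dataset, always predicting the same class already scores 50%, so a model stuck there has learned nothing from the images. A minimal sketch of that majority-class baseline, assuming balanced labels; `majority_baseline` and the label counts are illustrative, not taken from the report:

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical balanced label list (0 = cat, 1 = dog); counts are illustrative.
labels = [0] * 1000 + [1] * 1000
print(majority_baseline(labels))  # 0.5
```

A training run whose accuracy never rises meaningfully above this number is effectively guessing.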

What is strange is that the same code trains successfully with the following configurations:

Any advice on what may be causing the issue? Let me know if there is any information that I can provide to help troubleshoot the issue.

Thank you!

fchollet commented 6 months ago

> Any advice on what may be causing the issue? Let me know if there is any information that I can provide to help troubleshoot the issue.

This code is known to work, so it's likely a bad initialization. Some common steps you can take:

t-kalinowski commented 6 months ago

@fchollet I am able to reproduce this. I haven't had a chance to dig into the root cause yet, but I can confirm that this is a bug in Keras 3; the same code produces a model that trains just fine w/ TF 2.15 + Keras 2.

fchollet commented 6 months ago

Looking into it.

fchollet commented 6 months ago

I have fixed a related issue with dataset shuffling. Can you try installing v3.3.3 and checking if your code works with that version?
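The thread does not say exactly what the shuffling bug was. As a general illustration only (not Keras's implementation), a directory-based dataset has to shuffle file paths and labels with the same permutation. If the two were shuffled independently, labels would stop describing their images, which is one way a shuffling bug could pin accuracy at chance. A minimal pure-Python sketch; `shuffle_paired` and the file names are hypothetical:

```python
import random


def shuffle_paired(paths, labels, seed=None):
    """Shuffle file paths and labels together with one shared permutation."""
    if len(paths) != len(labels):
        raise ValueError("paths and labels must have the same length")
    # Zip first, so every path stays attached to its own label while shuffling.
    pairs = list(zip(paths, labels))
    random.Random(seed).shuffle(pairs)
    shuffled_paths = [path for path, _ in pairs]
    shuffled_labels = [label for _, label in pairs]
    return shuffled_paths, shuffled_labels


# Hypothetical file list: the parent directory encodes the class (0 = cat, 1 = dog).
paths = [f"cat/{i}.jpg" for i in range(5)] + [f"dog/{i}.jpg" for i in range(5)]
labels = [0] * 5 + [1] * 5

shuffled_paths, shuffled_labels = shuffle_paired(paths, labels, seed=1337)
# Every label still matches the directory its image came from.
print(all(
    path.startswith("dog/") == (label == 1)
    for path, label in zip(shuffled_paths, shuffled_labels)
))  # True
```

Passing a fixed `seed` makes the order reproducible across runs, which helps when comparing behavior between library versions.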

t-kalinowski commented 6 months ago

Thanks! Looks like it's fixed now. I can confirm the model trains fine with Keras v3.3.3.

jonbry commented 6 months ago

Looks like v3.3.3 fixed the issue. Thanks for all of your help!
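Since the fix landed in v3.3.3, it can help to confirm the installed version before re-running a long training job. A small helper sketch; `version_tuple` and `includes_shuffle_fix` are hypothetical names and not part of Keras:

```python
import re


def version_tuple(version):
    """Turn a version string such as '3.3.3' into a comparable (major, minor, patch) tuple."""
    parts = []
    # [:3] drops anything after the patch number, e.g. a '.dev0' suffix.
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each piece, so a suffix like 'rc1' is ignored.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '3' or '3.3' with zeros.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def includes_shuffle_fix(version):
    """True if the version is 3.3.3 or later, the release this thread reports as fixed."""
    return version_tuple(version) >= (3, 3, 3)


print(includes_shuffle_fix("3.3.2"))  # False
print(includes_shuffle_fix("3.3.3"))  # True
```

In a real environment you would pass `keras.__version__`; if the check returns False, `pip install --upgrade keras` pulls in a newer release.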

t-kalinowski commented 6 months ago

By the way, I just noticed that the GitHub release tagged v3.3.3 has a typo in its title ("Kears 3.3.3" instead of "Keras 3.3.3").

Maybe this is the reason v3.3.2 is still listed as the "latest release" on the repo landing page?

sachinprasadhs commented 6 months ago

@t-kalinowski, I just updated the "latest release" tag on the landing page.