keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Apache License 2.0

sparse_categorical_crossentropy with ignore_class=-1 makes the loss `nan` #734

Open innat opened 9 months ago

innat commented 9 months ago

This behaviour occurs in Keras 2; the same code works in Keras 3.


I tried to train a multi-output model whose targets look like the following:

y1_dummy = [1,  2,   0, -1,  0,  -1,  -1, -1,  3,  -1]
y2_dummy = [-1, -1, -1,  2,  -1,  0,   3,  1, -1,   2]

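Across these two target arrays, each sample has a valid label in one array and -1 in the other: for example, y1_dummy[0] holds a class index while y2_dummy[0] is -1, and the pattern continues for the remaining samples. During training I set ignore_class=-1, as shown below.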

def custom_loss(y_true, y_pred):
    loss = sparse_categorical_crossentropy(
        y_true, y_pred, ignore_class=-1
    )
    return loss
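
For readers unfamiliar with ignore_class, the intent is that entries labelled -1 contribute nothing to the loss. The following is a minimal NumPy sketch of that intended arithmetic, not the tf-keras implementation. The function name masked_sparse_ce, the eps clipping value, and the choice to return 0.0 when every label is ignored are assumptions made for illustration.

```python
import numpy as np

def masked_sparse_ce(y_true, y_pred, ignore_class=-1, eps=1e-7):
    """Mean cross-entropy over entries whose label is not ignore_class.

    Hypothetical helper for illustration; not part of Keras.
    """
    y_true = np.asarray(y_true)
    # Clip probabilities so log() never sees an exact zero.
    y_pred = np.clip(np.asarray(y_pred, dtype="float64"), eps, 1.0)
    valid = y_true != ignore_class
    # Swap ignored labels for a safe index so the lookup stays in range.
    safe_labels = np.where(valid, y_true, 0)
    per_sample = -np.log(y_pred[np.arange(len(y_true)), safe_labels])
    n_valid = valid.sum()
    if n_valid == 0:
        # Assumption: an all-ignored batch contributes zero loss,
        # rather than dividing 0 by 0.
        return 0.0
    return float((per_sample * valid).sum() / n_valid)

# Uniform predictions over 4 classes for a batch of 3 samples.
uniform = np.full((3, 4), 0.25)
loss_mixed = masked_sparse_ce([1, -1, 0], uniform)
loss_all_ignored = masked_sparse_ce([-1, -1, -1], uniform)
```

The explicit n_valid == 0 branch is the design choice worth noting: averaging over the valid entries alone would otherwise divide zero by zero when a batch contains only ignored labels.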

The code works in Keras 3, but in Keras 2 the loss becomes nan. The full code is below.

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt

num_samples = 10
num_classes = 4
input_shape = (224, 224, 3) 
x_dummy = np.random.rand(num_samples, *input_shape).astype('float32')
y1_dummy = [1,  2,   0, -1,  0,  -1,  -1, -1,  3,  -1]
y2_dummy = [-1, -1, -1,  2,  -1,  0,   3,  1, -1,   2]
_sample = tf.data.Dataset.from_tensor_slices(x_dummy)
_labels = tf.data.Dataset.from_tensor_slices(
    (
        y1_dummy, 
        y2_dummy
    )
)
_data = tf.data.Dataset.zip((_sample, _labels))
_data = _data.batch(batch_size=3, drop_remainder=True)

def custom_loss(y_true, y_pred):
    loss = sparse_categorical_crossentropy(
        y_true, y_pred, ignore_class=-1
    )
    return loss

input_layer = keras.Input(shape=input_shape)
flatten_layer = layers.Flatten()(input_layer)
output_layer1 = layers.Dense(
    num_classes, activation='softmax', name='out1'
)(flatten_layer)
output_layer2 = layers.Dense(
    num_classes, activation='softmax', name='out2'
)(flatten_layer)
A = keras.Model(
    inputs=input_layer, 
    outputs=[output_layer1, output_layer2]
)
A.compile(
    optimizer=Adam(), 
    loss={
        'out1': custom_loss, 
        'out2': custom_loss
    }
)
A.fit(
    _data, 
    epochs=2,
)
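
One detail of the reproduction that may be worth checking is how the labels fall into batches. With batch_size=3, drop_remainder=True, and no shuffling, the batches are fixed, so the number of non-ignored labels per batch can be counted directly. The sketch below does this in plain Python; the helper names full_batches and count_valid are hypothetical. Whether an all-ignored batch is what produces the nan in Keras 2 is not established in this thread.

```python
y1_dummy = [1, 2, 0, -1, 0, -1, -1, -1, 3, -1]
y2_dummy = [-1, -1, -1, 2, -1, 0, 3, 1, -1, 2]
batch_size = 3
ignore_class = -1

def full_batches(labels, size):
    """Split labels into consecutive batches, dropping the remainder."""
    n_full = len(labels) // size
    return [labels[i * size:(i + 1) * size] for i in range(n_full)]

def count_valid(batch):
    """Number of labels in the batch that are not ignore_class."""
    return sum(label != ignore_class for label in batch)

valid_y1 = [count_valid(b) for b in full_batches(y1_dummy, batch_size)]
valid_y2 = [count_valid(b) for b in full_batches(y2_dummy, batch_size)]

print(valid_y1)  # [3, 1, 1]
print(valid_y2)  # [0, 2, 2]
```

The first batch for out2 is [-1, -1, -1], so it contains no valid label at all, while every batch for out1 has at least one.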

tilakrayal commented 9 months ago

@innat, thank you for reporting the issue. Since it is present only in Keras 2, could you please submit a PR with the fix to tf-keras? Thank you!

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

innat commented 9 months ago

Hi, I'm not able to contribute a fix. I came across this problem through this Stack Overflow question: https://stackoverflow.com/questions/77930212/multitask-learning-to-classify-on-dog-images/77935240#77935240