keras-team / keras

Deep Learning for humans
http://keras.io/

The loss becomes negative from positive values during the training loop #19638

Open yijianSU22 opened 2 weeks ago

yijianSU22 commented 2 weeks ago

Hi, I just ran a U-Net model on a training set, using a combined Dice and cross-entropy loss as the loss function, but I found that the loss value is not normal: it gradually became negative, as shown below:

    2024-04-27 22:54:02.697477: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
    2617/2617 [==============================] - 9485s 4s/step - loss: 0.3995 - accuracy: 0.0302 - val_loss: 0.3482 - val_accuracy: 0.0182
    Epoch 2/20
    2617/2617 [==============================] - 9453s 4s/step - loss: 0.1805 - accuracy: 0.2205 - val_loss: 0.1516 - val_accuracy: 0.9400
    Epoch 3/20
    2617/2617 [==============================] - 9428s 4s/step - loss: 0.0435 - accuracy: 0.9362 - val_loss: 0.1033 - val_accuracy: 0.9482
    Epoch 4/20
    2617/2617 [==============================] - 9412s 4s/step - loss: -0.0293 - accuracy: 0.9398 - val_loss: 0.0141 - val_accuracy: 0.9459
    Epoch 5/20
    2617/2617 [==============================] - 9444s 4s/step - loss: -0.0844 - accuracy: 0.9420 - val_loss: -0.0150 - val_accuracy: 0.9548
    Epoch 6/20
    2617/2617 [==============================] - 9436s 4s/step - loss: -0.1212 - accuracy: 0.9440 - val_loss: -0.0363 - val_accuracy: 0.9599
    Epoch 7/20
    2617/2617 [==============================] - 9397s 4s/step - loss: -0.1537 - accuracy: 0.9457 - val_loss: -0.0193 - val_accuracy: 0.9538
    Epoch 8/20
    2617/2617 [==============================] - 9305s 4s/step - loss: -0.1777 - accuracy: 0.9467 - val_loss: -0.0149 - val_accuracy: 0.9526
    Epoch 9/20
    2617/2617 [==============================] - 8968s 3s/step - loss: -0.2004 - accuracy: 0.9473 - val_loss: -0.0841 - val_accuracy: 0.9576
    Epoch 10/20
    2617/2617 [==============================] - 8787s 3s/step - loss: -0.2210 - accuracy: 0.9480 - val_loss: -0.0822 - val_accuracy: 0.9571
    Epoch 11/20
    2617/2617 [==============================] - 8794s 3s/step - loss: -0.2337 - accuracy: 0.9486 - val_loss: -0.0837 - val_accuracy: 0.9566
    Epoch 12/20
    2617/2617 [==============================] - 8809s 3s/step - loss: -0.2521 - accuracy: 0.9492 - val_loss: -0.0856 - val_accuracy: 0.9615
    Epoch 13/20
    2617/2617 [==============================] - 8804s 3s/step - loss: -0.2688 - accuracy: 0.9500 - val_loss: -0.1012 - val_accuracy: 0.9594
    Epoch 14/20
    2617/2617 [==============================] - 8807s 3s/step - loss: -0.2867 - accuracy: 0.9508 - val_loss: -0.0994 - val_accuracy: 0.9599
    Epoch 15/20
    2617/2617 [==============================] - 8721s 3s/step - loss: -0.2949 - accuracy: 0.9511 - val_loss: -0.1008 - val_accuracy: 0.9605
    Epoch 16/20
    2617/2617 [==============================] - 8684s 3s/step - loss: -0.3071 - accuracy: 0.9515 - val_loss: -0.0705 - val_accuracy: 0.9564
    Epoch 17/20
    349/2617 [===>..........................] - ETA: 37:27 - loss: -0.0398 - accuracy: 0.9501

and this is my loss function:

    class categorical_dicePcrossentropy_weight(tf.keras.losses.Loss):
        def __init__(self, class_weight, lamda=0.5):
            super().__init__()
            self.lamda = lamda
            self.weight = class_weight

        def call(self, y_true, y_pred):
            smooth = 1.e-5
            smooth = tf.constant(smooth, tf.float32)

            y_true = tf.cast(y_true, tf.float32)
            y_pred = tf.cast(y_pred, tf.float32)

            intersection = tf.math.reduce_sum(y_pred * y_true, axis=(1, 2, 3))
            union = tf.math.reduce_sum(y_pred + y_true, axis=(1, 2, 3))
            dice_coef = tf.math.reduce_sum(2 * (intersection + smooth) / (union + smooth), axis=0)

            loss1 = tf.math.reduce_mean(self.weight * dice_coef)

            epsilon = 1.e-5
            output = y_pred / tf.math.reduce_sum(y_pred, axis=-1, keepdims=True)
            output = tf.clip_by_value(output, epsilon, 1 - epsilon)

            loss = y_true * tf.math.log(output)

            loss = tf.math.reduce_mean(loss, axis=(1, 2, 3))
            loss = tf.math.reduce_mean(loss, axis=0)
            loss2 = tf.math.reduce_mean(self.weight * loss)

            total_loss = (1 - self.lamda) * (1 - loss1) + self.lamda * loss2

            return total_loss
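
A custom subclass like this is passed to compile() in the usual way (illustrative sketch; the issue does not show the actual optimizer or weight values):

    model.compile(
        optimizer="adam",  # hypothetical; the optimizer is not shown in the issue
        loss=categorical_dicePcrossentropy_weight(class_weight),  # class_weight: per-class vector
        metrics=["accuracy"],
    )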

I don't know why. Is there a way to resolve this?

td-jakubl commented 2 weeks ago

  1. For values below 1, tf.math.log(output) is negative.
  2. tf.clip_by_value() does not handle NaN: if output contains NaN, then tf.clip_by_value(output, epsilon, 1 - epsilon) also contains NaN, if I'm not mistaken (quick check below).
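
Both points are easy to verify in isolation (a minimal sketch, assuming TensorFlow 2.x eager execution):

    import tensorflow as tf

    # Point 1: the log of any probability below 1 is negative.
    probs = tf.constant([1e-5, 0.5, 1.0])
    print(tf.math.log(probs).numpy())  # [-11.5129, -0.6931, 0.]

    # Point 2: NaN survives clipping (as noted above); the finite values are clipped.
    x = tf.constant([float("nan"), 2.0, -1.0])
    print(tf.clip_by_value(x, 1e-5, 1 - 1e-5).numpy())  # [nan, 0.99999, 1e-05]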
yijianSU22 commented 2 weeks ago

  1. For values below 1, tf.math.log(output) is negative.
  2. tf.clip_by_value() does not handle NaN: if output contains NaN, then tf.clip_by_value(output, epsilon, 1 - epsilon) also contains NaN, if I'm not mistaken.

Thanks very much, yes, you're right. The term here should be -y_true * tf.math.log(output).
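
In other words, only the sign of the cross-entropy term changes. A minimal sketch of the affected lines inside call():

    # Negate so that the cross-entropy term is non-negative for output in (0, 1).
    loss = -y_true * tf.math.log(output)
    loss = tf.math.reduce_mean(loss, axis=(1, 2, 3))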

yijianSU22 commented 2 weeks ago

Hi, sorry to bother you again. I don't know why, but even when I use tf.keras.losses.CategoricalCrossentropy() to compute the CE term, the loss value still becomes negative during the training loop.
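
For context, the built-in cross-entropy by itself cannot go negative on valid probabilities, so any remaining negative values would have to come from the Dice term. A minimal check (hypothetical values):

    import tensorflow as tf

    cce = tf.keras.losses.CategoricalCrossentropy()
    y_true = tf.constant([[0.0, 1.0, 0.0]])
    y_pred = tf.constant([[0.1, 0.8, 0.1]])
    print(cce(y_true, y_pred).numpy())  # ~0.223; -log(p) >= 0 for p in (0, 1]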

SuryanarayanaY commented 1 week ago

Hi @yijianSU22 ,

The op tf.math.log(x) outputs -inf if the value of x is 0 and nan if x < 0. You can clip -inf values to a value you want using tf.clip_by_value, but for nan, clip_by_value also returns nan. Since this is a custom loss function, maybe you need to recheck it.
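
For example, a rewrite along these lines keeps both terms non-negative (a sketch only, not an official fix: it assumes y_pred is already a softmax output over the last axis and class_weight is a per-class vector, and it averages the Dice coefficient over the batch instead of summing it):

    import tensorflow as tf

    class WeightedDiceCrossentropy(tf.keras.losses.Loss):
        """Hypothetical rewrite of the combined Dice + cross-entropy loss."""

        def __init__(self, class_weight, lamda=0.5):
            super().__init__()
            self.lamda = lamda
            self.weight = tf.constant(class_weight, tf.float32)  # shape (num_classes,)

        def call(self, y_true, y_pred):
            smooth = 1e-5
            y_true = tf.cast(y_true, tf.float32)
            y_pred = tf.cast(y_pred, tf.float32)

            # Dice coefficient per sample, averaged (not summed) over the batch,
            # so it stays in [0, 1] and 1 - dice stays non-negative.
            intersection = tf.math.reduce_sum(y_pred * y_true, axis=(1, 2, 3))
            union = tf.math.reduce_sum(y_pred + y_true, axis=(1, 2, 3))
            dice = tf.math.reduce_mean((2.0 * intersection + smooth) / (union + smooth))
            dice_loss = 1.0 - dice

            # Clip BEFORE the log so it never sees 0, then negate the log-likelihood.
            output = tf.clip_by_value(y_pred, 1e-5, 1.0 - 1e-5)
            ce = -y_true * tf.math.log(output) * self.weight  # weight broadcasts over classes
            ce_loss = tf.math.reduce_mean(tf.math.reduce_sum(ce, axis=-1))

            return (1.0 - self.lamda) * dice_loss + self.lamda * ce_loss

With both terms bounded below by zero, the combined loss can no longer drift negative, which makes genuine training problems easier to spot.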