use icdar2015 to train,loss:nan

YouSmart2016 commented 3 years ago

When I used the ICDAR2015 dataset to train, the loss became NaN, as follows:

Epoch 1/10
200/200 [==============================] - ETA: 0s - loss: 10.4658
E:\githubcode\DifferentiableBinarization\generator_tf.py:98: RuntimeWarning: invalid value encountered in true_divide
  cosin = (square_distance - square_distance_1 - square_distance_2) / (2 * np.sqrt(square_distance_1 * square_distance_2))
200/200 [==============================] - 6289s 32s/step - loss: 10.4462 - val_loss: 5.5865

Epoch 00001: saving model to checkpoints/2021-05-11\db_01_6.5202_5.5865.h5
Epoch 2/10
200/200 [==============================] - 2867s 14s/step - loss: nan - val_loss: nan

Epoch 00002: saving model to checkpoints/2021-05-11\db_02_nan_nan.h5
Epoch 3/10
200/200 [==============================] - 2693s 13s/step - loss: nan - val_loss: nan

Epoch 00003: saving model to checkpoints/2021-05-11\db_03_nan_nan.h5
Epoch 4/10
200/200 [==============================] - 2595s 13s/step - loss: nan - val_loss: nan
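
Note on the warning in the log above: "invalid value encountered in true_divide" is what NumPy reports for a 0/0 division, which can happen in the quoted cosin line when a pixel coincides with a polygon vertex (one of the squared distances is zero). Whether this is what turns the loss into NaN is not confirmed in this thread. A minimal sketch of guarding that division, assuming the three inputs are NumPy float arrays; safe_cosine and eps are hypothetical names, not part of the repository:

```python
import numpy as np

def safe_cosine(square_distance, square_distance_1, square_distance_2, eps=1e-12):
    """Cosine term from the warning, with the zero-denominator case guarded.

    safe_cosine and eps are hypothetical names used for illustration only.
    """
    numerator = square_distance - square_distance_1 - square_distance_2
    denominator = 2 * np.sqrt(square_distance_1 * square_distance_2)
    # Divide only where the denominator is non-zero; elsewhere keep 0.0.
    cosin = np.zeros_like(denominator, dtype=np.float64)
    np.divide(numerator, denominator, out=cosin, where=denominator > eps)
    return cosin

# First element: the pixel sits exactly on a vertex, so square_distance_1 == 0
# and the unguarded expression would be 0 / 0.
d = np.array([4.0, 4.0])
d1 = np.array([0.0, 1.0])
d2 = np.array([4.0, 1.0])
cosin = safe_cosine(d, d1, d2)
```

Choosing 0.0 as the fallback value is an assumption; the point is only that no NaN leaves the function.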

YanShulinjj commented 3 years ago

You could try reducing the learning rate when the loss becomes NaN, I guess.

alex-ht commented 2 years ago

I solved it by replacing the loss functions. In losses.py:

# comment out following two lines
#weights = (weights - tf.reduce_min(weights)) / (tf.reduce_max(weights) - tf.reduce_min(weights)) + 1.
#mask = mask * weights
....
# replace L1 loss with huber loss
#loss = K.switch(mask_sum > 0, tf.reduce_sum(tf.abs(pred - gt) * mask) / mask_sum, tf.constant(0.))
mask = tf.not_equal(mask, tf.constant(0.))
loss = K.switch(mask_sum > 0, tf.losses.huber_loss(gt[mask], pred[..., 0][mask]), tf.constant(0.))
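
One possible reason the first change helps (an assumption, not confirmed in this thread): the min-max normalization divides by max - min, which is zero whenever all weights in a batch are equal, giving 0/0 = NaN. The sketch below reproduces that in NumPy and also shows a NumPy stand-in for a masked Huber loss with delta = 1. min_max_plus_one and masked_huber are hypothetical helpers, not the repository's functions.

```python
import numpy as np

def min_max_plus_one(weights):
    # Mirrors the commented-out normalization: (w - min) / (max - min) + 1.
    return (weights - weights.min()) / (weights.max() - weights.min()) + 1.0

def masked_huber(gt, pred, mask, delta=1.0):
    """Mean Huber loss over pixels where mask is non-zero (hypothetical helper)."""
    selected = mask != 0
    if not selected.any():
        return 0.0  # same fallback as the K.switch branch for an empty mask
    error = np.abs(gt[selected] - pred[selected])
    quadratic = np.minimum(error, delta)   # part of the error treated quadratically
    linear = error - quadratic             # remainder treated linearly
    return float(np.mean(0.5 * quadratic ** 2 + delta * linear))

with np.errstate(invalid="ignore"):
    constant = min_max_plus_one(np.ones((2, 2)))      # 0 / 0 everywhere
varied = min_max_plus_one(np.array([0.0, 1.0, 3.0]))  # well defined

gt = np.array([[0.0, 1.0], [2.0, 5.0]])
pred = np.array([[0.5, 1.0], [9.0, 2.0]])
mask = np.array([[1.0, 1.0], [0.0, 1.0]])
loss = masked_huber(gt, pred, mask)
```

Unlike the L1 term, the Huber term is quadratic for small errors, so its gradient shrinks near zero error instead of staying constant.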

ArsalanYounus007 commented 1 year ago

def dice_loss(args):
    pred, gt, mask, weights = args
    pred = pred[..., 0]
    # weights = (weights - tf.reduce_min(weights)) / (tf.reduce_max(weights) - tf.reduce_min(weights)) + 1.
    # mask = mask * weights
    intersection = tf.reduce_sum(pred * gt * mask)
    union = tf.reduce_sum(pred * mask) + tf.reduce_sum(gt * mask) + 1e-6
    loss = 1 - 2.0 * intersection / union
    return loss

def l1_loss(args, scale=10.):
    pred, gt, mask = args
    pred = pred[..., 0]
    mask_sum = tf.reduce_sum(mask)
    # loss = K.switch(mask_sum > 0, tf.reduce_sum(tf.abs(pred - gt) * mask) / mask_sum, tf.constant(0.))
    mask = tf.not_equal(mask, tf.constant(0.))
    loss = K.switch(mask_sum > 0, tf.compat.v1.losses.huber_loss(gt[mask], pred[..., 0][mask]), tf.constant(0.))
    loss = loss * scale
    return loss

My losses code looks like this now. But I am getting this error. Can you help?

  File "F:\DifferentiableBinarization\train.py", line 38, in <module>
    model, prediction_model = dbnet(input_size=input_image_size)
  File "F:\DifferentiableBinarization\model.py", line 60, in dbnet
    loss = layers.Lambda(db_loss, name='db_loss')([p, b_hat, gt_input, mask_input, t, thresh_input, thresh_mask_input])
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\keras\backend\tensorflow_backend.py", line 75, in symbolic_fn_wrapper
    return func(*args, **kwargs)
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\keras\engine\base_layer.py", line 489, in __call__
    output = self.call(inputs, **kwargs)
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\keras\layers\core.py", line 716, in call
    return self.function(inputs, **arguments)
  File "F:\DifferentiableBinarization\losses.py", line 57, in db_loss
    l1_loss_ = l1_loss([thresh, thresh_map, thresh_mask])
  File "F:\DifferentiableBinarization\losses.py", line 50, in l1_loss
    loss = K.switch(mask_sum > 0, tf.compat.v1.losses.huber_loss(gt[mask], pred[..., 0][mask]), tf.constant(0.))
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 905, in _slice_helper
    return boolean_mask(tensor=tensor, mask=slice_spec)
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1678, in boolean_mask
    shape_tensor[axis:axis + ndims_mask].assert_is_compatible_with(shape_mask)
  File "C:\Users\Arsalan\anaconda3\envs\python_36\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 1117, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (None, None) and (None, 1600, 1600) are incompatible
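
A likely cause, judging only from the shapes in the traceback: l1_loss already does pred = pred[..., 0], and the Huber line then applies pred[..., 0] a second time. That leaves a rank-2 tensor, (None, None), which cannot be indexed by the rank-3 mask, (None, 1600, 1600). If so, using gt[mask] and pred[mask] in that line should make the shapes agree. The NumPy sketch below shows the same rank mismatch; it is an analogue under that assumption, not the TensorFlow code itself, and the small sizes are made up for illustration.

```python
import numpy as np

batch, height, width = 2, 4, 4
pred = np.random.rand(batch, height, width, 1)   # model output with a channel axis
gt = np.random.rand(batch, height, width)
mask = np.random.rand(batch, height, width) > 0.5

pred = pred[..., 0]     # first slice: (2, 4, 4), same rank as gt and mask
once = pred[mask]       # boolean mask of matching shape works

twice = pred[..., 0]    # second slice drops another axis: (2, 4)
try:
    twice[mask]         # rank-3 mask on a rank-2 array
    double_slice_failed = False
except IndexError:
    double_slice_failed = True
```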

xuannianz / DifferentiableBinarization

use icdar2015 to train,loss:nan #25