keras-team / keras

Deep Learning for humans
http://keras.io/

Support ignore label in cross entropy functions #6118

Closed kivantium closed 6 years ago

kivantium commented 7 years ago

This is a new feature request.

In Caffe, SigmoidCrossEntropyLossLayer can specify a label to be ignored. This feature is required to implement Fully Convolutional Networks for Semantic Segmentation, which says in section 4: "The training ignores pixels that are masked out (as ambiguous or difficult) in the ground truth."

The Google Groups thread "How to mask binary crossentropy loss?" mentions this feature. In that thread,

import tensorflow as tf
from keras import backend as K

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(tf.multiply(y_pred, tf.cast(tf.not_equal(y_true, -1), tf.float32)),
                                        tf.multiply(y_true, tf.cast(tf.not_equal(y_true, -1), tf.float32))), axis=-1)

was the answer, but this implementation only supports TensorFlow.
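
If it helps, the same idea can presumably be written with keras.backend ops only, so it is not tied to TensorFlow (untested sketch; the function name is mine, and the argument order assumes the Keras 2 backend, where the target comes first). Note it still averages over all positions, including the ignored ones:

from keras import backend as K

def masked_binary_crossentropy(y_true, y_pred):
    # zero out positions whose target is the ignore value (-1) before computing
    # the element-wise binary cross entropy, mirroring the snippet above
    mask = K.cast(K.not_equal(y_true, -1), K.floatx())
    return K.mean(K.binary_crossentropy(y_true * mask, y_pred * mask), axis=-1)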

emoebel commented 7 years ago

I'm having the exact same issue and opened a similar thread: #5911. But nobody has answered yet :-/

wangg12 commented 7 years ago

Hi @kivantium, have you figured out how to implement the sigmoid cross entropy with ignore label?

I have two questions for the code you posted.

  1. How well does the code work?

  2. This implementation has a final average operation along the last dimension (axis=-1). Shouldn't it average only over the unignored labels, i.e. excluding the ignored ones?

kivantium commented 7 years ago

Hi @wangg12,

I still don't have a good idea for the implementation.

Answers to your questions about the code: 1) sorry, I have not tested it. 2) I think so, but averaging only over the unignored labels is not supported (as far as I know), so we need some workaround.

wangg12 commented 7 years ago

OK, thanks @kivantium .

TheRevanchist commented 7 years ago

I am trying to implement the same paper now. As far as I understood (from this thread: https://stackoverflow.com/questions/37312421/tensorflow-whats-the-difference-between-sparse-softmax-cross-entropy-with-logi), you can use 'sparse_softmax_cross_entropy_with_logits' with the label you want to ignore set to -1 to do what you're looking for.

JimmyCai91 commented 7 years ago

Hi, here is my suggestion for dealing with an ignored label: use compute_weighted_loss. Here I use sigmoid_cross_entropy_with_logits as an example, to calculate the loss of a foreground/background segmentation. unc is a tensor with the same shape as label; its value is set to 0 at the positions of ignored labels and 1 at the positions of labels that should not be ignored. In this case, the final loss is not calculated on the ignored labels. Actually, with this method you can ignore whatever you want...

xentropy = tf.reduce_mean(
    tf.losses.compute_weighted_loss(
        weights=tf.cast(unc, tf.float32),
        losses=tf.nn.sigmoid_cross_entropy_with_logits(
            logits=logits, labels=tf.cast(label, tf.float32))),
    name='xentropy')
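
A minimal sketch of how unc could be built, assuming the positions to ignore are marked with the value 255 in label (255 is only an example; label is the ground-truth tensor from the snippet above):

import tensorflow as tf

# 1.0 where the label should contribute to the loss, 0.0 where it is ignored
unc = tf.cast(tf.not_equal(label, 255), tf.float32)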

kivantium commented 7 years ago

@TheRevanchist @JimmyCai91 Thank you! So we need to use backend functions to implement this...

liuzhisheng1226 commented 6 years ago

Hi, this is my suggestion for dealing with an ignored label:

raw_prediction = tf.reshape(logits, [-1, FLAGS.NUM_OF_CLASSESS])
gt = tf.reshape(annotation, [-1])

Suppose 2 is the ignored label:

indices = tf.squeeze(tf.where(tf.not_equal(gt, 2)), 1)
gt = tf.cast(tf.gather(gt, indices), tf.int32)
prediction = tf.gather(raw_prediction, indices)

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=prediction, labels=gt, name="entropy"))
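
If you want to use this from model.compile, here is a minimal wrapper sketch (the class count, the ignore value, and the function name are assumptions for illustration; untested):

import tensorflow as tf

NUM_CLASSES = 21   # assumption for illustration
IGNORE_LABEL = 2   # assumption, matching the example above

def masked_sparse_softmax_ce(y_true, y_pred):
    # flatten predictions and targets, drop positions carrying the ignore label,
    # then average the cross entropy over the remaining positions only
    logits = tf.reshape(y_pred, [-1, NUM_CLASSES])
    labels = tf.reshape(y_true, [-1])
    keep = tf.squeeze(tf.where(tf.not_equal(labels, IGNORE_LABEL)), 1)
    labels = tf.cast(tf.gather(labels, keep), tf.int32)
    logits = tf.gather(logits, keep)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# usage: model.compile(optimizer='adam', loss=masked_sparse_softmax_ce)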

liuzhisheng1226 commented 6 years ago

If you find any problem, please tell me. But I tried it and it runs OK.

wt-huang commented 6 years ago

Closing as this is resolved, feel free to reopen if the problem persists.

munum commented 5 years ago

@liuzhisheng1226 As far as I know, sigmoid_cross_entropy_with_logits should be called with valid probability distributions on labels. Wouldn't your approach mess up the probabilities?

jarednielsen commented 4 years ago

Any status on this? Would love a cleaner solution similar to PyTorch's ignore_index parameter in CrossEntropyLoss.
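
For reference, this is what the requested interface looks like on the PyTorch side (the value 255 is only an example):

import torch.nn as nn

# targets equal to ignore_index simply do not contribute to the loss or its gradient
criterion = nn.CrossEntropyLoss(ignore_index=255)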

TobiasFischerP4D commented 4 years ago

I can only second that, @wt-huang. Being able to pass an integer indicating labels that should not enter the loss would be great! I know it is possible to define a custom loss function; however, dragging that around is rather complicated.

fabricecarles commented 4 years ago

+1 for a cleaner solution similar to PyTorch's ignore_index and caffe's ignore_label. I suggest reopening this issue since the accepted solution above is not clear. Could you please provide a full example using keras and tensorflow >= 2.x?

In Caffe, using the "SoftmaxWithLoss" layer, we can add loss_param { ignore_label: 255 } to tell Caffe to ignore this label:

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "prediction"
  bottom: "labels_with_255_as_ignore"
  loss_weight: 1
  loss_param: { ignore_label: 255 }
}

Looking on the web, there is a plethora of issues asking how to make keras (or tensorflow) deal with an ignore label for semantic segmentation, but still no clean solution...

https://stackoverflow.com/questions/59972024/mask-the-loss-function-for-segmantic-segmentation-in-tf-keras
https://stackoverflow.com/questions/56328140/how-do-i-implement-a-masked-softmax-cross-entropy-loss-function-in-keras
https://stackoverflow.com/questions/54887933/how-to-to-drop-a-specific-labeled-pixels-in-semantic-segmentation
https://stackoverflow.com/questions/46097968/tensorflow-how-to-handle-void-labeled-data-in-image-segmentation
https://stackoverflow.com/questions/55529944/is-there-a-way-to-make-keras-ignore-a-label-when-computing-binary-crossentropy-l

vfp1 commented 4 years ago

+1 to this, I really want to keep using keras, but find PyTorch to be way easier at ignoring background values in semantic segmentation tasks.

kivantium commented 4 years ago

I cannot reopen this issue because a collaborator (@wt-huang) closed it. (cf. How to re-open an issue in github?)

If you still have trouble, it might be better to open a new issue and link to this thread.

lucasdavid commented 2 years ago

Thanks for the reference @liuzhisheng1226! It works great for simple cases, but I found one problem with it: masking the signals changes the shape of the tensors (for example, for targets (batch, H, W) and logits (batch, H, W, classes), it results in tensors of shape (valid,) and (valid, classes), where valid is the number of pixels not ignored). You reduce_mean these values at the end, which hides this problem.

Loss functions are expected to preserve the batch axis, so that (a) sample-wise weights can be applied, and (b) the loss can be correctly reduced in multi-GPU scenarios. Furthermore, it becomes really difficult to implement combined losses (e.g. combo loss) with this approach. With that in mind, I modified your solution to reconstruct the batch after the cross entropy is applied. I'll add it below in case it helps anyone:

import warnings
import tensorflow as tf

def sparse_categorical_crossentropy(
    y_true: tf.Tensor,
    y_pred: tf.Tensor,
    from_logits: bool = False,
    ignore_index: int = -1,
    axis: int = -1
):
  y_true = tf.convert_to_tensor(y_true)
  y_pred = tf.convert_to_tensor(y_pred)
  y_pred, from_logits = to_logits(y_pred, from_logits)

  # flatten positional axes: targets (batch, ...) -> (batch, positions),
  # predictions (batch, ..., classes) -> (batch, positions, classes)
  y_true = squeeze(y_true)
  y_pred = squeeze(y_pred, dims=3)

  # positions whose target equals `ignore_index` are dropped from the loss
  valid_mask = y_true != ignore_index
  indices = tf.where(valid_mask)

  # cross entropy over the valid positions only, scattered back into the
  # original (batch, positions) layout so the batch axis is preserved
  ce = tf.losses.sparse_categorical_crossentropy(
      y_true[valid_mask], y_pred[valid_mask], from_logits, axis
  )
  ce = tf.scatter_nd(indices, ce, tf.cast(tf.shape(y_true), tf.int64))

  # per-sample mean over the valid positions only (ignored positions
  # contribute to neither the sum nor the count)
  ce = tf.math.divide_no_nan(
      tf.reduce_sum(ce, axis=-1),
      tf.cast(tf.math.count_nonzero(valid_mask, axis=-1), ce.dtype)
  )

  return ce

def squeeze(y, dims: int = 2):
  if dims not in (2, 3):
    raise ValueError(f'Illegal value for parameter dims=`{dims}`. Can only squeeze '
                     'positional signal, resulting in a tensor with rank 2 or 3.')
  shape = tf.shape(y)
  new_shape = [shape[0], -1]
  if dims == 3:  # keep channels.
    new_shape += [shape[-1]]
  return tf.reshape(y, new_shape)

def to_logits(output, from_logits: bool = False):
  if from_logits:
    return output, True

  if hasattr(output, '_keras_logits'):
    if from_logits:
      warnings.warn(
          '"`dig_logits_if_available` received `from_logits=True`, but '
          'the `output` argument was produced by a sigmoid or softmax '
          'activation and thus does not represent logits. Was this intended?"',
          stacklevel=2)
    return output._keras_logits, True

  if (not isinstance(output, (tf.__internal__.EagerTensor, tf.Variable)) and
      output.op.type in ('Softmax', 'Sigmoid')) and not hasattr(output, '_keras_history'):
    assert len(output.op.inputs) == 1
    return output.op.inputs[0], True

  return output, False

Brief testing

Classification
y_true = tf.constant([0, 1, 2, 2])
scores = tf.constant([
  [.8, .1, .1],
  [.2, .2, .5],
  [.2, .4, .4],
  [.25, .35, .5],
])

print(tf.losses.sparse_categorical_crossentropy(y_true, scores).numpy())
print(sparse_categorical_crossentropy(y_true, scores).numpy())
# [0.22314355 1.5040774  0.91629076 0.7884574 ]
# [0.22314355 1.5040774  0.91629076 0.7884574 ]

scores = tf.constant([
  [1.8, 1.2, .5],
  [ .2, 3.8, .8],
  [1.1, .4, 3.4],
  [1.3, .7, 3.8],
])

print(tf.losses.sparse_categorical_crossentropy(y_true, scores, from_logits=True).numpy())
print(sparse_categorical_crossentropy(y_true, scores, from_logits=True).numpy())
# [0.5995744  0.07428224 0.13980183 0.11967831]
# [0.5995744  0.07428224 0.13980183 0.11967831]

print(tf.losses.sparse_categorical_crossentropy(y_true, tf.keras.activations.softmax(scores)).numpy())
print(sparse_categorical_crossentropy(y_true, tf.keras.activations.softmax(scores)).numpy())
# [0.5995744  0.07428224 0.13980183 0.11967831]
# [0.5995744  0.07428224 0.13980183 0.11967831]

print(sparse_categorical_crossentropy(tf.constant([-1,  1,  2,  2]), scores, from_logits=True).numpy())
print(sparse_categorical_crossentropy(tf.constant([ 0, -1,  2,  2]), scores, from_logits=True).numpy())
print(sparse_categorical_crossentropy(tf.constant([ 0,  1, -1,  2]), scores, from_logits=True).numpy())
print(sparse_categorical_crossentropy(tf.constant([ 0,  1,  2, -1]), scores, from_logits=True).numpy())
# [0.         0.07428224 0.13980183 0.11967831]
# [0.5995744  0.         0.13980183 0.11967831]
# [0.5995744  0.07428224 0.         0.11967831]
# [0.5995744  0.07428224 0.13980183 0.        ]

Segmentation
y_true = tf.constant([
  [[ 0, 2],
   [-1, -1]],
  [[ 0, 2],
   [-1, -1]],
])
scores = tf.constant([
  [[[1., .0, .0], [.0, .0, 1.]],
   [[.2, .5, .3], [.0, 1., .0]]],
  [[[1., .0, .0], [.0, .5, .5]],
   [[.2, .5, .3], [.0, 1., .0]]],
])

print(sparse_categorical_crossentropy(y_true, scores).numpy())
# [2.3841855e-07 3.4657377e-01]
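
A minimal usage sketch for plugging this into model.compile, assuming a segmentation model whose output has shape (batch, H, W, classes) and targets that mark ignored pixels with -1 (the tiny model below is only for illustration):

import tensorflow as tf

def seg_loss(y_true, y_pred):
  # wraps the function defined above with the ignore value used in these tests
  return sparse_categorical_crossentropy(y_true, y_pred, ignore_index=-1)

# hypothetical tiny model, just to show where the loss goes
model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(3, 1, activation='softmax', input_shape=(64, 64, 3)),
])
model.compile(optimizer='adam', loss=seg_loss)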