I'm having the exact same issue and opened a similar thread: #5911. But nobody has answered yet :-/
Hi @kivantium, have you figured out how to implement the sigmoid cross entropy with ignore label?
I have two questions for the code you posted.
How well does the code work?
This implementation has a final averaging operation along the last dimension (axis=-1). Shouldn't it average only over the unignored labels, i.e., exclude the ignored labels?
Hi @wangg12,
I still don't have any good idea about implementation.
Answers to your questions about the code: 1) Sorry, I have not tested it. 2) I think so, but averaging only over the unignored labels is not supported (as far as I know), so we need some workaround.
OK, thanks @kivantium .
I am trying to implement the same paper now. As far as I understand (from this thread: https://stackoverflow.com/questions/37312421/tensorflow-whats-the-difference-between-sparse-softmax-cross-entropy-with-logi), you can use sparse_softmax_cross_entropy_with_logits with -1 as the label you want to ignore to do what you're looking for.
Hi, here is my suggestion for dealing with an ignored label: use compute_weighted_loss. Here I use sigmoid_cross_entropy_with_logits as an example, to compute the loss for foreground/background segmentation. unc is a tensor with the same shape as label; its value is set to 0 at the positions of ignored labels and 1 at the positions of labels that should not be ignored. This way, the final loss is not computed on the ignored labels. Actually, with this method you can ignore whatever you want...
xentropy = tf.reduce_mean(
    tf.losses.compute_weighted_loss(
        weights=tf.cast(unc, tf.float32),
        losses=tf.nn.sigmoid_cross_entropy_with_logits(
            logits=logits,
            labels=tf.cast(label, tf.float32))),
    name='xentropy')
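For completeness, a minimal sketch of how one might build such an unc tensor from the label map itself (the sentinel value 255 is just an assumption here; use whatever marks your ignored pixels):

IGNORE_LABEL = 255  # assumption: 255 marks pixels to ignore
# unc is 1.0 where the pixel should count, 0.0 where it should be ignored.
unc = tf.cast(tf.not_equal(label, IGNORE_LABEL), tf.float32)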
@TheRevanchist @JimmyCai91 Thank you! So we need to use backend functions to implement this...
Hi, this is my suggestion for dealing with an ignored label (here the ignored label is 2):

raw_prediction = tf.reshape(logits, [-1, FLAGS.NUM_OF_CLASSESS])
gt = tf.reshape(annotation, [-1])
indices = tf.squeeze(tf.where(tf.not_equal(gt, 2)), 1)
gt = tf.cast(tf.gather(gt, indices), tf.int32)
prediction = tf.gather(raw_prediction, indices)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=prediction, labels=gt, name="entropy"))
If you find any problem, please tell me. I tried it and it runs OK.
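If it helps, here is roughly the same idea expressed with tf.boolean_mask, which also works on TF 2.x (a sketch of my own; masked_sparse_ce and its arguments are my naming, not from the post above):

import tensorflow as tf

def masked_sparse_ce(logits, annotation, num_classes, ignore_label=2):
    # Flatten predictions and labels, drop positions marked with `ignore_label`,
    # then average the cross-entropy over the remaining positions.
    raw_prediction = tf.reshape(logits, [-1, num_classes])
    gt = tf.reshape(annotation, [-1])
    valid = tf.not_equal(gt, ignore_label)
    gt = tf.cast(tf.boolean_mask(gt, valid), tf.int32)
    prediction = tf.boolean_mask(raw_prediction, valid)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=gt, logits=prediction))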
Closing as this is resolved, free to reopen if problem persists.
@liuzhisheng1226 As far as I know, sigmoid_cross_entropy_with_logits should be called with valid probability distributions as labels. Wouldn't your approach mess up the probabilities?
Any status on this? Would love a cleaner solution similar to PyTorch's ignore_index parameter in CrossEntropyLoss.
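For anyone unfamiliar, this is the PyTorch behavior being requested; a minimal illustration:

import torch
import torch.nn as nn

# PyTorch skips targets equal to ignore_index when computing the loss.
loss_fn = nn.CrossEntropyLoss(ignore_index=255)
logits = torch.randn(2, 3, 4, 4)          # (batch, classes, H, W)
target = torch.randint(0, 3, (2, 4, 4))   # (batch, H, W)
target[0, 0, 0] = 255                     # this pixel will not contribute
print(loss_fn(logits, target))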
I can only second that, @wt-huang. Being able to pass an integer indicating labels that should not enter the loss would be great! I know it is possible to define a custom loss function; however, dragging that around is rather complicated.
+1 for a cleaner solution similar to PyTorch's ignore_index and Caffe's ignore_label. I suggest reopening this issue, since the accepted solution above is not clear. Could you please provide a full example using Keras and TensorFlow >= 2.x?
In Caffe, using the "SoftmaxWithLoss" layer, we can add loss_param { ignore_label: 255 } to tell Caffe to ignore this label:
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "prediction"
  bottom: "labels_with_255_as_ignore"
  loss_weight: 1
  loss_param { ignore_label: 255 }
}
Looking around the web, there is a plethora of issues asking Keras (or TensorFlow) to deal with an ignore label for semantic segmentation, but still no cleaner solution...
https://stackoverflow.com/questions/59972024/mask-the-loss-function-for-segmantic-segmentation-in-tf-keras
https://stackoverflow.com/questions/56328140/how-do-i-implement-a-masked-softmax-cross-entropy-loss-function-in-keras
https://stackoverflow.com/questions/54887933/how-to-to-drop-a-specific-labeled-pixels-in-semantic-segmentation
https://stackoverflow.com/questions/46097968/tensorflow-how-to-handle-void-labeled-data-in-image-segmentation
https://stackoverflow.com/questions/55529944/is-there-a-way-to-make-keras-ignore-a-label-when-computing-binary-crossentropy-l
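Until something official exists, here is a minimal Keras (TF >= 2.x) sketch of one possible workaround, assuming the ignore label is 255 (the class name MaskedSparseCE is my own, not an official API):

import tensorflow as tf

class MaskedSparseCE(tf.keras.losses.Loss):
    """Sparse categorical cross-entropy that skips pixels labeled `ignore_label`."""
    def __init__(self, ignore_label=255, from_logits=True, **kwargs):
        super().__init__(**kwargs)
        self.ignore_label = ignore_label
        self.from_logits = from_logits

    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, tf.int32)
        valid = tf.not_equal(y_true, self.ignore_label)
        # Replace ignored labels with 0 so the op stays in range, then mask them out.
        safe_true = tf.where(valid, y_true, tf.zeros_like(y_true))
        ce = tf.keras.losses.sparse_categorical_crossentropy(
            safe_true, y_pred, from_logits=self.from_logits)
        ce = tf.where(valid, ce, tf.zeros_like(ce))
        # Average over the valid pixels only.
        return tf.math.divide_no_nan(
            tf.reduce_sum(ce), tf.cast(tf.math.count_nonzero(valid), ce.dtype))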
+1 to this, I really want to keep using keras, but find PyTorch to be way easier at ignoring background values in semantic segmentation tasks.
I cannot reopen this issue, because a collaborator, @wt-huang, closed it (cf. How to re-open an issue in github?).
If you still have trouble, it might be better to open a new issue and link to this thread.
Thanks for the reference @liuzhisheng1226! It works great for simple cases, but I found one problem with it: masking the signals changes the shape of the tensors (for example, for targets of shape (batch, H, W) and logits of shape (batch, H, W, classes), it results in tensors of shape (valid,) and (valid, classes), where valid is the number of pixels not ignored). You reduce_mean these values at the end, which hides this problem.
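To make the shape problem concrete, a toy illustration of my own (not from the snippet above):

import tensorflow as tf

targets = tf.constant([[0, 2], [-1, 1]])   # (batch=2, 2), one pixel ignored (-1)
logits = tf.random.normal((2, 2, 3))       # (batch, 2, classes)
mask = targets != -1
print(targets[mask].shape)                 # (3,)    -- the batch axis is gone
print(logits[mask].shape)                  # (3, 3)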
Loss functions are expected to preserve the batch axis, so that (a) sample-wise weights can be applied, and (b) the loss can be correctly reduced in multi-GPU scenarios. Furthermore, it becomes really difficult to implement combined losses (e.g. combo loss) on top of this. With that in mind, I modified your solution to reconstruct the batch after the cross-entropy is applied. I'll add it below in case it helps anyone:
import warnings
import tensorflow as tf
def sparse_categorical_crossentropy(
        y_true: tf.Tensor,
        y_pred: tf.Tensor,
        from_logits: bool = False,
        ignore_index: int = -1,
        axis: int = -1
):
    y_true = tf.convert_to_tensor(y_true)
    y_pred = tf.convert_to_tensor(y_pred)
    y_pred, from_logits = to_logits(y_pred, from_logits)

    # Flatten the positional (H, W, ...) axes into a single axis.
    y_true = squeeze(y_true)
    y_pred = squeeze(y_pred, dims=3)

    # Compute cross-entropy only over the positions that are not ignored.
    valid_mask = y_true != ignore_index
    indices = tf.where(valid_mask)
    ce = tf.losses.sparse_categorical_crossentropy(
        y_true[valid_mask], y_pred[valid_mask], from_logits, axis
    )

    # Scatter the per-position losses back into (batch, positions) shape,
    # then average each sample over its valid positions only.
    ce = tf.scatter_nd(indices, ce, tf.cast(tf.shape(y_true), tf.int64))
    ce = tf.math.divide_no_nan(
        tf.reduce_sum(ce, axis=-1),
        tf.cast(tf.math.count_nonzero(valid_mask, axis=-1), ce.dtype)
    )
    return ce
def squeeze(y, dims: int = 2):
    # Collapse all positional axes into one, keeping the batch axis
    # (and, for dims=3, the channel axis).
    if dims not in (2, 3):
        raise ValueError(f'Illegal value for parameter dims=`{dims}`. Can only squeeze '
                         'positional signal, resulting in a tensor with rank 2 or 3.')
    shape = tf.shape(y)
    new_shape = [shape[0], -1]
    if dims == 3:  # keep channels.
        new_shape += [shape[-1]]
    return tf.reshape(y, new_shape)
def to_logits(output, from_logits: bool = False):
    # If Keras cached the pre-activation logits, prefer them. (Checking this
    # before the `from_logits` early return keeps the warning reachable.)
    if hasattr(output, '_keras_logits'):
        if from_logits:
            warnings.warn(
                '`to_logits` received `from_logits=True`, but the `output` '
                'argument was produced by a sigmoid or softmax activation '
                'and thus does not represent logits. Was this intended?',
                stacklevel=2)
        return output._keras_logits, True
    if from_logits:
        return output, True
    # When the output is directly a Softmax/Sigmoid op, unwrap its input.
    if (not isinstance(output, (tf.__internal__.EagerTensor, tf.Variable)) and
            output.op.type in ('Softmax', 'Sigmoid')) and not hasattr(output, '_keras_history'):
        assert len(output.op.inputs) == 1
        return output.op.inputs[0], True
    return output, False
y_true = tf.constant([0, 1, 2, 2])
scores = tf.constant([
[.8, .1, .1],
[.2, .2, .5],
[.2, .4, .4],
[.25, .35, .5],
])
print(tf.losses.sparse_categorical_crossentropy(y_true, scores).numpy())
print(sparse_categorical_crossentropy(y_true, scores).numpy())
[0.22314355 1.5040774 0.91629076 0.7884574 ]
[0.22314355 1.5040774 0.91629076 0.7884574 ]
scores = tf.constant([
[1.8, 1.2, .5],
[ .2, 3.8, .8],
[1.1, .4, 3.4],
[1.3, .7, 3.8],
])
print(tf.losses.sparse_categorical_crossentropy(y_true, scores, from_logits=True).numpy())
print(sparse_categorical_crossentropy(y_true, scores, from_logits=True).numpy())
[0.5995744 0.07428224 0.13980183 0.11967831]
[0.5995744 0.07428224 0.13980183 0.11967831]
print(tf.losses.sparse_categorical_crossentropy(y_true, tf.keras.activations.softmax(scores)).numpy())
print(sparse_categorical_crossentropy(y_true, tf.keras.activations.softmax(scores)).numpy())
[0.5995744 0.07428224 0.13980183 0.11967831]
[0.5995744 0.07428224 0.13980183 0.11967831]
print(sparse_categorical_crossentropy(tf.constant([-1, 1, 2, 2]), scores, from_logits=True).numpy())
print(sparse_categorical_crossentropy(tf.constant([ 0, -1, 2, 2]), scores, from_logits=True).numpy())
print(sparse_categorical_crossentropy(tf.constant([ 0, 1, -1, 2]), scores, from_logits=True).numpy())
print(sparse_categorical_crossentropy(tf.constant([ 0, 1, 2, -1]), scores, from_logits=True).numpy())
[0. 0.07428224 0.13980183 0.11967831]
[0.5995744 0. 0.13980183 0.11967831]
[0.5995744 0.07428224 0. 0.11967831]
[0.5995744 0.07428224 0.13980183 0. ]
y_true = tf.constant([
[[ 0, 2],
[-1, -1]],
[[ 0, 2],
[-1, -1]],
])
scores = tf.constant([
[[[1., .0, .0], [.0, .0, 1.]],
[[.2, .5, .3], [.0, 1., .0]]],
[[[1., .0, .0], [.0, .5, .5]],
[[.2, .5, .3], [.0, 1., .0]]],
])
print(sparse_categorical_crossentropy(y_true, scores).numpy())
[2.3841855e-07 3.4657377e-01]
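In case anyone wants to plug this into Keras training directly, here's a small usage sketch (the toy model and the ignore value 255 are just placeholders for your own setup):

import functools
import tensorflow as tf

# Toy stand-in for a real segmentation model; it outputs per-pixel logits.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(3, 1, input_shape=(8, 8, 4)),
])
model.compile(
    optimizer='adam',
    loss=functools.partial(sparse_categorical_crossentropy,
                           from_logits=True, ignore_index=255))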
This is a new feature request.
In Caffe, SigmoidCrossEntropyLossLayer can specify a label to be ignored. This feature is required for the implementation of Fully Convolutional Networks for Semantic Segmentation, which says "The training ignores pixels that are masked out (as ambiguous or difficult) in the ground truth." in section 4.
"How to mask binary crossentropy loss?" - a Google Groups thread - mentions this feature. In that thread, a workaround was given as the answer, but that implementation only supports TensorFlow.