hadim closed this issue 7 years ago.
I suggest averaging across the batch axis (dimension 0):

```python
def dice_coef(y_true, y_pred, smooth=1):
    intersection = K.sum(y_true * y_pred, axis=[1, 2, 3])
    union = K.sum(y_true, axis=[1, 2, 3]) + K.sum(y_pred, axis=[1, 2, 3])
    return K.mean((2. * intersection + smooth) / (union + smooth), axis=0)
```
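As a quick sanity check (my own, not from the thread), the batch-averaged formula can be reproduced in plain NumPy on a small 4D batch, with `np` standing in for the Keras backend `K`:

```python
import numpy as np

def dice_coef_np(y_true, y_pred, smooth=1):
    # Per-sample Dice over the spatial/channel axes, then mean over the batch axis
    intersection = np.sum(y_true * y_pred, axis=(1, 2, 3))
    union = np.sum(y_true, axis=(1, 2, 3)) + np.sum(y_pred, axis=(1, 2, 3))
    return np.mean((2. * intersection + smooth) / (union + smooth), axis=0)

y_true = np.zeros((2, 2, 2, 1))
y_true[0, 0, 0, 0] = 1
y_pred = y_true.copy()
print(dice_coef_np(y_true, y_pred))  # perfect match -> 1.0
```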
Don't you think it should be?

```python
def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)
```
With your code a correct prediction gets -1 and a wrong one gets -0.25; I think this is the opposite of what a loss function should be.
```python
# not matched
dice_coef_loss(
    K.theano.shared(np.array([[0, 0, 0]])),
    K.theano.shared(np.array([[1, 1, 1]]))
).eval()  # -0.25

# match
dice_coef_loss(
    K.theano.shared(np.array([[0, 0, 0]])),
    K.theano.shared(np.array([[0, 0, 0]]))
).eval()  # -1.0
```
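The same two cases can be checked without Theano. This NumPy sketch (my own, using the flattened-sum Dice that appears later in this thread) shows the `1 - dice` variant mapping a perfect match to 0 and a total mismatch to a positive value, i.e. the expected loss ordering:

```python
import numpy as np

def dice_coef(y_true, y_pred, smooth=1):
    intersection = np.sum(y_true * y_pred)
    return (2. * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)

a = np.array([[0, 0, 0]])
b = np.array([[1, 1, 1]])
print(dice_coef_loss(a, b))  # total mismatch -> 0.75 (high loss)
print(dice_coef_loss(a, a))  # perfect match  -> 0.0  (low loss)
```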
Here's a suggestion which uses vector operations and averages across the batch axis.
```python
def dice_coef(y_true, y_pred, smooth=1):
    intersection = K.dot(y_true, K.transpose(y_pred))
    union = K.dot(y_true, K.transpose(y_true)) + K.dot(y_pred, K.transpose(y_pred))
    return (2. * intersection + smooth) / (union + smooth)

def dice_coef_loss(y_true, y_pred):
    return K.mean(1 - dice_coef(y_true, y_pred), axis=-1)
```
```python
# test
dice_coef_loss(
    K.theano.shared(np.array([[0, 0, 0], [0, 0, 0]])),
    K.theano.shared(np.array([[1, 1, 1], [1, 1, 1]]))
).eval()
# array([ 0.99999997, 0.99999997])

dice_coef_loss(
    K.theano.shared(np.array([[0, 0, 0], [0, 0, 0]])),
    K.theano.shared(np.array([[0, 0, 0], [0, 0, 0]]))
).eval()
# array([ 0., 0.])
```
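For reference, the dot-product formulation can be exercised on plain arrays too (my own check, with NumPy standing in for the backend). Note that `dot` on a batch of row vectors yields an all-pairs matrix, so cross-sample terms enter the final mean:

```python
import numpy as np

def dice_coef(y_true, y_pred, smooth=1):
    # shape (batch, batch): off-diagonal entries mix different samples
    intersection = np.dot(y_true, y_pred.T)
    union = np.dot(y_true, y_true.T) + np.dot(y_pred, y_pred.T)
    return (2. * intersection + smooth) / (union + smooth)

def dice_coef_loss(y_true, y_pred):
    return np.mean(1 - dice_coef(y_true, y_pred), axis=-1)

a = np.zeros((2, 3))
print(dice_coef_loss(a, np.ones((2, 3))))  # mismatch: per-sample loss > 0
print(dice_coef_loss(a, a))                # perfect match: per-sample loss 0
```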
Hi @wassname, could you clarify your statement?
> With your code a correct prediction gets -1 and a wrong one gets -0.25, I think this is the opposite of what a loss function should be.
I'm quite new to ML, but isn't a loss function supposed to output a lower value for a correct prediction and a higher value for a wrong one? Isn't that exactly what @hadim's version of the function is doing?
@cicobalico yeah sure
EDIT: I was wrong about that sorry
When I used OP's loss function my CNN converged on the exact opposite answer and made an inverse mask instead of a mask. That makes sense if it was working backwards towards -0.25.
(It's not just hadim; it's written that way in [a](https://github.com/jocicmarko/ultrasound-nerve-segmentation/blob/master/train.py#L26) [couple](https://github.com/EdwardTyantov/ultrasound-nerve-segmentation/blob/master/metric.py#L19) of repos, which makes me think I'm missing something.)
`1 - dice_coef` or `-dice_coef` makes no difference for convergence; I used both. `1 - dice_coef` is just more familiar for monitoring, as its value belongs to [0, 1], not [-1, 0].
But for `-dice_coef` it converges on `y_pred != y_true`, doesn't it? I gave specific examples above.
I think the ranges [0, 1] and [0, -1] would be interchangeable, but not [0, 1] and [-1, 0] as in this case.
It shouldn't. Back-propagation must drive the loss as low as it can: -1 in the case of the `-dice_coef` loss.
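A toy gradient-descent run (my own illustration, NumPy with numeric gradients) confirms this: minimizing `-dice_coef` drives the prediction toward the target, exactly like `1 - dice_coef` would:

```python
import numpy as np

def dice(t, p, smooth=1.0):
    return (2. * np.sum(t * p) + smooth) / (np.sum(t) + np.sum(p) + smooth)

t = np.array([1., 0., 1., 0.])
p = np.full(4, 0.5)          # start undecided
eps, lr = 1e-5, 0.5
for _ in range(200):
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p)
        d[i] = eps
        # central-difference gradient of the loss -dice w.r.t. p[i]
        g[i] = (-dice(t, p + d) + dice(t, p - d)) / (2 * eps)
    p = np.clip(p - lr * g, 0, 1)
print(np.round(p, 2))  # approaches [1, 0, 1, 0], i.e. y_pred -> y_true
```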
Ah that makes sense then, thanks for clarifying that!
Hello everybody, I need to use the dice coefficient for some computation on biomedical image data. My question is: shouldn't there be a K.abs() expression? Aren't intersection and union only valid measures for absolute values?
Thanks for answering in advance!
If you are using the dice coefficient as a loss, shouldn't you specify the derivative of the dice coefficient w.r.t. the output layer so that back-propagation can work?
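For what it's worth, Keras backends differentiate the loss graph automatically, so no hand-written derivative is needed; a finite-difference check in NumPy (my own sketch) shows the smoothed Dice has a well-defined gradient in the predictions:

```python
import numpy as np

def dice(y_true, y_pred, smooth=1.0):
    inter = np.sum(y_true * y_pred)
    return (2. * inter + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

y_true = np.array([1., 0., 1.])
y_pred = np.array([0.8, 0.2, 0.6])
eps = 1e-6
grad = np.zeros_like(y_pred)
for i in range(y_pred.size):
    d = np.zeros_like(y_pred)
    d[i] = eps
    grad[i] = (dice(y_true, y_pred + d) - dice(y_true, y_pred - d)) / (2 * eps)
print(grad)  # finite everywhere: positive where y_true=1, negative where y_true=0
```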
hi, I use dice loss in u-net, but the predicted images are all white. Could someone explain that?
I suppose white means it is considering all the images as foreground. Can you post more about how you made the training set, and whether it is binary-level or multi-label segmentation?
I use the code from the first post:

```python
def dice_coef(y_true, y_pred, smooth=1):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)

model.compile(optimizer=optimizer, loss=dice_coef_loss, metrics=[dice_coef])
```

Yes, it is binary-level segmentation. I use a U-Net network based on Keras. Because the output of the last layer is the probability value of a sigmoid function, if probability > 0.5 the color = 1; if probability < 0.5 the color = 0. I watched the output, and most of the probabilities are 0.49xxxxx or 0.50xxxxxx, so the predicted image becomes all white or all black. I think the ideal binary outputs should be 0.9xxxx and 0.1xxxx, that is, very close to 1.0 and 0.0. Do you have some pointers on how to get the output probability close to the ground-truth value? Thank you.
> I suggest averaging across batch axis, 0-dimension:
>
> ```python
> def dice_coef(y_true, y_pred, smooth=1):
>     intersection = K.sum(y_true * y_pred, axis=[1, 2, 3])
>     union = K.sum(y_true, axis=[1, 2, 3]) + K.sum(y_pred, axis=[1, 2, 3])
>     return K.mean((2. * intersection + smooth) / (union + smooth), axis=0)
> ```
@alexander-rakhlin I've seen that some implementations of the dice coefficient use `smooth=1`; where does this value come from? From what I understand, it is used to avoid division by zero, so why not use a very small value close to zero (e.g. `smooth=1e-9`)? In addition, by suggesting `axis=[1,2,3]`, I guess you're assuming a 4D TensorFlow tensor of size `(Batch, Height, Width, Channels)`, right?
@tinalegre this was 3 years ago and I can't remember where this `smooth=1` comes from. `1e-7` is a better idea. Yes, we are speaking of a 4D tensor. At that time I was using Theano's `(Batch, Channels, Height, Width)`, but this makes no difference.
@alexander-rakhlin thank you! You mean it doesn't make any difference because, for both `channels_first` => `(batch, channels, height, width)` and `channels_last` => `(batch, height, width, channels)` representations, the batch dimension is at `axis=0`, and thus `return K.mean(iou, axis=0)` would work for both, right?
Please, what is the correct implementation of the dice coefficient?
```python
def dice_coef1(y_true, y_pred, smooth=1):
    intersection = K.sum(y_true * y_pred, axis=[1, 2, 3])
    union = K.sum(y_true, axis=[1, 2, 3]) + K.sum(y_pred, axis=[1, 2, 3])
    dice = K.mean((2. * intersection + smooth) / (union + smooth), axis=0)
    return dice
```

gives me the following result: 0.85
or
```python
def dice_coef2(target, prediction, smooth=1):
    numerator = 2.0 * K.sum(target * prediction) + smooth
    denominator = K.sum(target) + K.sum(prediction) + smooth
    coef = numerator / denominator
    return coef
```

gives me the following result: 0.94
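A gap like 0.85 vs 0.94 is expected: `dice_coef1` computes a per-sample Dice and averages over the batch, while `dice_coef2` pools all pixels before dividing, which implicitly weights samples by mask size. A NumPy sketch (my own, with `np` standing in for `K`) shows the two conventions disagreeing on the same batch:

```python
import numpy as np

def dice1(t, p, smooth=1):  # per-sample Dice, then batch mean
    inter = np.sum(t * p, axis=(1, 2, 3))
    union = np.sum(t, axis=(1, 2, 3)) + np.sum(p, axis=(1, 2, 3))
    return np.mean((2. * inter + smooth) / (union + smooth))

def dice2(t, p, smooth=1):  # global Dice over all pixels at once
    inter = np.sum(t * p)
    return (2. * inter + smooth) / (np.sum(t) + np.sum(p) + smooth)

t = np.zeros((2, 4, 4, 1))
p = np.zeros((2, 4, 4, 1))
t[0, :2, :2, 0] = 1
p[0, :2, :2, 0] = 1        # sample 0: 4-pixel mask, predicted perfectly
t[1, 0, 0, 0] = 1          # sample 1: 1-pixel mask, completely missed
print(dice1(t, p), dice2(t, p))  # dice1 -> 0.75, dice2 -> 0.9
```

Neither is "wrong"; they answer different questions (average per-image quality vs overall pixel overlap), so the two results are not directly comparable.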
I am using the following score function:
It works pretty well for me training a fully DCNN to segment images.
Would you be interested in a PR in order to implement this in Keras ?
Note that the original implementation comes from the Kaggle post https://www.kaggle.com/c/ultrasound-nerve-segmentation/forums/t/21358/0-57-deep-learning-keras-tutorial