keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Dice score function #3611

Closed · hadim closed this 7 years ago

hadim commented 8 years ago

I am using the following score function:

from keras import backend as K

def dice_coef(y_true, y_pred, smooth=1):
    # flatten to 1D and compute a single Dice score over the whole batch;
    # `smooth` keeps the ratio defined when both masks are empty
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)

# ...
model.compile(optimizer=optimizer, loss=dice_coef_loss, metrics=[dice_coef])
# ...

It works pretty well for me when training a fully convolutional DCNN to segment images.

Would you be interested in a PR to implement this in Keras?

Note that the original implementation comes from the Kaggle post https://www.kaggle.com/c/ultrasound-nerve-segmentation/forums/t/21358/0-57-deep-learning-keras-tutorial

alexander-rakhlin commented 8 years ago

I suggest averaging across batch axis, 0-dimension:

def dice_coef(y_true, y_pred, smooth=1):
    intersection = K.sum(y_true * y_pred, axis=[1,2,3])
    union = K.sum(y_true, axis=[1,2,3]) + K.sum(y_pred, axis=[1,2,3])
    return K.mean( (2. * intersection + smooth) / (union + smooth), axis=0)
wassname commented 8 years ago

Don't you think it should be:

def dice_coef_loss(y_true, y_pred):
    return 1-dice_coef(y_true, y_pred)

With your code a correct prediction gets -1 and a wrong one gets -0.25; I think this is the opposite of what a loss function should be.

# not matched
dice_coef_loss(
    K.theano.shared(np.array([[0,0,0]])),
    K.theano.shared(np.array([[1,1,1]]))
).eval() # -0.25
# match
dice_coef_loss(
    K.theano.shared(np.array([[0,0,0]])),
    K.theano.shared(np.array([[0,0,0]]))
).eval() # -1.0

Here's a suggestion that uses vector operations and averages across the batch axis.

def dice_coef(y_true, y_pred, smooth=1):
    # y_true and y_pred are 2D tensors of shape (batch, features)
    intersection = K.dot(y_true, K.transpose(y_pred))
    union = K.dot(y_true, K.transpose(y_true)) + K.dot(y_pred, K.transpose(y_pred))
    return (2. * intersection + smooth) / (union + smooth)

def dice_coef_loss(y_true, y_pred):
    return K.mean(1 - dice_coef(y_true, y_pred), axis=-1)

# test
dice_coef_loss(
    K.theano.shared(np.array([[0,0,0],[0,0,0]])),
    K.theano.shared(np.array([[1,1,1],[1,1,1]]))
).eval() 
# array([ 0.99999997,  0.99999997])

dice_coef_loss(
    K.theano.shared(np.array([[0,0,0],[0,0,0]])),
    K.theano.shared(np.array([[0,0,0],[0,0,0]]))
).eval() # array([ 0.,  0.])
cicobalico commented 8 years ago

Hi @wassname, could you clarify your statement?

With your code a correct prediction gets -1 and a wrong one gets -0.25; I think this is the opposite of what a loss function should be.

I'm quite new to ML, but isn't a loss function supposed to output a lower value for a correct prediction and a higher value for a wrong one? Isn't that exactly what @hadim's version of the function is doing?

wassname commented 8 years ago

@cicobalico yeah sure

EDIT: I was wrong about that, sorry.

When I used the OP's loss function, my CNN converged on the exact opposite answer and produced an inverse mask instead of a mask. That would make sense if it were working backwards towards -0.25.

(It's not just hadim; it's written that way in [a](https://github.com/jocicmarko/ultrasound-nerve-segmentation/blob/master/train.py#L26) [couple](https://github.com/EdwardTyantov/ultrasound-nerve-segmentation/blob/master/metric.py#L19) of repos, which makes me think I'm missing something.)
alexander-rakhlin commented 8 years ago

1-dice_coef or -dice_coef makes no difference for convergence; I have used both. 1-dice_coef is just more familiar for monitoring, as its value belongs to [0, 1] rather than [-1, 0].

wassname commented 8 years ago

But for -dice_coef it converges on y_pred != y_true, doesn't it? I gave specific examples above.

I think the ranges [0,1] and [0,-1] would be interchangeable but not [0,1] and [-1,0] as in this case.

alexander-rakhlin commented 8 years ago

It shouldn't. Backpropagation will drive the loss as low as it can, down to -1 in the case of the -dice_coef loss.

wassname commented 8 years ago

Ah that makes sense then, thanks for clarifying that!
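
For reference, a minimal sketch of the two loss variants being discussed. They differ only by the constant 1, so their gradients (and hence convergence) are identical, while the 1 - dice form keeps the monitored value in [0, 1]. The function names here are illustrative, and dice_coef is assumed to be one of the definitions posted above, with from keras import backend as K in scope.

def dice_coef_loss_neg(y_true, y_pred):
    # ranges over [-1, 0]; a perfect match gives the minimum, -1
    return -dice_coef(y_true, y_pred)

def dice_coef_loss_one_minus(y_true, y_pred):
    # the same expression shifted by the constant 1, so it has the same
    # gradient but stays in [0, 1], which is easier to monitor
    return 1. - dice_coef(y_true, y_pred)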

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

karhunenloeve commented 6 years ago

Hello everybody, I need to use the Dice coefficient for some computations on biomedical image data. My question is: shouldn't there be a K.abs() expression? Aren't intersection and union only valid measures for absolute values?

Thanks in advance for answering!

JadBatmobile commented 5 years ago

If you are using the Dice coefficient as a loss, should you not specify the derivative of the Dice coefficient w.r.t. the output layer so that backpropagation can work?

jizhang02 commented 5 years ago

Hi, I use Dice loss in U-Net, but the predicted images are all white. Could someone explain that?

ankurshukla03 commented 5 years ago

Hi, I use Dice loss in U-Net, but the predicted images are all white. Could someone explain that?

I suppose white means it is considering all the images as foreground. Can you post more about how you made the training set, and is it binary segmentation or multi-label segmentation?

jizhang02 commented 5 years ago

Hi, I use Dice loss in U-Net, but the predicted images are all white. Could someone explain that?

I suppose white means it is considering all the images as foreground. Can you post more about how you made the training set, and is it binary segmentation or multi-label segmentation?

I use the code from the original post:

def dice_coef(y_true, y_pred, smooth=1):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)

# ...
model.compile(optimizer=optimizer, loss=dice_coef_loss, metrics=[dice_coef])
# ...

Yes, it is binary segmentation. I use a U-Net based on Keras. The output of the last layer is the probability value of a sigmoid, so if the probability > 0.5, the color = 1; if the probability < 0.5, the color = 0. I watched the output, and most of the probabilities are 0.49xxxxx or 0.50xxxxxx, so the predicted image becomes all white or all black. I think the ideal binary outputs should be 0.9xxxx and 0.1xxxx, that is to say, very close to 1.0 and 0.0. Do you have any pointers for getting the output probabilities closer to the ground-truth values? Thank you.
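
As a side note, a minimal sketch of the thresholding step described above; the array name and values are illustrative, not from the thread.

import numpy as np

# prob_map: the sigmoid output of the network, values in [0, 1]
prob_map = np.array([[0.49, 0.51],
                     [0.50, 0.52]])

# threshold at 0.5: pixels above 0.5 become foreground (1, white), the rest background (0, black)
mask = (prob_map > 0.5).astype(np.uint8)
print(mask)
# [[0 1]
#  [0 1]]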

tinalegre commented 5 years ago

I suggest averaging across batch axis, 0-dimension:

def dice_coef(y_true, y_pred, smooth=1):
    intersection = K.sum(y_true * y_pred, axis=[1,2,3])
    union = K.sum(y_true, axis=[1,2,3]) + K.sum(y_pred, axis=[1,2,3])
    return K.mean( (2. * intersection + smooth) / (union + smooth), axis=0)

@alexander-rakhlin I've seen that some implementations of the Dice coefficient use smooth=1; where does this value come from? From what I understand, this value is used to avoid division by zero, so why not use a very small value close to zero (e.g. smooth=1e-9)? In addition, by suggesting axis=[1,2,3], I guess you're assuming a 4D TensorFlow tensor of size (Batch, Height, Width, Channels), right?

alexander-rakhlin commented 5 years ago

@tinalegre this was 3 years ago and I can't remember where smooth=1 comes from; 1e-7 is a better idea. Yes, we are speaking of a 4D tensor. At the time I was using Theano's (Batch, Channels, Height, Width) ordering, but this makes no difference.

tinalegre commented 5 years ago

@alexander-rakhlin thank you! You mean it doesn't make any difference because, for both the channels_first => (batch, channels, height, width) and channels_last => (batch, height, width, channels) representations, the batch dimension is at axis=0, and thus return K.mean(iou, axis=0) would work for both, right?
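
A minimal sketch pulling these two points together: a small epsilon instead of smooth=1, and a reduction over every non-batch axis so the same code works for channels_first and channels_last tensors. This is only one way to write it, assuming from keras import backend as K is in scope.

def dice_coef(y_true, y_pred, smooth=1e-7):
    # sum over every axis except the batch axis (axis 0), so the layout,
    # (batch, H, W, C) vs. (batch, C, H, W), does not matter
    axes = list(range(1, K.ndim(y_true)))
    intersection = K.sum(y_true * y_pred, axis=axes)
    union = K.sum(y_true, axis=axes) + K.sum(y_pred, axis=axes)
    # per-sample Dice, averaged over the batch (axis 0)
    return K.mean((2. * intersection + smooth) / (union + smooth), axis=0)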

Tombery1 commented 2 years ago

Please, what is the correct implementation of the Dice coefficient?

def dice_coef1(y_true, y_pred, smooth=1):
  intersection = K.sum(y_true * y_pred, axis=[1,2,3])
  union = K.sum(y_true, axis=[1,2,3]) + K.sum(y_pred, axis=[1,2,3])
  dice = K.mean((2. * intersection + smooth)/(union + smooth), axis=0)
  return dice

Gives me the following result: 0.85

or

def dice_coef2(target, prediction, smooth=1):
    numerator = 2.0 * K.sum(target * prediction) + smooth
    denominator = K.sum(target) + K.sum(prediction) + smooth
    coef = numerator / denominator

    return coef

Gives me the following result: 0.94
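
For what it's worth, the two snippets measure slightly different things: dice_coef1 computes one Dice score per sample and then averages them, while dice_coef2 pools every pixel in the batch into a single Dice score, so the two can legitimately give different numbers on the same batch. A small illustrative check, with made-up tensors that are not from the thread, assuming both functions above and from keras import backend as K:

import numpy as np
from keras import backend as K

# two tiny 2x2 single-channel "images": sample 0 is predicted perfectly,
# sample 1 is only half covered by the prediction
y_true = K.constant(np.ones((2, 2, 2, 1)))
y_pred = K.constant(np.array([[[[1.], [1.]], [[1.], [1.]]],
                              [[[1.], [1.]], [[0.], [0.]]]]))

print(K.eval(dice_coef1(y_true, y_pred)))  # mean of per-sample scores (1.0 and ~0.71), about 0.86
print(K.eval(dice_coef2(y_true, y_pred)))  # one score over the pooled batch, about 0.87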