Hi @kkhuang1990,
You are right. The Averaged Hausdorff Loss is not differentiable w.r.t. the output of the network (e.g., a probability map) because it is defined between two sets of points. The problem you are describing is the entire point of using the Weighted Hausdorff Distance described in the paper. Try using the WHD instead.
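For reference, a minimal sketch of an Averaged Hausdorff Distance between two point sets (the function name and shapes here are just for illustration, not the repository's `AveragedHausdorffLoss` itself); note that it takes point coordinates as input, which is why gradients cannot flow back to a probability map from which the sets were extracted:

```python
import torch

def averaged_hausdorff_sketch(set1, set2):
    """Rough sketch: averaged Hausdorff distance between two point sets
    of shapes (N1, 2) and (N2, 2), given as coordinates."""
    # Pairwise Euclidean distances between every point of set1 and set2
    d = torch.cdist(set1.float(), set2.float())            # (N1, N2)
    # Mean distance from each point to its nearest point in the other set
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```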
@kkhuang1990: have you gotten any gain using the whd loss? I often use dice loss and cross entropy for binary classification problems. @javiribera: do you think whd is competitive with dice loss?
Hi john1231983,
I am using the weighted Hausdorff distance loss for boundary detection, which is different from the binary image segmentation problem. For my current task (boundary detection), the whd loss performs much better than dice and cross entropy loss. For binary image segmentation, I am not sure whether it performs better or not. It really depends on the problem you want to solve and the pixel frequency distribution of the training images.
Great to hear it. Are you using whd in the paper?
@javiribera : For WHD, I understood it as follows: the network outputs a `BxWxH` softmax probability map, and the ground truth is `BxC`, where `C` is the number of ground-truth points. For each batch, we have a `WxH` probability map and `C` points; then you compute the distance from each position in `WxH` to each of the `C` points, and from each of the `C` points to each position in `WxH`. You will obtain a matrix of size `WHxC`. Am I right?
Thanks
@John1231983: 1.a. The output does not have to be a softmax probability; all its elements just have to be bounded between 0 and 1. 2.a. Not for each batch, but for each image. This is correct:

> For each image, we have a `WxH` probability map and `C` points, then you compute the distance from each position in `WxH` to each of the `C` points and from each of the `C` points to each position in `WxH`. You will obtain a matrix of size `WHxC`.
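As a rough illustration of the distance matrix described above, a small sketch for one image (toy sizes and variable names are assumptions, not the repository code):

```python
import torch

W, H, C = 4, 5, 3                               # toy sizes, for illustration only
# Coordinates (row, col) of every position of the WxH map, shape (W*H, 2)
rows = torch.arange(W).repeat_interleave(H)
cols = torch.arange(H).repeat(W)
grid = torch.stack([rows, cols], dim=1).float()
# C ground-truth points for this image, shape (C, 2)
gt_points = torch.rand(C, 2) * torch.tensor([W, H], dtype=torch.float)
# Pairwise distances from every map position to every ground-truth point
d_matrix = torch.cdist(grid, gt_points)         # shape (W*H, C)
```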
> @kkhuang1990: have you gotten any gain using the whd loss? I often use dice loss and cross entropy for binary classification problems. @javiribera: do you think whd is competitive with dice loss?
@John1231983 , you can find the intuition in the last paragraph of section 4.2 of the paper on arXiv: https://arxiv.org/pdf/1806.07564.pdf
@javiribera : Thanks. Assume that each image has `N` ground-truth points and the network provides a `WxH` softmax probability map. Then the WHD computes the distance from every position in `WxH` to all of the `N` points to obtain `d(x,y)`. Is that correct? So the program would look like:
```python
# For each position of the probability map, find the distance to its
# nearest ground-truth point, and keep the largest of those distances.
max_dist = 0
for point_x in softmax_prob_imag:
    min_dist = float('inf')
    for point_y in N_points:
        d_dist = dist(point_x, point_y)
        if d_dist < min_dist:
            min_dist = d_dist
    if max_dist < min_dist:
        max_dist = min_dist
d_x_y = max_dist
```
As I said above, the output of the network does not have to be a softmax probability. Also, if you are trying to illustrate with pseudocode what the WHD does, this is incorrect. Note that what your pseudocode implements is this equation: max_{x ∈ X} min_{y ∈ Y} d(x, y).
This is not what the WHD is doing. It is one of the two terms of the Hausdorff Distance. The WHD is explained in section 4.2 of the paper: https://arxiv.org/pdf/1806.07564.pdf
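For reference, the full (non-weighted) Hausdorff Distance between two point sets X and Y combines the two directed terms, and the pseudocode above computes only the first of them:

```latex
d_H(X, Y) = \max\left\{ \max_{x \in X} \min_{y \in Y} d(x, y),\; \max_{y \in Y} \min_{x \in X} d(x, y) \right\}
```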
@javiribera : After reading your paper, I figured out your idea. Please check whether my understanding is correct.

Given an image `I`, the output of `I` using Unet is a score `s` (it does not necessarily have to be normalized to [0,1]). The score map has size `WxH`, and the ground-truth points are `Nx1`. So the WHD in equation (5) is computed as:

* Compute the distance from each point in the score map to the ground-truth points. We have `N` ground-truth points, so the output of this step is an `Nx1` vector `d`. Taking the minimum of this vector and multiplying it by its score, we obtain the expression inside the first term: ![screenshot from 2019-01-09 09-49-24](https://user-images.githubusercontent.com/24875971/50906777-ddc27480-13f3-11e9-9c42-041ea0b76217.png)
* Repeat step (1) `WH` times (because we have `WH` points in the score map), and take the sum of the outputs to obtain the first term: ![screenshot from 2019-01-09 09-51-34](https://user-images.githubusercontent.com/24875971/50906937-2da13b80-13f4-11e9-83d6-bca9e17c917e.png)
* The same process as the two steps above, just changing from points in the score map to points in the ground-truth and dividing by the score, gives the second term.

Am I right for these steps?
Yes, this is correct with the detail that in the explanation of your last bullet, the score (p_x) is applied before taking the minimum, because the minimum is taken over the points in the score.
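If it helps, a minimal sketch of how the first term described above could be computed in a vectorized way (the function name, shapes, and the normalization by S + eps from eq. (5) are assumptions for illustration; the repository contains the actual implementation):

```python
import torch

def whd_first_term_sketch(prob_map, gt_points, eps=1e-6):
    """Sketch of the first WHD term discussed above: each grid position x
    contributes p_x times the distance to its nearest ground-truth point,
    summed over the grid and normalized by S = sum(p_x) as in eq. (5).
    prob_map:  (H, W) tensor with values in [0, 1]
    gt_points: (N, 2) tensor of (row, col) ground-truth coordinates
    """
    H, W = prob_map.shape
    rows = torch.arange(H).repeat_interleave(W)
    cols = torch.arange(W).repeat(H)
    grid = torch.stack([rows, cols], dim=1).float()   # (H*W, 2) positions
    p = prob_map.reshape(-1)                          # (H*W,) scores p_x
    d = torch.cdist(grid, gt_points.float())          # (H*W, N) distances
    min_d = d.min(dim=1).values                       # nearest GT point per position
    return (p * min_d).sum() / (p.sum() + eps)

# The second term weights the distances by the score before taking the minimum
# over the grid positions (as clarified above); see eq. (5) in the paper for its exact form.
```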
Hi @John1231983 , have you got some gain using whd loss for the binary segmentation?
@JunMa11 : No improvement in my case. Hope you can get some improvement.
The WHD is not intended for binary segmentation. In that case the non-zero values in the label mask occupy a lot of pixels in the image, so a pixelwise loss (e.g., L2) is more appropriate. So it's expected behavior.
Thank you very much for sharing the code.
I have some questions about the Averaged Hausdorff Loss. Currently I am trying to solve a boundary detection problem on a medical image dataset. I tried to use your code for AveragedHausdorffLoss; however, the inputs of your function are two point sets, while my inputs are a 2-class softmax probability map and ground-truth labels. The critical issue here is that I have to calculate set1 from the probability map using torch.max(), while torch.max() is not differentiable for this purpose and thus cannot be back-propagated.
My question is: do you know any methods to avoid this 'cannot be back-propagated' problem, or other implementations that directly use the probability map to calculate the loss?
Best regards!
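For illustration, a minimal sketch of the non-differentiability issue described above (hypothetical toy variables); converting a probability map into a point set relies on operations that return integer indices without gradients:

```python
import torch

prob_map = torch.rand(64, 64, requires_grad=True)   # toy network output

# Extracting a point set uses non-differentiable ops: the comparison and
# nonzero() produce integer coordinates with no grad_fn, so a loss computed
# from `coords` cannot be backpropagated to prob_map.
coords = (prob_map > 0.5).nonzero()                  # (K, 2) integer indices
print(coords.requires_grad)                          # False

# The WHD sidesteps this by taking prob_map itself (not a point set) as input.
```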