Hi @kkhuang1990,
You are right. The Averaged Hausdorff Loss is not differentiable w.r.t. the output of the network (e.g., a probability map) because it is defined between two sets of points. The problem you are describing is the entire point of using the Weighted Hausdorff Distance described in the paper. Try using the WHD instead.
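For reference, a minimal sketch of an Averaged Hausdorff Distance between two point sets (the function name and shapes here are just for illustration, not the repository's `AveragedHausdorffLoss` itself); note that it takes point coordinates as input, which is why gradients cannot flow back to a probability map from which the sets were extracted:

```python
import torch

def averaged_hausdorff_sketch(set1, set2):
    """Rough sketch: averaged Hausdorff distance between two point sets
    of shapes (N1, 2) and (N2, 2), given as coordinates."""
    # Pairwise Euclidean distances between every point of set1 and set2
    d = torch.cdist(set1.float(), set2.float())            # (N1, N2)
    # Mean distance from each point to its nearest point in the other set
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```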
@kkhuang1990: have you gotten any gain using the whd loss? I often use dice loss and cross entropy for binary classification problems. @javiribera: do you think whd is competitive with dice loss?
Hi john1231983,
I am using the weighted Hausdorff distance loss for boundary detection, which is different from the binary image segmentation problem. For my current task (boundary detection), the whd loss performs much better than dice and cross entropy loss. For binary image segmentation, I am not sure whether it performs better or not. It really depends on the problem you want to solve and the pixel frequency distribution of the training images.
Great to hear it. Are you using whd in the paper?
@javiribera : For WHD, I understood it as follows: the network outputs a `BxWxH` softmax probability map, and the ground truth is `BxC`, where `C` is the number of ground-truth points. For each batch, we have a `WxH` probability map and `C` points; then you compute the distance from each position in `WxH` to each of the `C` points, and from each of the `C` points to each position in `WxH`. You will obtain a matrix of size `WHxC`. Am I right?
Thanks
@John1231983: 1.a. The output does not have to be a softmax probability; all its elements just have to be bounded between 0 and 1. 2.a. Not for each batch, but for each image. This is correct:

> For each image, we have a `WxH` probability map and `C` points, then you compute the distance from each position in `WxH` to each of the `C` points and from each of the `C` points to each position in `WxH`. You will obtain a matrix of size `WHxC`.
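As a rough illustration of the distance matrix described above, a small sketch for one image (toy sizes and variable names are assumptions, not the repository code):

```python
import torch

W, H, C = 4, 5, 3                               # toy sizes, for illustration only
# Coordinates (row, col) of every position of the WxH map, shape (W*H, 2)
rows = torch.arange(W).repeat_interleave(H)
cols = torch.arange(H).repeat(W)
grid = torch.stack([rows, cols], dim=1).float()
# C ground-truth points for this image, shape (C, 2)
gt_points = torch.rand(C, 2) * torch.tensor([W, H], dtype=torch.float)
# Pairwise distances from every map position to every ground-truth point
d_matrix = torch.cdist(grid, gt_points)         # shape (W*H, C)
```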
> @kkhuang1990: have you gotten any gain using the whd loss? I often use dice loss and cross entropy for binary classification problems. @javiribera: do you think whd is competitive with dice loss?
@John1231983 , you can find the intuition in the last paragraph of section 4.2 of the paper on arXiv: https://arxiv.org/pdf/1806.07564.pdf
@javiribera : Thanks. Assume that each image has `N` ground-truth points and the network provides a `WxH` softmax probability map. Then the WHD computes the distance from every position in `WxH` to all of the `N` points to obtain `d(x,y)`. Is that correct? So the program would look like:
```python
# For each position of the probability map, find the distance to its
# nearest ground-truth point, and keep the largest of those distances.
max_dist = 0
for point_x in softmax_prob_imag:
    min_dist = float('inf')
    for point_y in N_points:
        d_dist = dist(point_x, point_y)
        if d_dist < min_dist:
            min_dist = d_dist
    if max_dist < min_dist:
        max_dist = min_dist
d_x_y = max_dist
```
As I said above, the output of the network does not have to be a softmax probability. Also, if you are trying to illustrate with pseudocode what the WHD does, this is incorrect. Note that what your pseudocode implements is this equation: max_{x ∈ X} min_{y ∈ Y} d(x, y).
This is not what the WHD is doing. It is one of the two terms of the Hausdorff Distance. The WHD is explained in section 4.2 of the paper: https://arxiv.org/pdf/1806.07564.pdf
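For reference, the full (non-weighted) Hausdorff Distance between two point sets X and Y combines the two directed terms, and the pseudocode above computes only the first of them:

```latex
d_H(X, Y) = \max\left\{ \max_{x \in X} \min_{y \in Y} d(x, y),\; \max_{y \in Y} \min_{x \in X} d(x, y) \right\}
```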
@javiribera : After reading your paper, I figured out your idea. Please check whether my understanding is correct.

Given an image `I`, the output of `I` using Unet is a score `s` (it does not necessarily have to be normalized to [0,1]). The score map has size `WxH`, and the ground-truth points are `Nx1`. So the WHD in equation (5) is computed as:

* Compute the distance from each point in the score map to the ground-truth points. We have `N` ground-truth points, so the output of this step is an `Nx1` vector `d`. Taking the minimum of this vector and multiplying it by its score, we obtain the expression inside the first term: ![screenshot from 2019-01-09 09-49-24](https://user-images.githubusercontent.com/24875971/50906777-ddc27480-13f3-11e9-9c42-041ea0b76217.png)
* Repeat step (1) `WH` times (because we have `WH` points in the score map), and take the sum of the outputs to obtain the first term: ![screenshot from 2019-01-09 09-51-34](https://user-images.githubusercontent.com/24875971/50906937-2da13b80-13f4-11e9-83d6-bca9e17c917e.png)
* The same process as the two steps above, just changing from points in the score map to points in the ground-truth and dividing by the score, gives the second term.

Am I right for these steps?
Yes, this is correct with the detail that in the explanation of your last bullet, the score (p_x) is applied before taking the minimum, because the minimum is taken over the points in the score.
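If it helps, a minimal sketch of how the first term described above could be computed in a vectorized way (the function name, shapes, and the normalization by S + eps from eq. (5) are assumptions for illustration; the repository contains the actual implementation):

```python
import torch

def whd_first_term_sketch(prob_map, gt_points, eps=1e-6):
    """Sketch of the first WHD term discussed above: each grid position x
    contributes p_x times the distance to its nearest ground-truth point,
    summed over the grid and normalized by S = sum(p_x) as in eq. (5).
    prob_map:  (H, W) tensor with values in [0, 1]
    gt_points: (N, 2) tensor of (row, col) ground-truth coordinates
    """
    H, W = prob_map.shape
    rows = torch.arange(H).repeat_interleave(W)
    cols = torch.arange(W).repeat(H)
    grid = torch.stack([rows, cols], dim=1).float()   # (H*W, 2) positions
    p = prob_map.reshape(-1)                          # (H*W,) scores p_x
    d = torch.cdist(grid, gt_points.float())          # (H*W, N) distances
    min_d = d.min(dim=1).values                       # nearest GT point per position
    return (p * min_d).sum() / (p.sum() + eps)

# The second term weights the distances by the score before taking the minimum
# over the grid positions (as clarified above); see eq. (5) in the paper for its exact form.
```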
Hi @John1231983 , have you got some gain using whd loss for the binary segmentation?
@JunMa11 : No improvement in my case. Hope you can get some improvement.
The WHD is not intended for binary segmentation. In that case the non-zero values in the label mask occupy a lot of pixels in the image, so a pixelwise loss (e.g., L2) is more appropriate. So it's expected behavior.
Thank you very much for sharing the code.
I have some questions about the Averaged Hausdorff Loss. Currently I am trying to solve a boundary detection problem on a medical image dataset. I tried to use your code for AveragedHausdorffLoss; however, the inputs of your function are two point sets, while my inputs are a 2-class softmax probability map and ground-truth labels. The critical issue here is that I have to calculate set1 from the probability map using torch.max(), while torch.max() is not differentiable for this purpose and thus cannot be back-propagated.
My question is: do you know any methods to avoid this 'cannot be back-propagated' problem, or other implementations that directly use the probability map to calculate the loss?
Best regards!
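For illustration, a minimal sketch of the non-differentiability issue described above (hypothetical toy variables); converting a probability map into a point set relies on operations that return integer indices without gradients:

```python
import torch

prob_map = torch.rand(64, 64, requires_grad=True)   # toy network output

# Extracting a point set uses non-differentiable ops: the comparison and
# nonzero() produce integer coordinates with no grad_fn, so a loss computed
# from `coords` cannot be backpropagated to prob_map.
coords = (prob_map > 0.5).nonzero()                  # (K, 2) integer indices
print(coords.requires_grad)                          # False

# The WHD sidesteps this by taking prob_map itself (not a point set) as input.
```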