javiribera / locating-objects-without-bboxes

PyTorch code for "Locating objects without bounding boxes" - Loss function and trained models

General questions regarding experiments with wHD #26

Open RSKothari opened 4 years ago

RSKothari commented 4 years ago

Could you comment on the following? I believe it would help me and potentially future readers:

  1. Have you used this loss in a multitask setting? For example, combining segmentation with the heat maps produced by your loss?
  2. What happens if you use a spatial softmax for the heat map instead of a sigmoid?
  3. In a multi-class problem where the number of GT points is known and fixed, could you make N heat maps for N points, apply a softmax across channels, and expect your solution to work?
javiribera commented 4 years ago

Assuming you are asking the author,

  1. No, but some people have experimented with it; there are some issues in this repo about it. Remember that loss functions are designed for specific tasks. The WHD was designed for keypoint/object detection, and I have no intuition on how or why it would work on any other task.
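As a hedged illustration only (not something from the paper), a multitask setup would typically just take a weighted sum of a segmentation loss and the WHD term; the `whd_loss` below is a dummy placeholder standing in for the loss in this repository, and `lam` is a hypothetical balancing weight that would need tuning:

```python
import torch
import torch.nn.functional as F

def whd_loss(heatmap, gt_points):
    # Placeholder for the weighted Hausdorff distance loss from this repo.
    # Returns a dummy scalar so the sketch runs end to end.
    return heatmap.mean()

def multitask_loss(seg_logits, seg_target, heatmap, gt_points, lam=1.0):
    # Per-pixel cross-entropy for the segmentation head.
    seg_loss = F.cross_entropy(seg_logits, seg_target)
    # WHD-style loss for the point-detection head.
    point_loss = whd_loss(heatmap, gt_points)
    # Weighted sum; lam trades off the two tasks.
    return seg_loss + lam * point_loss

# Shapes: (batch, classes, H, W) logits, (batch, H, W) class labels,
# (batch, H, W) point heat map.
loss = multitask_loss(torch.randn(2, 3, 8, 8),
                      torch.randint(0, 3, (2, 8, 8)),
                      torch.rand(2, 8, 8),
                      gt_points=None, lam=0.5)
```

Whether the two gradients cooperate or interfere is exactly the open question the issues discuss.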

  2. The output activation unit is always tightly coupled with the loss function. A softmax would make the output channel sum to one across space. The point of softmax is to be a differentiable "argmax" (https://en.wikipedia.org/wiki/Softmax_function#Smooth_arg_max), commonly appropriate for classification tasks, where the outputs are logits over classes. However, our output map may have multiple correct high activations corresponding to multiple detected objects. A softmax would force the NN to pick one over the others, while the sigmoid would not. The point of the sigmoid is just to restrict each output to between 0 and 1. The insight is realizing what you would be taking the softmax over (classes or locations?).
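The competition effect is easy to see numerically. In this small sketch (illustrative values, not from the paper), a map with two equally strong peaks is normalized both ways: the spatial softmax makes the peaks share one unit of probability mass, while the per-pixel sigmoid lets both sit near 1 independently:

```python
import torch

# A flat 5x5 score map with two equally strong activations,
# representing two detected objects.
logits = torch.full((1, 5, 5), -4.0)
logits[0, 1, 1] = 4.0
logits[0, 3, 3] = 4.0

# Spatial softmax: one distribution over *locations*; the whole map
# sums to 1, so the two peaks compete and each gets under 0.5.
spatial = torch.softmax(logits.view(1, -1), dim=1).view(1, 5, 5)

# Per-pixel sigmoid: each location is scored independently in [0, 1],
# so both peaks can be close to 1 at the same time.
independent = torch.sigmoid(logits)

print(spatial[0, 1, 1].item(), spatial[0, 3, 3].item())          # each just under 0.5
print(independent[0, 1, 1].item(), independent[0, 3, 3].item())  # each about 0.98
```

With more objects the spatial-softmax peaks shrink further (k equal peaks get at most 1/k each), which is why a sigmoid suits a multi-detection heat map.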

  3. Yes, I don't see why not. By "softmax across channels" I assume you mean taking the softmax within each heat map, i.e., one spatial softmax per channel. Your approach may become infeasible if the number of points is too high, though, and it assumes all images have the same number of points. If the number of GT points is known, then this is a simpler problem and you probably don't need the WHD. You can also try --n-points, which removes the lateral network and the regression component of the loss function (second term in Equation 9 in the paper). But other solutions are also conceivable.
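A minimal sketch of that fixed-N idea (my own illustration, not code from this repo): one channel per GT point, a spatial softmax per channel, and a soft-argmax that turns each channel's distribution into an expected (row, col) coordinate, which could then be trained with a plain L2 loss against the known GT points:

```python
import torch

def soft_argmax_2d(heatmaps):
    """heatmaps: (B, N, H, W) raw scores, one channel per GT point.
    Returns (B, N, 2) expected (row, col) coordinates per channel."""
    b, n, h, w = heatmaps.shape
    # One spatial softmax per channel: each map becomes a probability
    # distribution over locations for "its" point.
    probs = torch.softmax(heatmaps.view(b, n, -1), dim=-1).view(b, n, h, w)
    rows = torch.arange(h, dtype=probs.dtype)
    cols = torch.arange(w, dtype=probs.dtype)
    # Expected coordinates under each per-channel distribution
    # (differentiable, unlike a hard argmax).
    exp_row = (probs.sum(dim=3) * rows).sum(dim=2)
    exp_col = (probs.sum(dim=2) * cols).sum(dim=2)
    return torch.stack([exp_row, exp_col], dim=-1)

# A single channel with a sharp peak at (2, 3) recovers that coordinate.
hm = torch.full((1, 1, 6, 6), -10.0)
hm[0, 0, 2, 3] = 10.0
coords = soft_argmax_2d(hm)
```

Note this bakes a fixed correspondence between channel i and GT point i into the network, which is exactly the assumption the WHD avoids.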