javiribera / locating-objects-without-bboxes

PyTorch code for "Locating objects without bounding boxes" - Loss function and trained models

General questions regarding experiments with wHD #26

Open RSKothari opened 4 years ago

RSKothari commented 4 years ago

Could you comment on the following? I believe it would help me and potentially future readers:

  1. Have you used this loss in a multitask setting? For example, combining segmentation with the heat maps produced by your loss?
  2. What happens if you use a spatial softmax for the heat map instead of a sigmoid?
  3. In a multi-class problem where the number of GT points is known and fixed, could you make N heat maps for N points, apply a softmax across channels, and expect your solution to work?
javiribera commented 4 years ago

Assuming you are asking the author,

  1. No, but some people have experimented with it; there are some issues in this repo about it. Remember that loss functions are designed for specific tasks. The WHD was designed for keypoint/object detection, and I have no intuition on how or why it would work on any other task.
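As a hedged illustration only (not something from the paper), a multitask setup would typically just take a weighted sum of a segmentation loss and the WHD term; the `whd_loss` below is a dummy placeholder standing in for the loss in this repository, and `lam` is a hypothetical balancing weight that would need tuning:

```python
import torch
import torch.nn.functional as F

def whd_loss(heatmap, gt_points):
    # Placeholder for the weighted Hausdorff distance loss from this repo.
    # Returns a dummy scalar so the sketch runs end to end.
    return heatmap.mean()

def multitask_loss(seg_logits, seg_target, heatmap, gt_points, lam=1.0):
    # Per-pixel cross-entropy for the segmentation head.
    seg_loss = F.cross_entropy(seg_logits, seg_target)
    # WHD-style loss for the point-detection head.
    point_loss = whd_loss(heatmap, gt_points)
    # Weighted sum; lam trades off the two tasks.
    return seg_loss + lam * point_loss

# Shapes: (batch, classes, H, W) logits, (batch, H, W) class labels,
# (batch, H, W) point heat map.
loss = multitask_loss(torch.randn(2, 3, 8, 8),
                      torch.randint(0, 3, (2, 8, 8)),
                      torch.rand(2, 8, 8),
                      gt_points=None, lam=0.5)
```

Whether the two gradients cooperate or interfere is exactly the open question the issues discuss.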

  2. The output activation unit is always tightly coupled with the loss function. A softmax would make the output channel sum to one across space. The point of softmax is to be a differentiable "argmax" (https://en.wikipedia.org/wiki/Softmax_function#Smooth_arg_max), commonly appropriate for classification tasks, where the outputs are logits over classes. However, our output map may have multiple correct high activations corresponding to multiple detected objects. A softmax would force the NN to pick one over the others, while the sigmoid would not. The point of the sigmoid is just to restrict each output to between 0 and 1. The insight is realizing what you would be taking the softmax over (classes or locations?).
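The competition effect is easy to see numerically. In this small sketch (illustrative values, not from the paper), a map with two equally strong peaks is normalized both ways: the spatial softmax makes the peaks share one unit of probability mass, while the per-pixel sigmoid lets both sit near 1 independently:

```python
import torch

# A flat 5x5 score map with two equally strong activations,
# representing two detected objects.
logits = torch.full((1, 5, 5), -4.0)
logits[0, 1, 1] = 4.0
logits[0, 3, 3] = 4.0

# Spatial softmax: one distribution over *locations*; the whole map
# sums to 1, so the two peaks compete and each gets under 0.5.
spatial = torch.softmax(logits.view(1, -1), dim=1).view(1, 5, 5)

# Per-pixel sigmoid: each location is scored independently in [0, 1],
# so both peaks can be close to 1 at the same time.
independent = torch.sigmoid(logits)

print(spatial[0, 1, 1].item(), spatial[0, 3, 3].item())          # each just under 0.5
print(independent[0, 1, 1].item(), independent[0, 3, 3].item())  # each about 0.98
```

With more objects the spatial-softmax peaks shrink further (k equal peaks get at most 1/k each), which is why a sigmoid suits a multi-detection heat map.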

  3. Yes, I don't see why not. By "softmax across channels" I assume you mean taking the softmax within each heat map, i.e., one spatial softmax per channel. Your approach may become infeasible if the number of points is too high, though, and it assumes all images have the same number of points. If the number of GT points is known, then this is a simpler problem and you probably don't need the WHD. You can also try --n-points, which removes the lateral network and the regression component of the loss function (second term in Equation 9 in the paper). But other solutions are also conceivable.
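A minimal sketch of that fixed-N idea (my own illustration, not code from this repo): one channel per GT point, a spatial softmax per channel, and a soft-argmax that turns each channel's distribution into an expected (row, col) coordinate, which could then be trained with a plain L2 loss against the known GT points:

```python
import torch

def soft_argmax_2d(heatmaps):
    """heatmaps: (B, N, H, W) raw scores, one channel per GT point.
    Returns (B, N, 2) expected (row, col) coordinates per channel."""
    b, n, h, w = heatmaps.shape
    # One spatial softmax per channel: each map becomes a probability
    # distribution over locations for "its" point.
    probs = torch.softmax(heatmaps.view(b, n, -1), dim=-1).view(b, n, h, w)
    rows = torch.arange(h, dtype=probs.dtype)
    cols = torch.arange(w, dtype=probs.dtype)
    # Expected coordinates under each per-channel distribution
    # (differentiable, unlike a hard argmax).
    exp_row = (probs.sum(dim=3) * rows).sum(dim=2)
    exp_col = (probs.sum(dim=2) * cols).sum(dim=2)
    return torch.stack([exp_row, exp_col], dim=-1)

# A single channel with a sharp peak at (2, 3) recovers that coordinate.
hm = torch.full((1, 1, 6, 6), -10.0)
hm[0, 0, 2, 3] = 10.0
coords = soft_argmax_2d(hm)
```

Note this bakes a fixed correspondence between channel i and GT point i into the network, which is exactly the assumption the WHD avoids.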