XiaoyanLi1 closed this issue 5 years ago
Same doubt here. Did you figure it out?
When summing over k, the first term becomes a constant no matter whether S is discrete or continuous: the softmax outputs sum to 1 over the classes at every pixel, so $\sum_k S_k^\top W \mathbf{1} = \mathbf{1}^\top W \mathbf{1}$, which does not depend on S. So I chose to ignore the first term.
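A quick numerical check of this claim (my own sketch, not code from the repo): for any softmax output S, the value of the first term $\sum_k S_k^\top W \mathbf{1}$ is the same, namely $\mathbf{1}^\top W \mathbf{1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 3                               # N pixels, K classes (toy sizes)
W = np.exp(-rng.random((N, N)))           # positive Gaussian-like affinity matrix

def first_term(S):
    """sum_k S_k^T W 1, where S is K x N with rows S_k."""
    one = np.ones(N)
    return sum(S[k] @ W @ one for k in range(K))

# The term is identical for arbitrary softmax outputs S ...
for _ in range(3):
    logits = rng.normal(size=(K, N))
    S = np.exp(logits) / np.exp(logits).sum(axis=0)   # softmax over classes
    print(first_term(S))

# ... and equals the constant 1^T W 1, independent of S.
print(np.ones(N) @ W @ np.ones(N))
```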
@meng-tang Hi, I'm confused. Since W is generated with a Gaussian kernel, every entry of W is positive. S_k is the softmax output, so S_k is also positive. Then the gradient is always negative, so how can gradient descent work?
The loss just keeps increasing...
Hi, I'm very interested in your work and want to follow your ECCV 2018 paper. I notice that in the paper the DenseCRF loss is $\sum_k S_k^\top W (1 - S_k)$, while in the code the loss is computed as $-\sum_k S_k^\top W S_k$. I therefore think the gradient should be $W\mathbf{1} - 2WS_k$, not the $-2WS_k$ used in the implementation. Why is there a difference between the implementation and the theory? Should the first term of the gradient be ignored?
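A sketch of why dropping the first term is harmless in practice (my own reasoning, under the assumption that W is symmetric and the loss is backpropagated through a softmax): the two gradients with respect to $S_k$ differ only by $W\mathbf{1}$, which is the same for every class k at a given pixel, and the softmax Jacobian maps any per-pixel class-constant to zero. So the gradients with respect to the logits coincide exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 5, 4                              # toy problem: N pixels, K classes
A = rng.random((N, N))
W = (A + A.T) / 2                        # symmetric positive affinity matrix

logits = rng.normal(size=(K, N))
S = np.exp(logits) / np.exp(logits).sum(axis=0)   # softmax over classes

# Gradient of the full loss  sum_k S_k^T W (1 - S_k)  w.r.t. S_k:
g_full = np.stack([W @ np.ones(N) - 2 * W @ S[k] for k in range(K)])
# Gradient of the code's loss  -sum_k S_k^T W S_k  w.r.t. S_k:
g_code = np.stack([-2 * W @ S[k] for k in range(K)])

def backprop_softmax(g, S):
    # Chain rule through the per-pixel softmax:
    # dL/dz_j = S_j * (g_j - sum_k g_k S_k)
    return S * (g - (g * S).sum(axis=0, keepdims=True))

# The extra W@1 term is constant across classes, so it vanishes
# after the softmax Jacobian: the logit gradients are identical.
print(np.allclose(backprop_softmax(g_full, S), backprop_softmax(g_code, S)))  # True
```

This suggests the implementation and the theory agree once the gradient is pushed back to the logits, which would explain why the code can safely ignore the first term.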