sadeepj / crfasrnn_keras

CRF-RNN Keras/Tensorflow version
http://crfasrnn.torr.vision
MIT License

Is there a need to convert unaries to negative? #4

Closed: glhfgg1024 closed this issue 7 years ago

glhfgg1024 commented 7 years ago

Hi, Mr. Jayasumana, thanks a lot for sharing your valuable code!

I have a question about https://github.com/sadeepj/crfasrnn_keras/blob/master/crfrnn_layer.py#L76: do we need to first convert the probabilities to negative? In the paper (https://arxiv.org/pdf/1502.03240.pdf, page 4, bottom-left), it says "we use $U_i(l)$ to denote the negative of the unary energy".

sadeepj commented 7 years ago

@glhfgg1024 $U_i(l)$ in the paper does denote the negative of the unary energy. Note that "energy" in the paper means negative log-probability (low energy => high probability). In the code, "inputs" in the line you linked is the output of the FCN. It is therefore already a probability (a negative energy), so we don't need to negate it again.

EDIT: In the last line I should have mentioned 'log-probability' instead of 'probability'
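To make the sign bookkeeping concrete, here is a minimal NumPy sketch of the convention being discussed (the variable names and the example probabilities are illustrative, not the repo's actual code):

```python
import numpy as np

# Suppose p holds the per-label probabilities predicted by the network for one pixel.
p = np.array([0.7, 0.2, 0.1])

log_prob = np.log(p)   # log-probability, i.e. "negative energy"
energy = -log_prob     # unary energy: low energy <=> high probability
U = -energy            # U_i(l) in the paper: the negative of the unary energy

# U is just the log-probability again, so a network output that is already in
# the log-probability domain can be passed to the CRF layer without negation.
assert np.allclose(U, log_prob)
```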

glhfgg1024 commented 7 years ago

Hi @sadeepj, thanks a lot for your kind reply and explanations!

But maybe I misunderstood something. For example, in your code, https://github.com/sadeepj/crfasrnn_keras/blob/master/crfrnn_model.py#L103, the unaries given as input to the CrfRnnLayer are still the logits, right? upscore has not been converted to probabilities before being fed into the CrfRnnLayer. As I understood it (maybe wrongly), upscore should first be converted to probabilities using tf.nn.softmax, then the negative log should be taken to get the unary energies $\psi_u(X_i = l)$, and finally those should be negated to obtain $U_i(l) = -\psi_u(X_i = l)$. If it's convenient for you, could you please help clarify what I misunderstood?

sadeepj commented 7 years ago

@glhfgg1024

Sorry for taking so long to answer! I've been a bit busy.

As mentioned in the last line, column 1, page 4 of https://arxiv.org/pdf/1502.03240.pdf, $U_i(l) = -\psi_u(X_i = l)$ (note the negative sign). Basically, the $U_i(l)$ values are in the 'log-probability' or 'negative energy' domain (higher value => higher probability). Now, as you mentioned, ideally we should have done a softmax() on upscore and then taken log() of that. But since softmax() does an exp() operation, softmax() followed by log() is kind of redundant (it becomes a computational burden). In other words, upscore is already in the 'log-probability' domain. I understand that softmax() followed by log() is not the same as doing nothing - but it doesn't really matter as we learn the optimal CRF parameters anyway (these parameters decide how to weigh the different inputs during inference).
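A small numeric sketch of this point (illustrative only, in NumPy rather than the repo's TensorFlow code): log(softmax(x)) equals x minus a single log-sum-exp constant per pixel, so applying softmax() followed by log() only shifts upscore by a per-pixel offset, which is the kind of difference the learned CRF parameters can absorb, as noted above.

```python
import numpy as np

# 'scores' stands in for one pixel's upscore values (logits); the numbers are made up.
scores = np.array([2.0, -1.0, 0.5])

# softmax() followed by log() ...
log_softmax = np.log(np.exp(scores) / np.sum(np.exp(scores)))

# ... equals the raw scores minus one per-pixel constant (the log-sum-exp).
shifted = scores - np.log(np.sum(np.exp(scores)))

assert np.allclose(log_softmax, shifted)
# So upscore and log(softmax(upscore)) differ only by a constant offset per pixel.
```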

glhfgg1024 commented 7 years ago

Hi @sadeepj, thanks very much for your kind answers!