Closed lemonhu closed 5 years ago
Hi,
As the code comments say, that operation is there because we need to sum the probability over all possible current tags. If we were working with probabilities, to compute next_score[j]
at iteration i
(i.e. the probability of observing the sequence up to position i
and ending with tag j
), we would have to sum over all possible tags we might come from before transitioning to j
. That is, we would have computed next_score[j] = sum(next_score[k] * transition[k, j] for k in range(num_tags)
. The sum is needed because in position i-1
we can be in tag 0, 1, 2, ..., or num_tags - 1
. This is why summing over probabilities is needed at each iteration.
However, we actually works with (unnormalized) log probability score. Thus, a product becomes a sum, and a sum becomes a log-sum-exp. Hope this makes sense.
Thank you for your detailed answer.
In the loop we only need to perform the sum
operation for each step of the score, but actually, we are performing the logsumexp
(exp -> sum -> log) operation.
Is it correct for me to understand? But what is the consideration for doing this?
Yes. Instead of sum
, we do logsumexp
. This is because we're working in log probability space. Let me give an example.
Suppose we want to compute r
, the sum two probabilities p
and q
. That is, r = p + q
. This is straightforward to do. However, suppose now we're working in log probability space. That is, we don't have p
and q
, but we instead have x
and y
where x = log p
and y = log q
. Also, we now want to compute z = log r
from x
and y
. How do we do that? Note that r = p + q = exp(x) + exp(y)
and thus z = log(exp(x) + exp(y))
. Notice how a sum in probability space (i.e. in terms of p, q, r
) becomes a log-sum-exp in log probability space (i.e. in terms of x, y, z
).
Hope this helps.
I gradually become clearer, thank you again for your detailed answer.
You're welcome. Glad that I could help :)
Thank you for your awesome work. I am confused about the calculation process of the denominator. Why do you need to perform additions for each step in the loop? https://github.com/kmkurn/pytorch-crf/blob/master/torchcrf/__init__.py#L245
I think that this operation is done at the end of the loop and does not need to be executed inside the loop. I am confused about this. Can you give me some explanation?