A question about "denominator" obtained by the function <_compute_normalizer>

kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.

https://pytorch-crf.readthedocs.io

MIT License

935 stars 151 forks source link

A question about "denominator" obtained by the function <_compute_normalizer> #30

Closed lemonhu closed 5 years ago

lemonhu commented 5 years ago

Thank you for your awesome work. I am confused about the calculation process of the denominator. Why do you need to perform additions for each step in the loop? https://github.com/kmkurn/pytorch-crf/blob/master/torchcrf/__init__.py#L245

next_score = torch.logsumexp(next_score, dim=1)

I think that this operation is done at the end of the loop and does not need to be executed inside the loop. I am confused about this. Can you give me some explanation?

kmkurn commented 5 years ago

Hi,

As the code comments say, that operation is there because we need to sum the probability over all possible current tags. If we were working with probabilities, to compute next_score[j] at iteration i (i.e. the probability of observing the sequence up to position i and ending with tag j), we would have to sum over all possible tags we might come from before transitioning to j. That is, we would have computed next_score[j] = sum(next_score[k] * transition[k, j] for k in range(num_tags). The sum is needed because in position i-1 we can be in tag 0, 1, 2, ..., or num_tags - 1. This is why summing over probabilities is needed at each iteration.

However, we actually works with (unnormalized) log probability score. Thus, a product becomes a sum, and a sum becomes a log-sum-exp. Hope this makes sense.

lemonhu commented 5 years ago

Thank you for your detailed answer.

In the loop we only need to perform the sum operation for each step of the score, but actually, we are performing the logsumexp(exp -> sum -> log) operation.

Is it correct for me to understand? But what is the consideration for doing this?

kmkurn commented 5 years ago

Yes. Instead of sum, we do logsumexp. This is because we're working in log probability space. Let me give an example.

Suppose we want to compute r, the sum two probabilities p and q. That is, r = p + q. This is straightforward to do. However, suppose now we're working in log probability space. That is, we don't have p and q, but we instead have x and y where x = log p and y = log q. Also, we now want to compute z = log r from x and y. How do we do that? Note that r = p + q = exp(x) + exp(y) and thus z = log(exp(x) + exp(y)). Notice how a sum in probability space (i.e. in terms of p, q, r) becomes a log-sum-exp in log probability space (i.e. in terms of x, y, z).

Hope this helps.

lemonhu commented 5 years ago

I gradually become clearer, thank you again for your detailed answer.

kmkurn commented 5 years ago

You're welcome. Glad that I could help :)