kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.
https://pytorch-crf.readthedocs.io
MIT License

Get the score for the best_tags_list #51

Closed robinsongh381 closed 2 years ago

robinsongh381 commented 4 years ago

Hello!

Thank you for your awesome work.

I know that this question has been asked before, but somehow I cannot manage to get the score for the best_tags_list.

In issue #48, you said that manipulating forward would do the job, but I am not sure how to do that.

I thought that modifying the _viterbi_decode function would give the score of the best sequence. I printed score inside _viterbi_decode, but I still could not find what I want.

Can you please be more specific about how to get the score for the best_tags_list?

Thanks and regards

kmkurn commented 4 years ago

Hi, thanks for your kind words!

What I meant is you could feed the best sequence to forward to get its score/log probability. You could modify _viterbi_decode easily too. On line 321 there is score[idx].max(dim=0) but the max score is discarded. That's the score of the best tag sequence (of batch idx).
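The second suggestion (keeping the max instead of discarding it) can be sketched in plain Python. This is not the library's _viterbi_decode: viterbi_with_score and its list-based arguments are hypothetical names used only for illustration, with no batching or masking. The value it returns is an unnormalised path score, not a probability.

```python
from typing import List, Tuple


def viterbi_with_score(
    emissions: List[List[float]],    # emissions[t][j]: score of tag j at step t
    transitions: List[List[float]],  # transitions[i][j]: score of moving from tag i to tag j
    start: List[float],              # start[j]: score of starting in tag j
    end: List[float],                # end[j]: score of ending in tag j
) -> Tuple[List[int], float]:
    """Return the best tag path and its unnormalised score (illustrative sketch)."""
    num_tags = len(start)
    # score[j] = best score of any path that ends in tag j at the current step
    score = [start[j] + emissions[0][j] for j in range(num_tags)]
    history = []
    for emit in emissions[1:]:
        backptr, next_score = [], []
        for j in range(num_tags):
            best_prev = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            backptr.append(best_prev)
            next_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        history.append(backptr)
        score = next_score

    final = [score[j] + end[j] for j in range(num_tags)]
    best_last = max(range(num_tags), key=lambda j: final[j])
    best_score = final[best_last]  # the max value, kept instead of discarded

    # Follow the back-pointers from the last step to the first
    path = [best_last]
    for backptr in reversed(history):
        path.append(backptr[path[-1]])
    path.reverse()
    return path, best_score


# Toy inputs (made up for this sketch): 2 tags, 2 time steps
toy_emissions = [[1.0, 0.0], [0.0, 2.0]]
toy_transitions = [[0.5, -0.5], [0.0, 0.0]]
best_path, best_score = viterbi_with_score(
    toy_emissions, toy_transitions, start=[0.0, 0.0], end=[0.0, 0.0]
)
```

Each entry of the final score vector is the best score over all paths that end in the corresponding tag, so its maximum is the score of the overall best path.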

robinsongh381 commented 4 years ago

Hi

Thank you for the reply.

If I print score it gives

tensor([[110.8995, 109.5313, 108.5182, 110.9530, 110.1169, 108.9617, 111.1315, 110.9905, 109.4520, 110.6834, 108.8434, 110.5361, 111.0204, 110.0683, 109.6099, 109.7697, 110.3875, 110.2707, 108.6854, 110.1478, 116.0886, 112.9962]], device='cuda:0', grad_fn=)

with batch_size being 1.

Could you explain what each element represents? For example, the maximum value is 116.0886 at position 20, and hence the 20th entity tag is predicted. But what exactly does this score mean?

On top of that, I am still unsure how I should modify _viterbi_decode to get the probability of the optimal prediction.

Thanks

robinsongh381 commented 4 years ago

Hello

Somehow I have managed to compute something like the confidence I am looking for!

I modified the decode function as follows:

    # Requires: from typing import List, Optional, Tuple
    def decode(self, emissions: torch.Tensor,
               mask: Optional[torch.ByteTensor] = None
               ) -> Tuple[List[List[int]], torch.Tensor]:
        """Find the most likely tag sequence using Viterbi algorithm.

        Args:
            emissions (`~torch.Tensor`): Emission score tensor of size
                ``(seq_length, batch_size, num_tags)`` if ``batch_first`` is ``False``,
                ``(batch_size, seq_length, num_tags)`` otherwise.
            mask (`~torch.ByteTensor`): Mask tensor of size ``(seq_length, batch_size)``
                if ``batch_first`` is ``False``, ``(batch_size, seq_length)`` otherwise.

        Returns:
            List of list containing the best tag sequence for each batch, and
            a confidence value: the exponential of the per-token average
            log-likelihood of those sequences.
        """
        self._validate(emissions, mask=mask)
        if mask is None:
            mask = emissions.new_ones(emissions.shape[:2], dtype=torch.uint8)

        # Find the optimal sequence (decoded once and reused for the return value)
        best_tags_list = self._viterbi_decode(emissions, mask)

        # Convert to a (seq_length, batch_size) tensor for _compute_score.
        # torch.tensor() on a list of lists only works when all decoded
        # sequences have the same length (e.g. batch_size == 1).
        best_tag_sequence = torch.tensor(best_tags_list, device=emissions.device)
        best_tag_sequence = best_tag_sequence.transpose(0, 1)

        # Find the probability of the optimal sequence
        denominator = self._compute_normalizer(emissions, mask)
        numerator = self._compute_score(emissions, best_tag_sequence, mask)
        llh = numerator - denominator
        # Average over the unmasked tokens, so this is a per-token value
        log_likelihood = llh.sum() / mask.float().sum()

        confidence = torch.exp(log_likelihood)

        return best_tags_list, confidence

kmkurn commented 4 years ago

I just realised that the score in line 321 is unnormalised, so you need to subtract the normaliser from it to get the log probability. Your modification of decode looks good. That's what I mean by "feeding the best sequence to forward". As an alternative to modifying decode like you did, you could do it like this:

best_tag_sequence = crf.decode(emissions, mask)
# decode returns lists shaped (batch_size, seq_length); transpose for batch_first=False (the default)
tags = torch.tensor(best_tag_sequence).transpose(0, 1)
confidence = crf(emissions, tags, mask)  # log-likelihood; call .exp() for a probability

(I might screw up the types but you get the idea).
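To make the "subtract the normaliser" step concrete, here is a minimal plain-Python sketch for a single unmasked sequence. It does not use the library: logsumexp, path_score, log_normalizer and sequence_log_prob are hypothetical helper names, and the toy numbers are made up. It also contrasts the full-sequence probability with the per-token average used in the modified decode above.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(tags: Sequence[int], emissions: List[List[float]],
               transitions: List[List[float]], start: List[float],
               end: List[float]) -> float:
    """Unnormalised score of one tag path (the numerator)."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score + end[tags[-1]]


def log_normalizer(emissions: List[List[float]], transitions: List[List[float]],
                   start: List[float], end: List[float]) -> float:
    """Log of the summed exp-scores of all paths (forward algorithm)."""
    num_tags = len(start)
    alpha = [start[j] + emissions[0][j] for j in range(num_tags)]
    for emit in emissions[1:]:
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)]) + emit[j]
            for j in range(num_tags)
        ]
    return logsumexp([alpha[j] + end[j] for j in range(num_tags)])


def sequence_log_prob(tags, emissions, transitions, start, end) -> float:
    """Log probability of a path: its score minus the log normaliser."""
    return (path_score(tags, emissions, transitions, start, end)
            - log_normalizer(emissions, transitions, start, end))


# Toy inputs (made up for this sketch): 2 tags, 2 time steps
emissions = [[1.0, 0.0], [0.0, 2.0]]
transitions = [[0.5, -0.5], [0.0, 0.0]]
start, end = [0.0, 0.0], [0.0, 0.0]
best_tags = [0, 1]  # the highest-scoring path for these toy numbers

log_prob = sequence_log_prob(best_tags, emissions, transitions, start, end)
confidence = math.exp(log_prob)                         # probability of the whole sequence
per_token_confidence = math.exp(log_prob / len(best_tags))  # per-token (geometric mean) variant
```

The two values answer different questions: the first is the probability the model assigns to the entire tag sequence, while the per-token variant is less sensitive to sequence length, which can make it easier to compare across inputs.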