neuralmind-ai / portuguese-bert

Portuguese pre-trained BERT models

For cases like O->I, should I manually set the corresponding entries in the transition probability matrix to zero? #38

Open lkqnaruto opened 2 years ago

lkqnaruto commented 2 years ago

Hi

Again, thank you for the amazing work! I wonder, in the NER task, for cases like O->I, should I manually set the corresponding entries in the transition probability matrix to zero? I went through the pytorch-crf code and didn't see such settings.

Thanks in advance!

fabiocapsouza commented 2 years ago

Hi @lkqnaruto, yes, you can do that if you want to initialize the CRF layer with such constraints. The pytorch-crf library does not expose an API for that, so you'll have to modify crf.transitions yourself and set the constrained entries to large negative values such as -1e5. Please let me know your results if you try it; I thought about doing this before but didn't have time :)

I wouldn't recommend putting these constraints on start_transitions, though: for long documents that are broken into smaller spans, there can be spans that start with I- tags.
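Something along these lines should work (a rough sketch, assuming the pytorch-crf CRF module, whose transitions[i, j] holds the score of moving from tag i to tag j, and an illustrative tag-to-index mapping):

    import torch
    from torchcrf import CRF

    tag2idx = {"B": 0, "I": 1, "O": 2}  # illustrative BIO mapping
    crf = CRF(num_tags=len(tag2idx), batch_first=True)

    # transitions is an nn.Parameter, so edit it outside autograd tracking
    with torch.no_grad():
        # forbid O -> I by giving it a very negative score
        crf.transitions[tag2idx["O"], tag2idx["I"]] = -1e5

    # start_transitions is left untouched on purpose: spans produced by
    # splitting long documents can legitimately start with an I- tag.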

lkqnaruto commented 2 years ago

Thank you for the quick reply. Do you think putting such a constraint on the CRF layer could improve model performance compared to not having it? It looks like, without the constraint being set, the current model actually "learns" the constraint by itself.
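One quick way to check that (a sketch, assuming a trained pytorch-crf layer named crf) is to print the learned matrix:

    # learned (num_tags, num_tags) transition scores; if the model picked up
    # the constraint, the O->I entry should come out strongly negative
    print(crf.transitions)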

fabiocapsouza commented 2 years ago

Yes, I believe so. I see it as a form of model initialization, similar to adjusting the bias terms of a classification layer to produce the prior probabilities of the classes on the dataset (see the "init well" advice), which is a good practice.

It could make training easier and faster to converge, but it does not necessarily improve the final model performance.
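To illustrate the bias analogy (a minimal sketch with made-up class priors, not numbers from any real dataset):

    import torch
    import torch.nn as nn

    # hypothetical class frequencies on some dataset: B 15%, I 5%, O 80%
    priors = torch.tensor([0.15, 0.05, 0.80])

    classifier = nn.Linear(768, 3)  # 768 is just an example hidden size
    with torch.no_grad():
        # softmax(log p) = p, so the untrained head already predicts the priors
        classifier.bias.copy_(priors.log())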

lkqnaruto commented 2 years ago

I was trying to put the above constraint into the transition matrix, but the indexing of the transition matrix confused me a little bit. Basically, I was using the BIO scheme: B: 0, I: 1, O: 2.

So the code I was trying to modify is:

    def reset_parameters(self) -> None:
        """Initialize the transition parameters.

        The parameters will be initialized randomly from a uniform distribution
        between -0.1 and 0.1.
        """
        nn.init.uniform_(self.start_transitions, -0.1, 0.1)
        nn.init.uniform_(self.end_transitions, -0.1, 0.1)
        nn.init.uniform_(self.transitions, -0.1, 0.1)
        # self.transitions is an nn.Parameter, so the in-place edit has to
        # happen outside of autograd tracking
        with torch.no_grad():
            self.transitions[2, 1] = -1e5  # this is the corresponding entry for O->I

or should it be:

 self.transitions[1, 2] = -1e5

Which one is the correct way to implement it?

Thanks in advance.