scrapinghub / python-crfsuite

A python binding for crfsuite
MIT License

'bias' feature in example #73

Open tisdall opened 6 years ago

tisdall commented 6 years ago

I'm very new to CRFs, so I apologize if my issue is just ignorance...

I was going through the example and noticed that word2features() added a 'bias' to the beginning of each feature set. Does this have a purpose? It seems that since every set of features will contain that 'bias' string that the end result should be the same without it. (or I'm just totally not getting it) I tried looking through the docs here and the crfsuite docs and couldn't find anything that would indicate the purpose.
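(For context, the part of word2features() being discussed looks roughly like this. This is a trimmed-down sketch of the tutorial's feature extractor; the full version also adds features for the neighboring words:)

```python
def word2features(sent, i):
    """Simplified sketch of the tutorial's feature extractor.

    Note the constant 'bias' string added to every token's feature list,
    regardless of the word itself.
    """
    word = sent[i][0]
    features = [
        'bias',                                # same for every token
        'word.lower=' + word.lower(),
        'word.istitle=%s' % word.istitle(),
        'word.isdigit=%s' % word.isdigit(),
    ]
    return features

sent = [('Melbourne', 'NP'), ('is', 'VB')]
print(word2features(sent, 0))
```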

Pantamis commented 6 years ago

'bias' is a feature that captures how often each label occurs in the training set.

Intuitively, if 'bias' is your only feature (so your state features are just indicator functions of the current label), then the weight learned for a label will be higher the more often that label appears. At prediction time you would always return the label with the highest weight, which is the one that appeared most often during training.

In a 'real' CRF, it is simply a way to express that some labels are rare in themselves and others are not, so the model can take that into account (for example, imagine a language in which verbs are mostly avoided but nouns are not; the 'bias' weights for verb labels would then be lower than for noun labels).
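A quick way to see the intuition: a bias-only model can do no better than always predicting the most frequent training label. A toy sketch in plain Python (not crfsuite itself; label counts stand in for the learned bias weights):

```python
from collections import Counter

# Toy training labels: 'O' is far more common than 'B-ORG'.
train_labels = ['O'] * 80 + ['B-ORG'] * 20

# With only the constant 'bias' feature, the learned per-label weights can
# depend on nothing but how often each label occurs, so prediction reduces
# to returning the most frequent label every time.
weights = Counter(train_labels)            # stand-in for the learned bias weights
prediction = weights.most_common(1)[0][0]
print(prediction)                          # 'O'
```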

I hope it is clear...

tisdall commented 6 years ago

@Pantamis , thanks for replying, but I still don't understand. Did you look at the example? The function is just adding 'bias' to the beginning of every feature set regardless.

Pantamis commented 6 years ago

OK, I will try to be more explicit.

CRFsuite uses transition features of the form I(y_{t-1}=a, y_t=b) and state features of the form f(x)·I(y_t=a). The transition features are created automatically.

With word2features() you specify the f(x) part of the state features. For a feature like word.istitle(), the log potentials look like I(x.istitle()=True/False)·I(y_t=a), where a ranges over the possible labels.

The 'bias' feature does not depend on x, so by adding it you are adding to your feature set all the features of the form const·I(y_t=a), one for each label a.

So the model learns one such weight per label, as if the labels were drawn independently from some fixed probability distribution.
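To make the mechanics concrete, here is a hand-rolled sketch of the scoring step (hypothetical weights, not crfsuite internals): every attribute a token emits, including the constant 'bias', is crossed with every label to form a state feature, and a token's per-label score is the sum of the matching weights.

```python
# Hypothetical learned weights for state features (attribute, label) -> weight.
state_weights = {
    ('bias', 'O'):                   0.5,   # per-label base rates, independent of x
    ('bias', 'B-ORG'):              -0.5,
    ('word.istitle=True', 'B-ORG'):  1.5,
    ('word.istitle=True', 'O'):     -0.25,
}

def score(attributes, label):
    """Sum the weights of the state features that fire for this token/label pair."""
    return sum(state_weights.get((a, label), 0.0) for a in attributes)

# A title-cased token emits both the constant 'bias' and 'word.istitle=True'.
token_features = ['bias', 'word.istitle=True']
for label in ('O', 'B-ORG'):
    print(label, score(token_features, label))
```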

tisdall commented 6 years ago

Maybe it'd help if you answered the original question... If you simply remove 'bias' from the feature set, is the end result still the same? Likewise, if you change it to 'bill', will it make any difference?

Pantamis commented 6 years ago

"Likewise, if you change it to 'bill' will it make any difference?": No, it is just a name for the feature. When you print the state features in the example, you will see entries of the form (some float) B-ORG bias (with each label name in place of B-ORG, and 'bill' instead of 'bias' if you rename it).

"If you simply remove the 'bias' from the feature set, is the end result still the same?": Honestly I find this question really hard to answer; maybe someone else can help. If you keep it, the model learns weights with the meaning I described before (the higher the weight of B-ORG bias, the higher the proportion of B-ORG in the training set). But if you remove it, the other features will take on different weights to compensate (roughly, the weight that B-ORG bias would have carried gets spread across the other B-ORG features instead).
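That compensation can be sketched with hypothetical weights: if some attribute (say, one 'word.lower=*' feature) fires for every token, a per-label bias weight can be folded into those attributes' weights without changing any score. (At test time an unseen word fires no 'word.lower=*' feature, so the folded bias is lost; that is one reason results can still differ in practice.)

```python
# Hypothetical weights: a per-label 'bias' weight plus one word feature.
bias_w = {'O': 0.5, 'B-ORG': -0.5}
word_w = {('word.lower=acme', 'B-ORG'): 1.0, ('word.lower=acme', 'O'): -1.0}

def score_with_bias(word_feat, label):
    return bias_w[label] + word_w.get((word_feat, label), 0.0)

# Fold each label's bias weight into its word-feature weights instead.
word_w_folded = {(f, lab): w + bias_w[lab] for (f, lab), w in word_w.items()}

def score_folded(word_feat, label):
    return word_w_folded.get((word_feat, label), 0.0)

# For tokens whose word feature was seen in training, the scores are identical.
for lab in ('O', 'B-ORG'):
    print(lab, score_with_bias('word.lower=acme', lab),
          score_folded('word.lower=acme', lab))
```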

I am not sure. What happens if you run the example with 'bias' removed from word2features()?

hanifabd commented 3 years ago

I applied this to sentence boundary disambiguation. When I remove the bias, my model becomes more aggressive in segmenting sentences, but when I add bias = 1 it performs better. I don't know the reason why, though.