scrapinghub / python-crfsuite

A python binding for crfsuite
MIT License

[Question] How is string feature handled? #129

Open GeekAlexis opened 3 years ago

GeekAlexis commented 3 years ago

Thanks for the great project.

As far as I know, a CRF model only takes real values. If I use the suffix of a word (say, the last 4 letters) as a feature, is it internally converted to binary features covering all combinations of the 4 letters (up to 26 x 26 x 26 x 26 dimensions)? I don't see this documented anywhere.

gurmitteotia commented 2 months ago

I know you asked this question quite a long time ago, but the answer may still be useful to someone.

According to the ItemSequence documentation, string features are converted to floats. For example, if you pass a feature as {"word1": "hello"}, it is converted to {"word1=hello": 1.0}. Have a look at the documentation; it has many examples.
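To make the conversion concrete, here is a minimal pure-Python sketch of the behavior described above. It does not call python-crfsuite; `flatten_features` is a hypothetical helper written only for illustration, and the exact separator the library puts between key and value may differ from the `=` used here. The point it illustrates is that each distinct string value seen in the data becomes its own attribute name with weight 1.0, so the attribute set is as large as the number of observed values rather than every possible 4-letter combination.

```python
def flatten_features(features):
    """Hypothetical helper (not part of python-crfsuite).

    Mimics the conversion described above: a string value is folded
    into the attribute name and given weight 1.0; numeric and boolean
    values are kept as float weights.
    """
    flat = {}
    for key, value in features.items():
        if isinstance(value, str):
            # Assumption: "=" as the key/value separator, as in the example above.
            flat[f"{key}={value}"] = 1.0
        else:
            flat[key] = float(value)
    return flat


# Toy data: use the last 4 letters of each word as a string feature.
words = ["walking", "talking", "jumped"]
items = [flatten_features({"suffix4": w[-4:], "bias": 1.0}) for w in words]

# Only attribute names that actually occur in the data exist.
observed = sorted({name for item in items for name in item})
print(observed)
```

Here "walking" and "talking" share the suffix "king", so the three words produce only two suffix attributes plus the bias attribute.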