scrapinghub / python-crfsuite

A python binding for crfsuite
MIT License

Is it possible to pass no tag during learning? #13

Closed: Alexis-benoist closed this issue 9 years ago

Alexis-benoist commented 9 years ago

Hello,

I'm using CRFsuite, and some of my data doesn't have tags.

So I'm passing "" as the tag for those tokens. Is there a better solution? (The CRF doesn't need to learn anything about them.)

Thank you in advance.

Alexis.

kmike commented 9 years ago

Hi,

AFAIK the crfsuite C++ library can't use partially tagged sequences as training data. Passing an empty tag is the same as passing a non-empty tag: "" is treated as just another label. This is fine in many applications, e.g. in NER it is customary to tag "not entity" tokens as "O". But if you don't know the true tag, then passing "O" or "" as the tag value doesn't look like a good idea - the CRF will learn to predict this tag.

I think you can either preprocess your data to remove training sequences which don't have full information, or use a more versatile CRF implementation. I haven't tried it, and I'm not sure, but pystruct or factorie might be able to do what you want.
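The first option, dropping sequences that aren't fully tagged, could look roughly like the sketch below. The helper names (`is_fully_labeled`, `drop_partially_labeled`, `train`) and the toy data are made up for illustration, and it assumes a missing tag is stored as `None` or `""`. The `Trainer.append` / `Trainer.train` calls are the usual python-crfsuite ones, imported lazily so the filtering part runs without the library installed.

```python
from typing import List, Optional, Tuple

# One training sequence: per-token feature lists plus per-token labels.
# Assumption: a label of None or "" means "tag unknown".
Features = List[str]
LabeledSeq = Tuple[List[Features], List[Optional[str]]]


def is_fully_labeled(labels: List[Optional[str]], missing=(None, "")) -> bool:
    """True only if every token in the sequence has a known tag."""
    return all(label not in missing for label in labels)


def drop_partially_labeled(sequences: List[LabeledSeq]) -> List[LabeledSeq]:
    """Keep only sequences in which every token is tagged."""
    return [(x, y) for x, y in sequences if is_fully_labeled(y)]


def train(sequences: List[LabeledSeq], model_path: str) -> None:
    """Train a CRF on the fully tagged sequences only (hypothetical wrapper)."""
    import pycrfsuite  # lazy import: only needed when actually training

    trainer = pycrfsuite.Trainer(verbose=False)
    for xseq, yseq in drop_partially_labeled(sequences):
        trainer.append(xseq, yseq)
    trainer.train(model_path)


# Toy data: only the first sequence is fully tagged.
data = [
    ([["w=Paris"], ["w=is"], ["w=nice"]], ["LOC", "O", "O"]),
    ([["w=Foo"], ["w=bar"]], ["ORG", ""]),
    ([["w=Hi"]], [None]),
]
kept = drop_partially_labeled(data)
```

Note that "O" is kept because it is a real, known label here; only `None` and `""` are treated as missing. The cost of this approach is that the tagged tokens in a dropped sequence are lost too, so it works best when partially tagged sequences are a small share of the data.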

Alexis-benoist commented 9 years ago

Ok, thanks a lot!