Closed Alexis-benoist closed 9 years ago
Hi,
AFAIK the crfsuite C++ library can't use partially tagged sequences as training data. Passing an empty tag is the same as passing a non-empty tag: the empty string is treated as just another label. This is fine in many applications, e.g. in NER it is customary to tag "not entity" tokens as "O". But if you don't know the true tag, then passing "O" or "" as the tag value doesn't look like a good idea, because the CRF will learn that tag as if it were a real label.
I think you can either preprocess your data to remove training sequences which don't have full tag information, or use a more versatile CRF implementation. I haven't tried it, and I'm not sure, but pystruct or factorie might be able to do what you want.
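To illustrate the preprocessing option, here is a minimal sketch in plain Python. It assumes the training data is a list of (features, tags) pairs and that a missing tag is represented by "" or None; the function names and the toy data are made up for the example and are not part of crfsuite.

```python
def is_fully_tagged(tags, missing=("", None)):
    """Return True if no tag in the sequence is a missing-value marker."""
    return all(tag not in missing for tag in tags)


def drop_partially_tagged(sequences, missing=("", None)):
    """Keep only (features, tags) pairs in which every token has a real tag."""
    return [
        (xseq, yseq)
        for xseq, yseq in sequences
        if is_fully_tagged(yseq, missing)
    ]


# Hypothetical toy data: one feature dict per token, one tag per token.
data = [
    ([{"w": "Paris"}, {"w": "is"}, {"w": "nice"}], ["B-LOC", "O", "O"]),
    ([{"w": "Hello"}, {"w": "Bob"}], ["O", ""]),  # second tag is unknown
]

# Only the fully tagged sequence survives and would be passed to the trainer.
clean = drop_partially_tagged(data)
```

The obvious cost is that a sequence with a single unknown tag is discarded entirely, so this only makes sense if enough fully tagged sequences remain.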
Ok, thanks a lot!
Hello,
I'm using CRFsuite, and some of my data doesn't have tags.
For those tokens I'm passing "" as the tag. Is there a better solution, given that the CRF doesn't need to learn anything about them?
Thank you in advance.
Alexis.