Open napsternxg opened 8 years ago
@kmike and @tpeng do you want to have a look at it?
Using word embeddings improves accuracy a lot. Having a supported way to include them in python-crfsuite would be wonderful.
@napsternxg any updates on feeding float vectors as features? I have the same situation: I want to use GloVe embeddings for an NER task with a CRF.
@muhnash0 I basically did the proposed approach in my comment manually. It was quite easy.
I don't think the proposed approach will work. CRFsuite does not support continuous features so each unique key/value combination will be a unique feature. You have to discretize the continuous features with a technique like https://arxiv.org/abs/1711.01068
@DomHudson crfsuite does support continuous features
The approach I suggested is utilized in this tool I have built.
The current API doesn't support features whose value is a list of floats, e.g. word embeddings. The current way to add such features is to write out one key per dimension, for example
{"f0": 1.5, "f1": 1.6, "f2": -1.4}
for a 3-dimensional embedding, which puts an extra burden on the user. I propose a wrapper feature that lets users pass the embedding list directly as a dictionary value, e.g.
{"f": FloatFeatures([1.5, 1.6, -1.4])}
Internally, this would convert the float features into a representation consistent with the CRFSuite ItemSequence, using a consistent naming convention such as "f:0", "f:1", "f:2".
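To make the proposal concrete, here is a minimal sketch of what such a wrapper could look like when done by hand on the user's side. It is not part of python-crfsuite: the `FloatFeatures` class and the `expand_float_features` helper are hypothetical names used only for illustration, and the sketch assumes the flattened dictionary is then passed on as an ordinary feature dict with float values.

```python
from typing import Dict, Mapping, Sequence, Union


class FloatFeatures:
    """Hypothetical wrapper that marks a list of floats as one vector-valued feature."""

    def __init__(self, values: Sequence[float]) -> None:
        self.values = [float(v) for v in values]


FeatureValue = Union[float, str, bool, FloatFeatures]


def expand_float_features(features: Mapping[str, FeatureValue]) -> Dict[str, object]:
    """Flatten FloatFeatures values into one "name:index" key per dimension.

    All other entries are copied through unchanged.
    """
    flat: Dict[str, object] = {}
    for name, value in features.items():
        if isinstance(value, FloatFeatures):
            # One float-valued feature per embedding dimension: "f:0", "f:1", ...
            for i, v in enumerate(value.values):
                flat[f"{name}:{i}"] = v
        else:
            flat[name] = value
    return flat


# Features for a single token, with a 3-dimensional embedding under the key "f".
token = {"bias": 1.0, "f": FloatFeatures([1.5, 1.6, -1.4])}
expanded = expand_float_features(token)
print(expanded)
```

Applying the helper to every token's feature dict before building the sequence keeps the naming convention in one place, so the user no longer writes the per-dimension keys manually.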