Closed Franck-Dernoncourt closed 9 years ago
thanks!
Isn't it the same as the previous format?
A ["string_key1", "string_key2", ...] list is the same as {"string_key1": 1.0, "string_key2": 1.0, ...}.
string_key1=string_value1 is not a special format; it is just a convention for how to build strings.
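A minimal sketch of the equivalence described above, in plain Python. The helper name as_weighted_dict is hypothetical and is not part of python-crfsuite; it only illustrates that a list of feature names carries the same information as a dict whose values are all 1.0, and that the = inside a name is never split or parsed.

```python
def as_weighted_dict(features):
    """Hypothetical helper: expand a list of feature names into a
    {name: 1.0} dict, mirroring the equivalence described in the thread."""
    return {name: 1.0 for name in features}


list_form = ["word.isupper=True", "postag=NN"]
dict_form = as_weighted_dict(list_form)

# Each string is one opaque feature name; nothing is split on "=".
print(dict_form)
```

Under this reading, "word.isupper=True" and "word.isupper=False" are simply two unrelated keys.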
Hmm good point, I think you're right, sorry about that.
In that case though, in the CoNLL 2002 example, "word.isupper=True" is one binary feature and "word.isupper=False" is another. Shouldn't they be merged into a single feature? It looks a bit inefficient and, more importantly, potentially misleading for readers (it led me to believe that = would be parsed).
Hm, maybe you're right, but this is tricky. As you can see in the example, the positive and negative features didn't get equal weights, e.g.
3.942852 O word.istitle=False
-2.913103 O word.istitle=True
Without word.istitle=False it won't be possible to assign a negative weight to tokens which are not title-cased (because if you multiply anything by 0 you get 0), so this weight will be 'spread' over all other features. Including both features also affects regularization (if I'm not mistaken, the word.istitle feature will be under-regularized with an L2 penalty if there are both word.istitle=False and word.istitle=True). It looks like a model with a single feature is different from a model with two features. I don't know which is better, though.
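The "multiply by 0" argument can be sketched with a toy linear score. This is only an illustration: it ignores transitions and every other feature, the two-feature weights are the ones quoted above for label O, and the single-feature weight in W_ONE is an assumption made up for the comparison (a retrained model would learn its own value).

```python
# Weights for label "O" quoted in the thread (two indicator features).
W_TWO = {"word.istitle=False": 3.942852, "word.istitle=True": -2.913103}

# Hypothetical single-feature model; this weight is an assumption.
W_ONE = {"word.istitle": -2.913103}


def score(weights, features):
    """Toy linear score: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def two_feature_encoding(is_title):
    # Exactly one of the two indicator features fires with value 1.0.
    return {"word.istitle=%s" % is_title: 1.0}


def one_feature_encoding(is_title):
    # A single feature whose value is 1.0 or 0.0.
    return {"word.istitle": 1.0 if is_title else 0.0}


for is_title in (True, False):
    print(is_title,
          score(W_TWO, two_feature_encoding(is_title)),
          score(W_ONE, one_feature_encoding(is_title)))
```

With the single-feature encoding, a token that is not title-cased contributes nothing to the score regardless of the learned weight, which is the asymmetry described above.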
@tpeng is it OK to revert this change?
sure! Go ahead
On Wed, Sep 16, 2015 at 9:47 PM, Mikhail Korobov notifications@github.com wrote:
@tpeng https://github.com/tpeng is it OK to revert this change?
The ["string_key1=string_value1", "string_key2=string_value2", ...] list is actually the format used in the example: http://nbviewer.ipython.org/github/tpeng/python-crfsuite/blob/master/examples/CoNLL%202002.ipynb
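As a rough sketch of how such a list could be built for one token: the function token_features below is hypothetical and only loosely modeled on the style of the linked notebook, not copied from it. It uses the key=value naming purely as a string-building convention.

```python
def token_features(word):
    """Hypothetical helper: build "key=value" feature strings for one token."""
    return [
        "bias",
        "word.lower=" + word.lower(),
        "word.isupper=%s" % word.isupper(),
        "word.istitle=%s" % word.istitle(),
    ]


features = token_features("Madrid")
print(features)
```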