Closed agericke closed 5 years ago
If tokens here is generated by the tokenize(doc, keep_internal_punct=False) function, I guess it does not matter? Also, one more question: does the test case in token_features have to be an array?
token_features does not need to convert anything to lowercase. This will only be done by tokenize.
It should work whether the input is a list or an array.
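To make that concrete, here is a minimal sketch, not the assignment's reference solution. It assumes token_features only counts each token under a 'token=' prefix (the key format used in the example in this thread) and that feats is a defaultdict(int). The function body is an assumption for illustration.

```python
from collections import defaultdict


def token_features(tokens, feats):
    """Count each token under the key 'token=<token>'.

    Assumed behavior for illustration: no lowercasing happens here,
    so 'hi' and 'HI' are counted as two different features.
    """
    for tok in tokens:
        feats["token=" + tok] += 1


feats = defaultdict(int)
token_features(["hi", "there", "HI"], feats)
print(sorted(feats.items()))
# [('token=HI', 1), ('token=hi', 1), ('token=there', 1)]
```

Because the loop only iterates over tokens, a list, a tuple, or an array of strings should all behave the same way. If case-insensitive counts are wanted, the tokens would need to be lowercased before they reach this function, which is what tokenize is described as doing.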
On Mar 11, 2019, at 10:53 PM, hding9 notifications@github.com wrote:
If tokens here is generated by tokenize(doc, keep_internal_punct=False) function, I guess it does not matter? Also, one more question, the test case in token_features, does it have to be an array?
Does this function ignore case, or should we take case into account?
For example, if I have:
token_features(['hi', 'there', 'HI'], feats)
Should the output be:
sorted(feats.items())
[('token=hi', 2), ('token=there', 1)]
Or:
sorted(feats.items())
[('token=HI', 1), ('token=hi', 1), ('token=there', 1)]