iit-cs579 / main

CS579: Online Social Network Analysis at the Illinois Institute of Technology

token_features function #466

Closed agericke closed 5 years ago

agericke commented 5 years ago

Does this function ignore case, or should we take case into account?

For example, given the call token_features(['hi', 'there', 'HI'], feats), should the output be:

sorted(feats.items()) [('token=hi', 2), ('token=there', 1)]

Or:

sorted(feats.items()) [('token=HI', 1), ('token=hi', 1), ('token=there', 1)]

hding9 commented 5 years ago

If the tokens here are generated by the tokenize(doc, keep_internal_punct=False) function, I guess it does not matter? One more question: does the input in the token_features test case have to be an array?

aronwc commented 5 years ago

token_features does not need to convert anything to lowercase. This will only be done by tokenize.

It should work whether the input is a list or an array.
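
To make the answer above concrete, here is a minimal sketch of a case-preserving token_features. It is not the assignment's reference solution. It assumes feats is a defaultdict(int) and that each feature key is 'token=' followed by the token, as in the outputs quoted in the question. The lowercasing list comprehension is only a stand-in for whatever tokenize does.

```python
from collections import defaultdict


def token_features(tokens, feats):
    """Count each token exactly as given; no case conversion happens here.

    Hypothetical sketch: assumes feats is a defaultdict(int).
    """
    for token in tokens:  # works for any iterable of strings, e.g. a list
        feats['token=' + token] += 1


# Mixed-case input: 'hi' and 'HI' are counted as different features.
feats = defaultdict(int)
token_features(['hi', 'there', 'HI'], feats)
print(sorted(feats.items()))
# [('token=HI', 1), ('token=hi', 1), ('token=there', 1)]

# If the tokens were lowercased upstream (stand-in for tokenize), they merge.
lowered_feats = defaultdict(int)
token_features([t.lower() for t in ['hi', 'there', 'HI']], lowered_feats)
print(sorted(lowered_feats.items()))
# [('token=hi', 2), ('token=there', 1)]
```

Because the function only iterates over its input, it behaves the same for a list, a tuple, or another sequence of strings.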
