ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text
https://docs.ropensci.org/tokenizers

Using long vectors #21

Closed: lmullen closed this issue 8 years ago

lmullen commented 8 years ago

From rOpenSci onboarding:

https://github.com/lmullen/tokenizers/blob/d240cddbb1d91146b3a30b90f6fc25abd6919edf/src/skip_ngrams.cpp#L9: If you want to support long vectors in R, you likely want to return `R_xlen_t` and use it for indexing.
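For context, here is a minimal sketch of the pattern the reviewer is suggesting. It is not the package's actual skip-gram code; it just shows, assuming an Rcpp setup like this package uses, why `R_xlen_t` matters: a plain `int` loop index overflows once a vector has more than 2^31 - 1 elements, while `R_xlen_t` is wide enough for long vectors.

```cpp
#include <Rcpp.h>

// Hypothetical example function, not part of tokenizers: sums a numeric
// vector that may be a long vector (length > 2^31 - 1).
// [[Rcpp::export]]
double sum_long(Rcpp::NumericVector x) {
  double total = 0.0;
  // R_xlen_t is the index type R itself uses for long vectors, so the
  // loop counter stays valid however long x is; with `int i` the index
  // would overflow and the behavior would be undefined.
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    total += x[i];
  }
  return total;
}
```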

dselivanov commented 8 years ago

IMHO we are safe and should close this issue. I can't imagine a case where using skip_grams > 10 would be useful.