ropensci / tokenizers

Fast, Consistent Tokenization of Natural Language Text
https://docs.ropensci.org/tokenizers

Add function to chunk texts into smaller segments #30

Closed lmullen closed 6 years ago

lmullen commented 7 years ago

Suggested by @rccordell. Development in this branch. https://github.com/ropensci/tokenizers/blob/chunk-text/R/chunk-text.R
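For readers unfamiliar with the feature: the idea is to split a long document into segments of roughly equal size (e.g. a fixed number of words per chunk). A minimal language-agnostic sketch of that technique, in Python for illustration only (the function name `chunk_text`, the `chunk_size` parameter, and the word-count strategy are assumptions here; the actual R implementation lives in the `chunk-text.R` file linked above):

```python
def chunk_text(text, chunk_size=100):
    """Split a text into chunks of at most chunk_size words.

    This is an illustrative sketch, not the tokenizers package code:
    it splits on whitespace and rejoins consecutive runs of words.
    """
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

doc = "one two three four five six seven"
print(chunk_text(doc, chunk_size=3))
# → ['one two three', 'four five six', 'seven']
```

The real function would also need to decide how to label chunks (e.g. by document ID) and how to handle tokenization edge cases, which is where the R implementation in the branch comes in.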

lmullen commented 7 years ago

This has been merged into master, but some work still needs to be done.

Ironholds commented 7 years ago

Let me know if you run into performance problems, and I'm happy to try moving it compile-side.

lmullen commented 7 years ago

@Ironholds Will do.