ropensci / tokenizers

Fast, Consistent Tokenization of Natural Language Text
https://docs.ropensci.org/tokenizers

keeping punctuation #80

Closed Legallois closed 1 year ago

Legallois commented 2 years ago

Hi,

Is there a way to keep the punctuation in the n-gram tokenization? I would like to get n-grams such as: ". he was " "was an intelligent, gentlemanlike man, " "in his own way." etc.

Thanks,

Dominique Legallois
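
To make the request concrete, here is a minimal sketch of the behaviour being asked for, written in Python for illustration only. It is not the tokenizers package API: the function names `tokenize_keep_punct` and `ngrams_keep_punct` are hypothetical, and treating each punctuation mark as its own token is an assumption about what the desired output should look like.

```python
import re


def tokenize_keep_punct(text):
    """Split text into word tokens and single punctuation-mark tokens.

    Assumption: a "word" is a run of word characters, optionally joined
    by internal apostrophes or hyphens; every other non-space character
    becomes its own token.
    """
    return re.findall(r"\w+(?:['-]\w+)*|[^\w\s]", text)


def ngrams_keep_punct(text, n=3):
    """Return word n-grams in which punctuation marks count as tokens."""
    tokens = tokenize_keep_punct(text.lower())
    # Slide a window of n tokens over the sequence; yields nothing
    # when the text has fewer than n tokens.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    for gram in ngrams_keep_punct("He was an intelligent, gentlemanlike man.", 3):
        print(repr(gram))
```

With this approach a comma or full stop occupies one slot in the window, so a 3-gram such as "an intelligent ," contains two words and one punctuation mark. Whether punctuation should instead stay attached to the preceding word is a separate design choice that this sketch does not cover.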