Closed by lmullen 7 years ago
tokenize_skip_ngrams() should work the same way as tokenize_ngrams()
> tokenize_ngrams(test, n = 2, n_min = 1)
[[1]]
 [1] "one"         "one two"     "two"         "two three"   "three"       "three four"
 [7] "four"        "four five"   "five"        "five six"    "six"         "six seven"
[13] "seven"       "seven eight" "eight"       "eight nine"  "nine"        "nine ten"
[19] "ten"
> tokenize_skip_ngrams(test, n = 2, k = 1)
[[1]]
 [1] "one three"   "two four"    "three five"  "four six"    "five seven"  "six eight"
 [7] "seven nine"  "eight ten"   "one two"     "two three"   "three four"  "four five"
[13] "five six"    "six seven"   "seven eight" "eight nine"  "nine ten"
It should preserve the order of the tokens in the documents.
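To make "preserve the order" concrete, here is a minimal base R sketch of one possible document-order output for skip bigrams. The helper `skip_bigrams_in_order()` is hypothetical and not part of the tokenizers package; it handles only n = 2, and it assumes "document order" means sorting by the position of the first word, then by skip distance.

```r
# Hypothetical helper for illustration only; not the tokenizers implementation.
# Builds bigrams with 0..k skipped words, emitted in document order:
# first by the position of the starting word, then by skip distance.
skip_bigrams_in_order <- function(words, k = 1) {
  n_words <- length(words)
  out <- character(0)
  for (i in seq_len(n_words)) {
    for (skip in 0:k) {
      j <- i + skip + 1          # index of the second word in the pair
      if (j <= n_words) {
        out <- c(out, paste(words[i], words[j]))
      }
    }
  }
  out
}

words <- c("one", "two", "three", "four", "five")
skip_bigrams_in_order(words, k = 1)
```

With this ordering, every pair starting at "one" appears before any pair starting at "two", rather than all skip-1 pairs being listed ahead of all adjacent pairs as in the output above.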
Not going to change this because #24 is the actual problem.