wrathematics / ngram

Fast n-Gram Tokenization

Results in FIFO (First In, First Out) order #7

Closed: Yongbaek-Kim closed this issue 5 years ago

Yongbaek-Kim commented 5 years ago

txt <- "1 2 3 4 5 6 7 8 9 2 11"
get.ngrams(ngram(txt, n = 2))
##  [1] "7 8"  "1 2"  "8 9"  "9 2"  "2 11" "4 5"
##  [7] "3 4"  "5 6"  "6 7"  "2 3"

I ran the above with R 3.5.2 in RStudio on Windows 10. The n-grams come back in a scrambled order. Debugging would be easier if the function returned them in the order they appear in the input. Is that possible?

wrathematics commented 5 years ago

You can use ngram.order() in the latest version on GitHub (thanks @heckendorfc!):

library(ngram)
txt <- "1 2 3 4 5 6 7 8 9 2 11"
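ng = ngram(txt)  # n = 2 (bigrams) by default
# ngram.order() returns indices that put the n-grams back in input order
get.ngrams(ng)[ngram.order(ng)]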
##  [1] "1 2"  "2 3"  "3 4"  "4 5"  "5 6"  "6 7"  "7 8"  "8 9"  "9 2"  "2 11"
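If you are stuck on a release that predates ngram.order(), a minimal base-R sketch can produce the same ordering for small inputs. The helper name ordered_ngrams() is hypothetical and not part of the ngram package. It assumes words are separated by whitespace, and it calls unique() so that each distinct n-gram is listed once, at its first occurrence, matching the de-duplicated list get.ngrams() gives.

```r
# Hypothetical helper (NOT part of the ngram package): list n-grams in the
# order they first appear in the input, using only base R.
ordered_ngrams <- function(txt, n = 2) {
  # Assumption: tokens are separated by runs of whitespace
  words <- strsplit(trimws(txt), "\\s+")[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  # Join each window of n consecutive words with a single space
  grams <- vapply(starts,
                  function(i) paste(words[i:(i + n - 1)], collapse = " "),
                  character(1))
  unique(grams)  # keep only the first occurrence of repeated n-grams
}

txt <- "1 2 3 4 5 6 7 8 9 2 11"
ordered_ngrams(txt, n = 2)
```

For this input every bigram is distinct, so the result should line up with get.ngrams(ng)[ngram.order(ng)] above. For large texts the package's own ngram.order() is the better choice.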