library(ngram)  # provides ngram() and get.phrasetable()
text <- scan("ca10.txt", what = "char", sep = "\n") # ca10.txt is a file in the Brown corpus
text <- tolower(text)
text <- gsub("[^a-z- ]", "", text, perl = TRUE)
quad <- get.phrasetable(ngram(text, n = 4))
This last line croaks with an error. I don't understand why it reports nwords=3, which is obviously untrue. I guess it's because one line in the file contains only three tokens? How can I work around this issue? (BTW, I'm using R 3.6.3 on Linux Mint 19.3.)
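A quick way to test that guess is to count the tokens on each cleaned line and list the lines with fewer than four. The sketch below uses only base R; `count_tokens()` and `short_lines()` are hypothetical helper names (not part of the ngram package), and it assumes words are separated by runs of whitespace, as in the cleaned text above. A small toy vector stands in for the contents of ca10.txt.

```r
# Count whitespace-separated tokens in each element of a character vector.
# Assumption: words are separated by runs of spaces/tabs.
# An empty or all-blank line counts as 0 tokens.
count_tokens <- function(x) {
  lengths(strsplit(trimws(x), "[[:space:]]+"))
}

# Hypothetical helper: indices of lines with fewer than n tokens.
short_lines <- function(x, n = 4) {
  which(count_tokens(x) < n)
}

# Toy input standing in for the cleaned lines of ca10.txt
text <- c("the quick brown fox jumps", "over the dog", "", "a b c d")

short_lines(text, n = 4)  # returns 2 3: "over the dog" and the empty line

# Two possible workarounds, assuming short lines are the cause:
# 1. drop lines that are too short for 4-grams
kept <- text[count_tokens(text) >= 4]
# 2. join all lines into one string, squeezing repeated whitespace
collapsed <- trimws(gsub("[[:space:]]+", " ", paste(text, collapse = " ")))
```

If the guess is right, passing either `kept` or `collapsed` to `ngram(..., n = 4)` should avoid the complaint. Collapsing is probably the better fit for Brown corpus files, where line breaks mostly reflect layout rather than sentence boundaries, so 4-grams spanning two lines are still meaningful.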