unDocUMeantIt / koRpus

An R Package for Text Analysis
GNU General Public License v3.0

Issue on Windows #24

Closed: JaySLee closed this issue 4 years ago

JaySLee commented 4 years ago

Hi,

I'm using R and the textstem package (which relies on koRpus) on Windows 10, 64-bit. I've spotted what appear to be two bugs:

1) In the shell(sys.tt.call) call used when not unix.OS, translate = TRUE converts the /'s in the regex substitution of TT.filter.command's "| perl -pe ..." into backslashes, which breaks perl. By that point the path to TreeTagger already contains double backslashes because of the earlier normalizePath call, so using translate = FALSE keeps the substitution's /'s as they are and the path still works.

2) The line 'cat(paste(tknz.results, collapse = "\n"), "\n", file = tknz.tempfile)' needs sep = "", or else a hidden space is appended to the last line/token, which causes TreeTagger to tag that last token as .
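Both points can be reproduced outside koRpus with a minimal sketch. The token vector and the perl one-liner below are made-up stand-ins (not koRpus's actual tokenizer output or TT.filter.command), and point 1 is only simulated with chartr(), since shell() exists only on Windows. R's documentation for shell() states that translate = TRUE turns every / in the command into \, which is what chartr() mimics here.

```r
# Point 2: cat()'s default sep = " " is inserted between its arguments,
# so the final token is written as ". " followed by the newline.
tknz.results <- c("This", "is", "a", "test", ".")  # hypothetical tokens

f.default <- tempfile()
cat(paste(tknz.results, collapse = "\n"), "\n", file = f.default)

# With sep = "" no extra space is written before the final newline.
f.fixed <- tempfile()
cat(paste(tknz.results, collapse = "\n"), "\n", file = f.fixed, sep = "")

print(readLines(f.default))  # last element is ". " (trailing space)
print(readLines(f.fixed))    # last element is "."

# Point 1 (simulation only): shell(translate = TRUE) rewrites "/" as "\",
# which also mangles the delimiters of perl's s/.../.../ substitution.
# The filter string is a hypothetical example.
filter <- "| perl -pe 's/\\t/ /g'"
translated <- chartr("/", "\\", filter)
print(translated)
```

The fixed file contains no trailing space, and the translated filter no longer contains any / delimiters, so perl would no longer parse it as a substitution.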

I am running R from a Cygwin environment, but I believe these errors would also occur in a standard Windows environment.

Thanks!

Best, Jay

unDocUMeantIt commented 4 years ago

hi,

thanks for reporting! could you tell me which versions of R, koRpus and TreeTagger you are using?