everdark closed this issue 7 years ago.
The tokenization functions all call functions from the stringi package, which determines word boundaries partly based on the locale. You might try using stringi::stri_locale_set() to set the same locale on both machines.
I've looked into this more closely. I can't reproduce the problem on an Ubuntu 16.04 machine, because its locale is the same as on my Mac OS X machine. In any case, I'm sure the difference comes from the locale setting, which stringi picks up. You should use either Sys.setlocale() or stringi::stri_locale_set() to ensure that you are using the same locale on all machines.
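A minimal sketch of the stringi-based approach, assuming the stringi package is installed. The input "hello:world" is a hypothetical example (the original outputs are not shown in this thread), and "en_US" is only an example locale. Running the same script on both machines and comparing what it prints should show whether the locale accounts for the difference in tokens.

```r
library(stringi)

# Hypothetical input: two words joined by a colon with no surrounding spaces
x <- "hello:world"

# Print the locales in effect so they can be compared across machines
print(Sys.getlocale())     # R's own locale settings
print(stri_locale_get())   # stringi's default ICU locale

# Option 1: pass a locale explicitly for a single call
# ("en_US" is only an example; pick one locale and use it on both machines)
print(stri_split_boundaries(x, type = "word", skip_word_none = TRUE,
                            locale = "en_US"))

# Option 2: change stringi's default locale for the rest of the session;
# stri_locale_set() invisibly returns the previously used locale
old_locale <- stri_locale_set("en_US")
print(stri_split_boundaries(x, type = "word", skip_word_none = TRUE))

# Restore the previous default locale
stri_locale_set(old_locale)
```

Passing the locale per call (option 1) avoids changing global state, which is usually safer inside package code; setting the default (option 2) is more convenient for an interactive session.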
Hi,
I recently came across an issue that confuses me. On my MacBook I get the following results:
However, on my Ubuntu machine the result is:
If there is a space between the colon and the adjacent words, both machines split the text into two separate tokens.
My sessionInfo() output for the two machines is as follows:

Am I missing something about this difference? Is it a locale issue? Any help is appreciated.