Open gagolews opened 8 years ago
Now the behavior is incorrect:
[gagolews@zeus tmp]$ LC_ALL="pl_PL.iso-8859-2" R
R Under development (unstable) (2016-04-14 r70486) -- "Unsuffered Consequences"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library("stringi")
> x <- stri_conv("a\u0105bc", "UTF-8", "")
> library(re2r)
> re2_match("\u0105", x)
[1] FALSE
> re2_match(x, "\u0105")
BŁĄD: invalid UTF-8 in regexp:
> stri_extract_all_regex(x, "\u0105") # this is OK
[[1]]
[1] "ą"
Consider converting all input strings to UTF-8 before matching, preferably with `stringi::stri_enc_toutf8`.
This matters beyond this test case: e.g., Windows does not have a UTF-8 locale set by default.
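A minimal sketch of that suggestion, assuming a small helper is applied to every character argument before it reaches the regex engine. The name `to_utf8` is hypothetical and not part of re2r. The base R fallback `enc2utf8` is only there so the snippet runs without stringi installed. A Latin-1 string stands in for non-UTF-8 input, since its encoding mark does not depend on the session locale.

```r
# Hypothetical helper (not part of re2r): re-encode character input as UTF-8
# before handing it to a UTF-8-only regex engine.
to_utf8 <- function(x) {
  stopifnot(is.character(x))
  if (requireNamespace("stringi", quietly = TRUE)) {
    stringi::stri_enc_toutf8(x)  # the conversion suggested in this issue
  } else {
    enc2utf8(x)                  # base R fallback, assumption for portability
  }
}

# A Latin-1 encoded string stands in for text in a non-UTF-8 encoding.
x <- iconv("caf\u00e9", from = "UTF-8", to = "latin1")
y <- to_utf8(x)

Encoding(x)   # encoding mark of the original string
Encoding(y)   # encoding mark after conversion
validUTF8(y)  # checks that the bytes of y form valid UTF-8
```

With this in place, both the pattern and the input would go through `to_utf8()` at the top of each matching function, so the engine never sees bytes in the native encoding.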