ropensci / parzer

Parse geographic coordinates
https://docs.ropensci.org/parzer
Other
63 stars 6 forks source link

Add workaround for a std::regex's bug on non-UTF8 MBCS locales #32

Closed yutannihilation closed 3 years ago

yutannihilation commented 3 years ago

Fix #31

This pull request fixes #31 by wrapping functions that call std::regex with withr::with_locale().

Implementation

First, I investigate which R functions generated by Rcpp use std::regex, and found all of them do. So, it's probably better to fix at the very near location of these functions. But, since they are automatically generated, we cannot modify RcppExports.R. So, there probably are two choices:

  1. Write wrapper functions one by one (e.g. pz_parzer_lat_safe() for pz_parzer_lat())
  2. Modify function body within R script

While 1. is easy, I'm afraid this is human-error prone (e.g. use the original function without _safe mistakenly, forget to add the wrapper when adding a new Rcpp function, etc). So, I chose option 2 in this pull request. (edit: I got the suggestion to override .Call(). It seems far better than my tricky implementations, so I switched to it)

Testing

The hardest part is how to test this. The problem is, as the nature of this bug, the test case doesn't fail, but hangs forever. So, I think it's safe to call it on an external process, so that we can set timeout.

However, this turned out a bit tricky. To test the function, we need to use pkgload::load_all() (or devtools::load_all()) to load them on the external process. The problem is, on Windows, this invokes recompilation (I don't know why), which makes the tests fail as it doesn't finish within the specified timeout. I tweaked to compile it beforehand with pkgbuild::compile_dll(), but anyway it means we need to compile twice on Windows, so the Windows check is a bit slow (I think this is acceptable).

 ** running tests for arch 'i386' ...

  Running 'testthat.R' [53s]

 OK

** running tests for arch 'x64' ...

  Running 'testthat.R' [68s]

 OK
yutannihilation commented 3 years ago

Thanks to the the power of Twitterverse, a better trick is suggested.

https://twitter.com/hadleywickham/status/1355519666660192259

sckott commented 3 years ago

thanks, having a look

sckott commented 3 years ago

Do we know if Japanese locale is available on the GH Actions Windows builder? https://github.com/ropensci/parzer/pull/32/checks?check_run_id=1797664773

Maybe you can try adding to the gh actions config file to show testthat output

- name: Show testthat output
   if: always()
   run: find check -name 'testthat.Rout*' -exec cat '{}' \; || true
   shell: bash
yutannihilation commented 3 years ago

Yes, it's available. Do you mean I should make GHA fail when the locale is unavailable?

yutannihilation commented 3 years ago

Confirmed no test is skipped.

https://github.com/ropensci/parzer/pull/32/checks?check_run_id=1826691377#step:12:394

sckott commented 3 years ago

Okay, great. And I assume this works for you on your machine?

yutannihilation commented 3 years ago

Yes, it works on my Windows, and I also confirmed this fixes the problem of the original reporter of this issue (for reference, here's the conversation (in Japanese): https://twitter.com/niszet0/status/1357863004847497221).

sckott commented 3 years ago

thanks!

yutannihilation commented 3 years ago

Thanks for merging!