ropensci / parzer

Parse geographic coordinates
https://docs.ropensci.org/parzer
Other
63 stars 6 forks source link

Fucntions hangs on Windows with non-UTF-8 MBCS locale #31

Closed yutannihilation closed 3 years ago

yutannihilation commented 3 years ago

parzer's functions using std::regex hang (or, to be precise, take a very long time) on Windows. Here's an example:

parzer::parse_lon("10")

It turned out this is due to some bug in GCC, which is unlikely to see the fix in Rtools very soon. Possible workarounds are:

  1. Use boost::regex
  2. Wrap all functions using std::regex with withr::with_locale(new = c(LC_COLLATE = "C"), ...)

Do you think either is acceptable? I'm happy to create a pull request and test with my Windows laptop.

For the details, please refer to the Twitter thread: https://twitter.com/opencpu/status/1353000471062048771

AlbanSagouis commented 3 years ago

That's interesting. The command parzer::parse_lon("10") and the gist you prepared (minimal_rcpp_code_to_freeeze_windows.R) freeze on my Windows laptop. I often use parzer::parse_lon on large character vectors, i find it slow but never that slow, I don't get what's happening here. @sckott preferred to avoid dependence on boost but maybe that changed. I would be curious to try your fix.

yutannihilation commented 3 years ago

I often use parzer::parse_lon on large character vectors, i find it slow but never that slow

Oh, curious. So, maybe it means, when the memory pressure is so high, the allocation for regex cache fails early? I wonder to what extent it speeds up if you set Sys.setlocale("LC_COLLATE", "C").

sckott commented 3 years ago

Thanks for the issue. I don't have a Windows machine, but will see if we can replicate on CI at least.

By the way, I played with replacing Rcpp with cpp11 https://github.com/ropensci/parzer/issues/27 - not done yet though. I wonder if that would solve this problem

yutannihilation commented 3 years ago

will see if we can replicate on CI at least.

I'm afraid we can't. I'm not sure if GitHub Actions' Windows runner has Language Pack of MBCS locale installed.

I wonder if that would solve this problem

No, it won't. As I described above, this is a GCC's bug. cpp11 (or Rcpp) doesn't save us from using std::regex.

yutannihilation commented 3 years ago

I'm afraid we can't.

Sorry, I was wrong. It seems it's reproducible on GitHub Actions CI:

https://github.com/yutannihilation/test-setlocale-non-utf8-mbcs-locale-gha/runs/1769026401?check_suite_focus=true#step:5:1

sckott commented 3 years ago

Glad its reproducible on CI.

I'd rather not use boost as it's such a heavy dependency for folks that install from source. Maybe we try withr first and see how that goes. If it's not satisfactory we can try boost

yutannihilation commented 3 years ago

Thanks, agreed. Then may I try a pull request? I'm not confident with my C++ skill, but if it's R code maybe I can do it, and since I have Windows machine with such a locale, it's easier to test.

sckott commented 3 years ago

Yes, PR please.

yutannihilation commented 3 years ago

Sure, maybe I'll do it tomorrow or this weekend :)

yutannihilation commented 3 years ago

@sckott I made a pull request here: #32. Could you take a look when you have time? Since I implemented it in a bit tricky way, I'm curious about what you think...

sckott commented 3 years ago

thanks, yes i will