ropensci-review-tools / srr

Software Review Roclets
https://docs.ropensci.org/srr/

`rcpp_parse_rmd` crashes R #30

Closed santikka closed 1 year ago

santikka commented 1 year ago

I was running `roxygen2::roxygenize()` on my dosearch package with srr roclets, which surprisingly crashed my R instance (or sometimes simply got stuck in a seemingly infinite loop). After a bit of debugging, I managed to track down the cause to the function `rcpp_parse_rmd`. I further tested this by temporarily removing the README.Rmd file from the package directory, after which `roxygenize()` succeeded. The issue does not seem to be related to the readme file itself, as the crash also occurs with the dynamite package.

Strangely, this only happens on R version 4.2.1; on version 4.1.3 everything used to work fine.

Session info:

``` r
library(srr);library(roxygen2);sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23 ucrt)
#>  os       Windows 10 x64 (build 22000)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Finland.utf8
#>  ctype    English_Finland.utf8
#>  tz       Europe/Helsinki
#>  date     2022-08-19
#>  pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version   date (UTC) lib source
#>  cli           3.3.0     2022-04-25 [1] CRAN (R 4.2.1)
#>  digest        0.6.29    2021-12-01 [1] CRAN (R 4.2.1)
#>  evaluate      0.16      2022-08-09 [1] CRAN (R 4.2.1)
#>  fansi         1.0.3     2022-03-24 [1] CRAN (R 4.2.1)
#>  fastmap       1.1.0     2021-01-25 [1] CRAN (R 4.2.1)
#>  fs            1.5.2     2021-12-08 [1] CRAN (R 4.2.1)
#>  glue          1.6.2     2022-02-24 [1] CRAN (R 4.2.1)
#>  highr         0.9       2021-04-16 [1] CRAN (R 4.2.1)
#>  htmltools     0.5.3     2022-07-18 [1] CRAN (R 4.2.1)
#>  knitr         1.39      2022-04-26 [1] CRAN (R 4.2.1)
#>  lifecycle     1.0.1     2021-09-24 [1] CRAN (R 4.2.1)
#>  magrittr      2.0.3     2022-03-30 [1] CRAN (R 4.2.1)
#>  pillar        1.8.1     2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3     2019-09-22 [1] CRAN (R 4.2.1)
#>  purrr         0.3.4     2020-04-17 [1] CRAN (R 4.2.1)
#>  R.cache       0.16.0    2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2     2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0    2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.0    2022-06-28 [1] CRAN (R 4.2.1)
#>  R6            2.5.1     2021-08-19 [1] CRAN (R 4.2.1)
#>  Rcpp          1.0.9     2022-07-08 [1] CRAN (R 4.2.1)
#>  reprex        2.0.2     2022-08-17 [1] CRAN (R 4.2.1)
#>  rlang         1.0.4     2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown     2.15      2022-08-16 [1] CRAN (R 4.2.1)
#>  roxygen2    * 7.2.1     2022-07-18 [1] CRAN (R 4.2.1)
#>  rstudioapi    0.13      2020-11-12 [1] CRAN (R 4.2.1)
#>  sessioninfo   1.2.2     2021-12-06 [1] CRAN (R 4.2.1)
#>  srr         * 0.0.1.176 2022-08-19 [1] Github (ropensci-review-tools/srr@06985f6)
#>  stringi       1.7.8     2022-07-11 [1] CRAN (R 4.2.1)
#>  stringr       1.4.0     2019-02-10 [1] CRAN (R 4.2.1)
#>  styler        1.7.0     2022-03-13 [1] CRAN (R 4.2.1)
#>  tibble        3.1.8     2022-07-22 [1] CRAN (R 4.2.1)
#>  utf8          1.2.2     2021-07-24 [1] CRAN (R 4.2.1)
#>  vctrs         0.4.1     2022-04-13 [1] CRAN (R 4.2.1)
#>  withr         2.5.0     2022-03-03 [1] CRAN (R 4.2.1)
#>  xfun          0.32      2022-08-10 [1] CRAN (R 4.2.1)
#>  xml2          1.3.3     2021-11-30 [1] CRAN (R 4.2.1)
#>  yaml          2.3.5     2022-02-21 [1] CRAN (R 4.2.1)
#>
#>  [1] C:/Users/Santtu/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.1/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```
mpadge commented 1 year ago

@santikka Unfortunately I can't reproduce that at all, either on my local machine or in a clean rocker/tidyverse container. That C++ code is very simple, and I can't see any way for it to have a bug. If you keep experiencing it, could you try to locate what's going wrong in the C++ code? Or, since you're on a Windows machine, maybe make sure that your Rtools is up to date?

santikka commented 1 year ago

@mpadge Thanks, it does seem like the issue is on my end. I reinstalled R, Rtools and all the relevant packages, but the issue persists. I cloned the repo and did some further debugging on the C++ side. It turns out that (at least on my machine) the culprit is `rmd::strip_leading_white`, or more specifically `std::regex_replace`. I don't really understand why, but this function can take up to a minute to process a single line of text, while R's memory consumption simultaneously shoots up by several gigabytes. I built the srr package without the call to `rmd::strip_leading_white(linetxt)` in the while-loop, and then everything worked fine (assuming, of course, no leading whitespace).
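For readers unfamiliar with the pattern, a regex-based helper of the kind described above might look like the sketch below. The function name `strip_leading_white_regex` and the pattern `^\\s+` are assumptions for illustration only; this is not the actual srr source, and it is not claimed to reproduce the slowdown on its own.

```cpp
#include <regex>
#include <string>

// Hypothetical regex-based helper: the name and the pattern are assumptions,
// not the actual srr implementation.
std::string strip_leading_white_regex(const std::string &line) {
    // "^\\s+" matches a run of whitespace anchored at the start of the string.
    static const std::regex leading_ws("^\\s+");
    // Replace that run with nothing, leaving the rest of the line unchanged.
    return std::regex_replace(line, leading_ws, "");
}
```

Every call goes through the `std::regex` engine, which is the component the thread ends up identifying as the problem.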

mpadge commented 1 year ago

Thanks for helping out! That's an interesting result. I guess that means it's something to do with Rtools and the C++ standard library. No idea what to do about that. @jeroen Do you have any idea about possible misbehaviour of the C++ standard library and Rtools on R 4.2.1?

jeroen commented 1 year ago

This is a known bug that likely won't be fixed, as `std::regex` will probably be deprecated: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9872

The solution is to not use `std::regex`.

For a similar issue, see https://github.com/tesseract-ocr/tesseract/issues/3830
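Following that advice, leading whitespace can be stripped with a plain linear scan instead of a regex. The sketch below assumes what such a replacement could look like; the name `strip_leading_white` simply mirrors the helper mentioned earlier in the thread, and this is not necessarily the change that was committed to srr.

```cpp
#include <string>

// Hypothetical regex-free helper: a sketch of the "don't use std::regex"
// approach, not necessarily the fix committed to srr.
std::string strip_leading_white(const std::string &line) {
    // The six ASCII whitespace characters that std::isspace recognises
    // in the default "C" locale.
    static const char *const whitespace = " \t\n\v\f\r";
    // Single pass to find the first non-whitespace character:
    // no regex engine, no backtracking, no recursion.
    const std::string::size_type first = line.find_first_not_of(whitespace);
    // An empty or all-whitespace line yields an empty string.
    return first == std::string::npos ? std::string() : line.substr(first);
}
```

For example, `strip_leading_white("   #' text")` returns `"#' text"`. Because `find_first_not_of` walks the string once, the cost grows linearly with the line length.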

mpadge commented 1 year ago

Thanks so much @jeroen - I would never have found that bug without your help. Brilliant! @santikka Can you please check that it's fixed now? Thanks!

santikka commented 1 year ago

@mpadge Works great now, thanks!