r-lib / cpp11

cpp11 helps you to interact with R objects using C++ code.
https://cpp11.r-lib.org/
Other
199 stars 46 forks source link

Reproducible segmentation fault (with tidyr and openxlsx), bisect included #244

Closed bersbersbers closed 2 years ago

bersbersbers commented 2 years ago

Open R and copy-and-paste this:

x <- data.frame()
invisible(gctorture2(50))

fun1 <- function() {
    as.data.frame(tidyr:::simplifyPieces(
        list(c("A", "A"), c("A", "B"), c("B", "A"), c("B", "B")), 2, FALSE
    )$strings)
}

fun2 <- function() {
    openxlsx::write.xlsx(as.data.frame(1), tempfile())
}

y <- fun1() # try(fun1) does not repro!
fun2()
while (TRUE) {
    print(fun1())
}

gives

 *** caught segfault ***
address 0x55bb00000004, cause 'memory not mapped'

Traceback:
 1: print.data.frame(fun1())
 2: print(fun1())

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: `

I can repro the problem on five OSes: 1 x OpenSUSE Leap 15.2, 2 x Windows 10, and 2 x Ubuntu 20.04 in Windows 10 WSL.

This issue has a long history with me (https://github.com/tidyverse/tidyr/issues/1163, https://github.com/ycphs/openxlsx/issues/267, tidyverse/ggplot2#4635, rstudio/rmarkdown#2229, thomasp85/patchwork#278).

After bisecting CRAN releases of tidyr, I finally bisected cpp11 from GitHub and found 087861f9ee2383d717255ecb0487e4b297808c49 as the first bad commit. More details below.

Using R -e 'devtools::install_version("cpp11", "0.4.0"); install.packages("tidyr")' && R # repro!
Using R -e 'devtools::install_version("cpp11", "0.3.1"); install.packages("tidyr")' && R # no repro!

Using R -e 'devtools::install(); install.packages("tidyr")' && R
29ce41de04a6cc884f0217009c9e00b8c611d0c4 (master) bad
5a26bdb8e83d96e78c49bdb14024409309825a62 (v0.3.1) bad (must have been a mistake)
7d9195f47b9c653f05ff888eac4a6ef6d3389439 (v0.3.0) good
73e1acb15bcaac6b9c2c72804b1c0e4fdaeab9c0 good
5a26bdb8e83d96e78c49bdb14024409309825a62 good
5a26bdb8e83d96e78c49bdb14024409309825a62 (v0.3.1) good
1c84e03239dd759c8e8f52d185d9acf3c1251041 bad
9df63865765284ded32ab49bb44deb3c48d385f5 good
ac7374e81bb9e8fe35b21424c53e21b91b00c9e4 good
cde1872c8000b79e008420f5923dbfe2eb51e953 bad
087861f9ee2383d717255ecb0487e4b297808c49 bad
b9ec7eb4448b3eeea47e65df917f69034fda926c good
087861f9ee2383d717255ecb0487e4b297808c49 first bad!

I am still not convinced that cpp11 is the culprit, but developers of openxlsx and tidyr, while being able to reproduce the issue, don't find the source in their code base, either. So maybe people here have an idea.

DavisVaughan commented 2 years ago

It is also generally worth noting that I can't seem to reproduce this with an old version of tidyr that used Rcpp, but soon after the switch to cpp11 this starts happening. It has proved to be difficult to isolate though

jimhester commented 2 years ago

I so far have been unable to reproduce this either on macOS or in a Ubuntu Docker container.

The behavior looks like either stack corruption or a protection failure, but it is going to be difficult to track down unless I can reproduce it myself.

bersbersbers commented 2 years ago

I had hoped that an example that fails on Windows and Linux is universal, but it seems not. But I can provider some further information.

Windows is Windows 10 21H2, R 4.1.1 WSL is Ubuntu 20.04, R 4.1.1. Linux is OpenSUSE Leap 15.2

Here's a session info from WSL (before the loop):

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7       tidyr_1.1.4      fansi_0.5.0      utf8_1.2.2
 [5] crayon_1.4.1     dplyr_1.0.7      R6_2.5.1         lifecycle_1.0.1
 [9] magrittr_2.0.1   zip_2.2.0        pillar_1.6.4     stringi_1.7.5
[13] rlang_0.4.12     vctrs_0.3.8      generics_0.1.0   ellipsis_0.3.2
[17] openxlsx_4.2.4   tools_4.1.1      glue_1.4.2       purrr_0.3.4
[21] compiler_4.1.1   pkgconfig_2.0.3  tidyselect_1.1.1 tibble_3.1.5

What else could be helpful for me to provide?

DavisVaughan commented 2 years ago

I've managed to reproduce this without tidyr

Open up an R Console and run this and for me it segfaults most of the time (running this in RStudio won't immediately crash it, but it often makes further processing hang and eventually segfault somewhere)

invisible(gctorture2(50))

cpp11::cpp_function(
  '
  cpp11::list fn_cpp() {
    cpp11::writable::list out(4);
    return out;
  }
  '
)

fun2 <- function() {
  openxlsx::write.xlsx(as.data.frame(1), tempfile())
}

y <- fn_cpp()
fun2()

while (TRUE) {
  print(fn_cpp())
}
bersbersbers commented 2 years ago

@DavisVaughan great job. I had to install.packages("decor"), but now I could confirm the segmentation fault on WSL (pasting the code into the R console on the shell) on the first try (not on the second, though). On native Linux, there is not segfault either as far as I have tried (a few times), but I guess the most important point is to have one less package to care about. I wonder when we find/if there is an example without openxlsx.

jimhester commented 2 years ago

Ok, I have finally be able to reproduce this, I think the problem was I was using R 4.1.0 rather than 4.1.1+, and I didn't encounter the crash with that version.

245 resolves this for me, @bersbersbers could you try that with your more full examples and verify it fixes the problem there as well?

DavisVaughan commented 2 years ago

I ran my example above with gctorture2(5) and it seems to be working, so I think that was it?

bersbersbers commented 2 years ago

Test log:

This looks pretty consistent in that installing #245 and reinstalling tidyr fixes the issue until tidyr is rebuilt with another version of cpp11 not including #245. Looks like a weeks-long saga comes to an end, thanks to everyone involved 🚀

bersbersbers commented 2 years ago

Tests using my original code took a bit longer, so here is the condensed log:

Thanks again!