r-lib / xml2

Bindings to libxml2
https://xml2.r-lib.org/

xml_add_parent produces a segfault in for loop #339

Open AleKoure opened 3 years ago

AleKoure commented 3 years ago

While developing a plumber API with xml2, I ran into the following error under a small stress test. Below is a minimal example reproduced on my local machine.

The following code chunk produces a segfault:

library(xml2)

xx <- function() {
  x <- read_xml("<fruits><apple color='red'></apple></fruits>")
  xml_add_parent(x, read_xml("<food></food>"))
  print(as.character(x))
}

for(i in 1:1000)xx()
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n  <fruits>\n    <apple color=\"red\"/>\n  </fruits>\n</food>\n"
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n  <fruits>\n    <apple color=\"red\"/>\n  </fruits>\n</food>\n"
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<food>\n  <fruits>\n    <apple color=\"red\"/>\n  </fruits>\n</food>\n"

 *** caught segfault ***
address 0x55ff44000000, cause 'memory not mapped'

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=el_GR.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=el_GR.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=el_GR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=el_GR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xml2_1.3.2         plumber_1.1.0.9000

loaded via a namespace (and not attached):
 [1] compiler_4.0.4   magrittr_2.0.1   R6_2.5.0         later_1.2.0     
 [5] promises_1.2.0.1 tools_4.0.4      swagger_3.33.1   Rcpp_1.0.6      
 [9] stringi_1.6.1    jsonlite_1.7.2   webutils_1.1     lifecycle_1.0.0 
[13] rlang_0.4.11    

You can work around it, for example, by using xml_add_child and xml_replace instead.
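
As a rough illustration of that workaround, the sketch below wraps the root element using only xml_add_child. The helper name `wrap_root` is hypothetical and not part of xml2, and unlike xml_add_parent it returns a new document rather than modifying `x` in place. It has not been verified to avoid the crash in every environment.

```r
library(xml2)

# Hypothetical helper (not part of xml2): wrap a document's root element in a
# new parent without calling xml_add_parent().
wrap_root <- function(doc, parent_xml) {
  new_parent <- read_xml(parent_xml)
  # xml_add_child() copies an existing node by default (.copy = TRUE),
  # so `doc` itself is left unchanged.
  xml_add_child(new_parent, doc)
  new_parent
}

# Same stress loop as in the report, using the helper instead.
for (i in 1:1000) {
  x <- read_xml("<fruits><apple color='red'></apple></fruits>")
  food <- wrap_root(x, "<food></food>")
}

xml_name(food)                # name of the new root
xml_name(xml_children(food))  # name(s) of its children
```

Because the result is a separate document, any code that later relies on `x` having been modified in place would need to use the returned value instead.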

erp31 commented 3 years ago

Hi, I'm also seeing R crash when xml_add_parent is used in combination with other code. As a minimal example, R crashes when the code below is run three times. When I originally found the problem, I was calling xml_add_parent only once in a script with many other function calls; however, I'm afraid I don't know how to create a minimal example for that case.

library(xml2)

# Create XML document
doc <- read_xml("<parent><child1>Hello</child1></parent>")

# Check current elements
children <- xml_children(doc)

new_node <- read_xml('<new_node>New text</new_node>')
xml_add_parent(children, new_node)

# Show that the parent node has been added
doc
#> {xml_document}
#> <parent>
#> [1] <new_node>New text<child1>Hello</child1></new_node>

If I run the above in a loop, it causes R to crash, e.g.:

library(xml2)

for (i in 1:3){  
  # Create XML document
  doc <- read_xml("<parent><child1>Hello</child1></parent>")

  # Check current elements
  children <- xml_children(doc)
  #expect_equal(xml_text(children), c("Hello"))

  new_node <- read_xml('<new_node>New text</new_node>')
  xml_add_parent(children, new_node)

  doc

}

reprex produces this:

This reprex appears to crash R. See standard output and standard error for more details.

Standard output and error


*** caught segfault ***
  address 0x5610c8000000, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...

OR this:

This reprex appears to crash R. See standard output and standard error for more details.

Standard output and error

free(): invalid pointer

Thanks to AleKoure for pointing out the workaround and helping me locate which part of my code was crashing my R session.
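
For the node-level case in this comment, the workaround might look like the sketch below. It is an assumption about how xml_add_child and xml_replace can be combined to get the same structure, not a confirmed fix, and it handles only a single child node.

```r
library(xml2)

for (i in 1:100) {
  doc <- read_xml("<parent><child1>Hello</child1></parent>")
  child <- xml_find_first(doc, "child1")

  # Build the wrapper as its own document and append a copy of the child to it.
  wrapper <- read_xml("<new_node>New text</new_node>")
  xml_add_child(wrapper, child)

  # Put a copy of the wrapper where the child used to be
  # (xml_replace() copies .value by default, .copy = TRUE).
  xml_replace(child, wrapper)
}

doc
```

Both calls rely on the default `.copy = TRUE`, so nodes are copied between documents rather than moved.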

Created on 2021-06-02 by the reprex package (v2.0.0)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.4 (2021-02-15)
#>  os       CentOS Linux 8
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       UTC
#>  date     2021-06-02
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source
#>  backports     1.2.1   2020-12-09 [2] CRAN (R 4.0.4)
#>  cli           2.4.0   2021-04-05 [2] CRAN (R 4.0.4)
#>  crayon        1.4.1   2021-02-08 [2] CRAN (R 4.0.4)
#>  digest        0.6.27  2020-10-24 [2] CRAN (R 4.0.4)
#>  ellipsis      0.3.1   2020-05-15 [2] CRAN (R 4.0.4)
#>  evaluate      0.14    2019-05-28 [2] CRAN (R 4.0.4)
#>  fansi         0.4.2   2021-01-15 [2] CRAN (R 4.0.4)
#>  fs            1.5.0   2020-07-31 [2] CRAN (R 4.0.4)
#>  glue          1.4.2   2020-08-27 [2] CRAN (R 4.0.4)
#>  highr         0.9     2021-04-16 [2] CRAN (R 4.0.4)
#>  htmltools     0.5.1.1 2021-01-22 [2] CRAN (R 4.0.4)
#>  knitr         1.32    2021-04-14 [2] CRAN (R 4.0.4)
#>  lifecycle     1.0.0   2021-02-15 [2] CRAN (R 4.0.4)
#>  magrittr      2.0.1   2020-11-17 [2] CRAN (R 4.0.4)
#>  pillar        1.6.0   2021-04-13 [2] CRAN (R 4.0.4)
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.0.4)
#>  purrr         0.3.4   2020-04-17 [2] CRAN (R 4.0.4)
#>  reprex        2.0.0   2021-04-02 [2] CRAN (R 4.0.4)
#>  rlang         0.4.10  2020-12-30 [2] CRAN (R 4.0.4)
#>  rmarkdown     2.7     2021-02-19 [2] CRAN (R 4.0.4)
#>  sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 4.0.4)
#>  stringi       1.5.3   2020-09-09 [2] CRAN (R 4.0.4)
#>  stringr       1.4.0   2019-02-10 [2] CRAN (R 4.0.4)
#>  styler        1.4.1   2021-03-30 [2] CRAN (R 4.0.4)
#>  tibble        3.1.1   2021-04-18 [2] CRAN (R 4.0.4)
#>  utf8          1.2.1   2021-03-12 [2] CRAN (R 4.0.4)
#>  vctrs         0.3.7   2021-03-29 [2] CRAN (R 4.0.4)
#>  withr         2.4.2   2021-04-18 [2] CRAN (R 4.0.4)
#>  xfun          0.22    2021-03-11 [2] CRAN (R 4.0.4)
#>  xml2        * 1.3.2   2020-04-23 [2] CRAN (R 4.0.4)
#>  yaml          2.2.1   2020-02-01 [2] CRAN (R 4.0.4)
#>
```

chainsawriot commented 2 years ago

for (i in 1:500){  
  print(i)
  doc <- xml2::read_xml("<a><b>a</b></a>")
  children <- xml2::xml_children(doc)
  xml2::xml_add_parent(children, xml2::read_xml('<c>d</c>'))
}

On my machine, this loop gets to iteration 60 before a segfault is triggered.

The trigger is xml_add_parent. xml_add_child and xml_add_sibling won't trigger the segfault.

alexverse commented 9 months ago

A possible solution: adding .copy = TRUE in xml_replace() makes the function stable across iterations, but I guess this will have some impact on performance.