r-lib / downlit

Syntax Highlighting and Automatic Linking
https://downlit.r-lib.org

build_site() segfault when ÷ character formatted as code #189

Closed tobyhodges closed 5 months ago

tobyhodges commented 5 months ago

Running pkgdown::build_site() on a package that includes Markdown files with the ÷ character formatted as code triggers a segfault. See my error output below, when I ran the function on a minimal package whose index.md contains:

What happens if I `÷`?

> pkgdown::build_site()
── Installing package divisiontesting into temporary library ─────────────────────────
── Building pkgdown site for package divisiontesting ───────────────────────────
Reading from: /masking/my/path/R/divisiontesting
Writing to: /masking/my/path/R/divisiontesting/docs
── Initialising site ───────────────────────────────────────────────────────────
── Building home ───────────────────────────────────────────────────────────────
Reading index.md
\
 *** caught segfault ***
address 0x0, cause 'invalid permissions'
|
Traceback:
 1: parse(con, keep.source = TRUE, encoding = "UTF-8", srcfile = srcfile)
 2: doTryCatch(return(expr), name, parentenv, handler)
 3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 4: tryCatchList(expr, classes, parentenv, handlers)
 5: tryCatch(parse(con, keep.source = TRUE, encoding = "UTF-8", srcfile = srcfile),     error = function(e) NULL)
 6: safe_parse(text)
 7: autolink_url(text)
 8: FUN(X[[i]], ...)
 9: vapply(.x, .f, ..., FUN.VALUE = character(1), USE.NAMES = FALSE)
10: map_chr(text, fun, ...)
11: tweak_children(x, xpath_inline, autolink, replace = "contents")
12: downlit::downlit_html_node(html)
13: tweak_page(html, name, pkg = pkg)
14: render_page(pkg, "title-body", data = list(pagetitle = attr(body,     "title"), body = body, filename = filename, source = repo_source(pkg,     fs::path_rel(filename, pkg$src_path))), path = path)
15: FUN(X[[i]], ...)
16: lapply(mds, render_md, pkg = pkg)
17: build_home_md(pkg)
18: build_home(pkg, override = override, preview = FALSE)
19: build_site_local(pkg = pkg, examples = examples, run_dont_run = run_dont_run,     seed = seed, lazy = lazy, override = override, preview = preview,     devel = devel)
20: pkgdown::build_site(...)
ameters = list()), repo = NULL, development = list(        destination = "dev", mode = "default", version_label = "muted",         in_dev = FALSE), topics = list(name = c(hello.Rd = "hello"),         file_in = "hello.Rd", file_out = "hello.html", alias = list(            hello.Rd = "hello"), funs = list(hello.Rd = "hello()"),         title = c(hello.Rd = "Hello, World!"), rd = list(hello.Rd = list(            list("hello"), "\n", list("hello"), "\n", list("Hello, World!"),             "\n", list("\n", "hello()\n"), "\n", list("\n", "Prints 'Hello, world!'.\n"),             "\n", list("\n", "hello()\n"), "\n")), source = list(            hello.Rd = character(0)), keywords = list(character(0)),         concepts = list(character(0)), internal = FALSE), tutorials = list(        name = character(0), file_out = character(0), title = character(0),         pagetitle = character(0), url = character(0)), vignettes = list(        name = character(0), file_in = character(0), file_out = character(0),         title = character(0), description = character(0), depth = integer(0)),     bs_version = 5L, prefix = "")), examples = base::quote(TRUE),     run_dont_run = base::quote(FALSE), seed = base::quote(1014L),     lazy = base::quote(FALSE), override = base::quote(list()),     install = base::quote(FALSE), preview = base::quote(FALSE),     new_process = base::quote(FALSE), devel = base::quote(FALSE),     cli_colors = base::quote(256L), hyperlinks = base::quote(TRUE),     pkgdown_internet = base::quote(TRUE))
e::quote(list(pkg = list(package = "divisiontesting",     version = "0.1.0", lang = "en", src_path = "/Users/hodges/Documents/R/hacks/divisiontesting",     dst_path = "/Users/hodges/Documents/R/hacks/divisiontesting/docs",     install_metadata = FALSE, desc = <environment>, meta = list(        template = list(bootstrap = 5L)), figures = list(dev = "ragg::agg_png",         dpi = 96L, dev.args = list(), fig.ext = "png", fig.width = 7.29166666666667,         fig.height = NULL, fig.retina = 2L, fig.asp = 0.618046971569839,         bg = NULL, other.parameters = list()), repo = NULL, development = list(        destination = "dev", mode = "default", version_label = "muted",         in_dev = FALSE), topics = list(name = c(hello.Rd = "hello"),         file_in = "hello.Rd", file_out = "hello.html", alias = list(            hello.Rd = "hello"), funs = list(hello.Rd = "hello()"),         title = c(hello.Rd = "Hello, World!"), rd = list(hello.Rd = list(            list("hello"), "\n", list("hello"), "\n", list("Hello, World!"),             "\n", list("\n", "hello()\n"), "\n", list("\n", "Prints 'Hello, world!'.\n"),             "\n", list("\n", "hello()\n"), "\n")), source = list(            hello.Rd = character(0)), keywords = list(character(0)),         concepts = list(character(0)), internal = FALSE), tutorials = list(        name = character(0), file_out = character(0), title = character(0),         pagetitle = character(0), url = character(0)), vignettes = list(        name = character(0), file_in = character(0), file_out = character(0),         title = character(0), description = character(0), depth = integer(0)),     bs_version = 5L, prefix = ""), examples = TRUE, run_dont_run = FALSE,     seed = 1014L, lazy = FALSE, override = list(), install = FALSE,     preview = FALSE, new_process = FALSE, devel = FALSE, cli_colors = 256L,     hyperlinks = TRUE, pkgdown_internet = TRUE)), envir = base::quote(<environment>),     quote = base::quote(TRUE))
23: base::do.call(base::do.call, base::c(base::readRDS("/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-fun-c0dc366e5621"),     base::list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv,     quote = TRUE)
24: base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-fun-c0dc366e5621"),     base::list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv,     quote = TRUE), file = "/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-res-c0dc5d8ac72",     compress = FALSE)
25: base::withCallingHandlers({    NULL    base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-fun-c0dc366e5621"),         base::list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv,         quote = TRUE), file = "/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-res-c0dc5d8ac72",         compress = FALSE)    base::flush(base::stdout())    base::flush(base::stderr())    NULL    base::invisible()}, error = function(e) {    {        callr_data <- base::as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        if (FALSE) {            base::assign(".Traceback", base::.traceback(4), envir = callr_data)            utils::dump.frames("__callr_dump__")            base::assign(".Last.dump", .GlobalEnv$`__callr_dump__`,                 envir = callr_data)            base::rm("__callr_dump__", envir = .GlobalEnv)        }        e <- err$process_call(e)        e2 <- err$new_error("error in callr subprocess")        class <- base::class        class(e2) <- base::c("callr_remote_error", class(e2))        e2 <- err$add_trace_back(e2)        cut <- base::which(e2$trace$scope == "global")[1]        if (!base::is.na(cut)) {            e2$trace <- e2$trace[-(1:cut), ]        }        base::saveRDS(base::list("error", e2, e), file = base::paste0("/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-res-c0dc5d8ac72",             ".error"))    }}, interrupt = function(e) {    {        callr_data <- base::as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        if (FALSE) {            base::assign(".Traceback", base::.traceback(4), envir = callr_data)            utils::dump.frames("__callr_dump__")            base::assign(".Last.dump", .GlobalEnv$`__callr_dump__`,                 envir = callr_data)            base::rm("__callr_dump__", envir = .GlobalEnv)        }        e <- err$process_call(e)        e2 <- 
err$new_error("error in callr subprocess")        class <- base::class        class(e2) <- base::c("callr_remote_error", class(e2))        e2 <- err$add_trace_back(e2)        cut <- base::which(e2$trace$scope == "global")[1]        if (!base::is.na(cut)) {            e2$trace <- e2$trace[-(1:cut), ]        }        base::saveRDS(base::list("error", e2, e), file = base::paste0("/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-res-c0dc5d8ac72",             ".error"))    }}, callr_message = function(e) {    base::try(base::signalCondition(e))})
26: doTryCatch(return(expr), name, parentenv, handler)
27: tryCatchOne(expr, names, parentenv, handlers[[1L]])
28: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
29: doTryCatch(return(expr), name, parentenv, handler)
30: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),     names[nh], parentenv, handlers[[nh]])
31: tryCatchList(expr, classes, parentenv, handlers)
32: base::tryCatch(base::withCallingHandlers({    NULL    base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-fun-c0dc366e5621"),         base::list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv,         quote = TRUE), file = "/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-res-c0dc5d8ac72",         compress = FALSE)    base::flush(base::stdout())    base::flush(base::stderr())    NULL    base::invisible()}, error = function(e) {    {        callr_data <- base::as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        if (FALSE) {            base::assign(".Traceback", base::.traceback(4), envir = callr_data)            utils::dump.frames("__callr_dump__")            base::assign(".Last.dump", .GlobalEnv$`__callr_dump__`,                 envir = callr_data)            base::rm("__callr_dump__", envir = .GlobalEnv)        }        e <- err$process_call(e)        e2 <- err$new_error("error in callr subprocess")        class <- base::class        class(e2) <- base::c("callr_remote_error", class(e2))        e2 <- err$add_trace_back(e2)        cut <- base::which(e2$trace$scope == "global")[1]        if (!base::is.na(cut)) {            e2$trace <- e2$trace[-(1:cut), ]        }        base::saveRDS(base::list("error", e2, e), file = base::paste0("/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-res-c0dc5d8ac72",             ".error"))    }}, interrupt = function(e) {    {        callr_data <- base::as.environment("tools:callr")$`__callr_data__`        err <- callr_data$err        if (FALSE) {            base::assign(".Traceback", base::.traceback(4), envir = callr_data)            utils::dump.frames("__callr_dump__")            base::assign(".Last.dump", .GlobalEnv$`__callr_dump__`,                 envir = callr_data)            base::rm("__callr_dump__", envir = .GlobalEnv)        }        e <- err$process_call(e)        
e2 <- err$new_error("error in callr subprocess")        class <- base::class        class(e2) <- base::c("callr_remote_error", class(e2))        e2 <- err$add_trace_back(e2)        cut <- base::which(e2$trace$scope == "global")[1]        if (!base::is.na(cut)) {            e2$trace <- e2$trace[-(1:cut), ]        }        base::saveRDS(base::list("error", e2, e), file = base::paste0("/var/folders/rj/3gf6c_l166qc7fl3z_v4pbxw0000gr/T//RtmpcFL1ML/callr-res-c0dc5d8ac72",             ".error"))    }}, callr_message = function(e) {    base::try(base::signalCondition(e))}), error = function(e) {    NULL    if (FALSE) {        base::try(base::stop(e))    }    else {        base::invisible()    }}, interrupt = function(e) {    NULL    if (FALSE) {        e    }    else {        base::invisible()    }})
An irrecoverable exception occurred. R is aborting now ...

In my investigations so far, I have come across no other characters that trigger the problem. I am unsure whether the problem is with pkgdown, pandoc, or somewhere else, and I am right at the limits of my R debugging abilities (so far!). Any suggestions you can provide for where to look next would be much appreciated, and I would be happy to provide more information from my side if needed.
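
One way to narrow down which files are affected is to list every inline code span that contains a non-ASCII character. The sketch below is only a diagnostic aid: `find_non_ascii_code_spans()` is a hypothetical helper, not part of pkgdown or downlit, and it assumes inline code is delimited by single backticks. It never calls `parse()`, so it should be safe to run on an affected R version.

```r
# Hypothetical helper (not part of pkgdown or downlit): return the inline
# code spans in a character vector of Markdown lines that contain at least
# one non-ASCII character.
find_non_ascii_code_spans <- function(lines) {
  lines <- enc2utf8(lines)
  # Extract every `...` span from every line.
  spans <- unlist(regmatches(lines, gregexpr("`[^`]+`", lines)))
  if (length(spans) == 0) {
    return(character(0))
  }
  # Flag a span when any of its code points lies above 127 (outside ASCII).
  has_non_ascii <- vapply(
    spans,
    function(s) any(utf8ToInt(s) > 127L, na.rm = TRUE),
    logical(1),
    USE.NAMES = FALSE
  )
  spans[has_non_ascii]
}

# "\u00f7" is the division sign from the index.md above.
md <- c("What happens if I `\u00f7`?", "Plain `x / y` is fine.")
flagged <- find_non_ascii_code_spans(md)
print(flagged)
```

To check a real file, the lines could be read with `readLines("index.md", encoding = "UTF-8")` and passed to the helper.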

Output of devtools::session_info():

> devtools::session_info()
─ Session info ─────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.0 (2024-04-24)
 os       macOS Sonoma 14.4.1
 system   aarch64, darwin20
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Berlin
 date     2024-05-27
 rstudio  2024.04.1+748 Chocolate Cosmos (desktop)
 pandoc   3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)

─ Packages ─────────────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cachem        1.1.0   2024-05-16 [1] CRAN (R 4.4.0)
 callr         3.7.6   2024-03-25 [1] CRAN (R 4.4.0)
 cli           3.6.2   2023-12-11 [1] CRAN (R 4.4.0)
 crayon        1.5.2   2022-09-29 [1] CRAN (R 4.4.0)
 desc          1.4.3   2023-12-10 [1] CRAN (R 4.4.0)
 devtools      2.4.5   2022-10-11 [1] CRAN (R 4.4.0)
 digest        0.6.35  2024-03-11 [1] CRAN (R 4.4.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.4.0)
 evaluate      0.23    2023-11-01 [1] CRAN (R 4.4.0)
 fansi         1.0.6   2023-12-08 [1] CRAN (R 4.4.0)
 fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
 fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
 glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
 htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
 htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
 httpuv        1.6.15  2024-03-26 [1] CRAN (R 4.4.0)
 knitr         1.46    2024-04-06 [1] CRAN (R 4.4.0)
 later         1.3.2   2023-12-06 [1] CRAN (R 4.4.0)
 lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.4.0)
 mime          0.12    2021-09-28 [1] CRAN (R 4.4.0)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
 pillar        1.9.0   2023-03-22 [1] CRAN (R 4.4.0)
 pkgbuild      1.4.4   2024-03-17 [1] CRAN (R 4.4.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
 pkgdown       2.0.9   2024-04-18 [1] CRAN (R 4.4.0)
 pkgload       1.3.4   2024-01-16 [1] CRAN (R 4.4.0)
 processx      3.8.4   2024-03-16 [1] CRAN (R 4.4.0)
 profvis       0.3.8   2023-05-02 [1] CRAN (R 4.4.0)
 promises      1.3.0   2024-04-05 [1] CRAN (R 4.4.0)
 ps            1.7.6   2024-01-18 [1] CRAN (R 4.4.0)
 purrr         1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
 Rcpp          1.0.12  2024-01-09 [1] CRAN (R 4.4.0)
 remotes       2.5.0   2024-03-17 [1] CRAN (R 4.4.0)
 rlang         1.1.3   2024-01-10 [1] CRAN (R 4.4.0)
 rmarkdown     2.27    2024-05-17 [1] CRAN (R 4.4.0)
 rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.4.0)
 rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
 shiny         1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
 stringi       1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
 stringr       1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
 tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.4.0)
 usethis       2.2.3   2024-02-19 [1] CRAN (R 4.4.0)
 utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
 vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
 withr         3.0.0   2024-01-16 [1] CRAN (R 4.4.0)
 xfun          0.44    2024-05-15 [1] CRAN (R 4.4.0)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.4.0)
 yaml          2.3.8   2023-12-11 [1] CRAN (R 4.4.0)

 [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library

────────────────────────────────────────────────────────────────────────────────────

tobyhodges commented 5 months ago

A similar problem on another site helped me discover that the multiplication sign × also triggers the segfault. This made me wonder if it is a problem with the whole Latin-1 Supplement block of Unicode, but testing with other characters in that block (e.g. thorn Þ) did not provoke the error.
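
For reference, the three characters mentioned so far can be compared by code point and UTF-8 byte sequence. This is a small sketch using base R only; the variable names are illustrative. It confirms that all three sit in the Latin-1 Supplement block (U+0080 to U+00FF), so block membership alone does not separate the two failing characters from the one that works.

```r
# Characters from this thread, written as Unicode escapes so the snippet
# does not depend on the encoding of the source file.
chars <- c(division = "\u00f7", multiplication = "\u00d7", thorn = "\u00de")

# Unicode code point of each character.
code_points <- vapply(chars, utf8ToInt, integer(1))

# Latin-1 Supplement covers U+0080 through U+00FF.
in_latin1_supplement <- code_points >= 0x80 & code_points <= 0xFF

# UTF-8 byte sequence of each character, as space-separated hex.
utf8_bytes <- vapply(
  chars,
  function(ch) paste(charToRaw(enc2utf8(ch)), collapse = " "),
  character(1)
)

print(data.frame(
  code_point = sprintf("U+%04X", code_points),
  utf8_bytes = utf8_bytes,
  in_latin1_supplement = in_latin1_supplement
))
```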

hadley commented 5 months ago

Interestingly this worked just fine for me on R 4.3.2, but when I upgraded to R 4.4.0, I see the same problem as you.

Backtrace from C:

 * frame #0: 0x0000000184750904 libsystem_platform.dylib`_platform_strlen + 4
    frame #1: 0x00000001009f2954 libR.dylib`Rf_mkChar(name=0x0000000000000000) at envir.c:4076:19 [opt]
    frame #2: 0x0000000100a463ac libR.dylib`finalizeData at gram.c:0 [opt]
    frame #3: 0x0000000100a456dc libR.dylib`R_Parse(n=-1, status=0x000000016fdf91ac, srcfile=0x000000010876db68) at gram.c:4215:10 [opt]
    frame #4: 0x0000000100a45770 libR.dylib`R_ParseConn(con=<unavailable>, n=<unavailable>, status=<unavailable>, srcfile=<unavailable>) at gram.c:4277:12 [opt] [artificial]
    frame #5: 0x0000000100adca6c libR.dylib`do_parse(call=<unavailable>, op=<unavailable>, args=<unavailable>, env=<unavailable>) at source.c:294:6 [opt]

hadley commented 5 months ago

Simpler reprex 😄

downlit::autolink_url("×")

hadley commented 5 months ago

Moving to downlit since the source of the problem is there, but it's either a bug with R 4.4 or something is wrong with the way I'm parsing the code. A base R reprex is:

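text <- "×"
# Keep a copy of the source text so parse() can attach source references
srcfile <- srcfilecopy("test.r", text)

# Drop the encoding mark; the bytes of the string are left unchanged
Encoding(text) <- "unknown"
con <- textConnection(text)
# This is the call that segfaults on R 4.4.0 (frame 1 of the R traceback above)
parse(con, keep.source = TRUE, encoding = "UTF-8", srcfile = srcfile)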
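
To make the `Encoding(text) <- "unknown"` step of the reprex concrete, here is a small sketch showing that it only removes the encoding mark and leaves the underlying bytes alone. It deliberately does not call `parse()`, so it does not trigger the crash. The variable names `marked` and `unmarked` are illustrative.

```r
# The multiplication sign, written as an escape and stored as UTF-8.
marked <- enc2utf8("\u00d7")

# Same string with the encoding mark removed, as in the reprex above.
unmarked <- marked
Encoding(unmarked) <- "unknown"

# Compare the declared encodings of the two strings.
print(Encoding(c(marked, unmarked)))

# The raw bytes are identical; only the declared encoding differs.
print(identical(charToRaw(marked), charToRaw(unmarked)))
```

Whether the missing mark is what sends the R 4.4.0 parser down the failing path is not established in this thread; the sketch only shows what the reprex hands to `textConnection()`.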
tobyhodges commented 5 months ago

Thanks very much for the quick response and fix @hadley