mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

Segfault with purrr 1.0.0 #295

Closed wlandau closed 1 year ago

wlandau commented 1 year ago

Issue

With clustermq 0.8.95.3 and purrr 1.0.0, I ran into a segfault.

> options(clustermq.scheduler = "local")
> fx <- function(x) x * 2
> clustermq::Q(fx, x = seq_len(3), n_jobs = 1)
Running sequentially ('LOCAL') ...

 *** caught segfault ***
address 0x0, cause 'memory not mapped'

Traceback:
 1: purrr_lookup[[rettype]](df, fwrap)
 2: stats::setNames(purrr_lookup[[rettype]](df, fwrap), df$` id `)
 3: work_chunk(df = df, fun = fun, const = const, rettype = rettype,     common_seed = seed, progress = TRUE)
 4: Q_rows(fun = fun, df = df, const = const, export = export, pkgs = pkgs,     seed = seed, memory = memory, template = template, n_jobs = n_jobs,     job_size = job_size, rettype = rettype, fail_on_error = fail_on_error,     workers = workers, log_worker = log_worker, chunk_size = chunk_size,     timeout = timeout, max_calls_worker = max_calls_worker, verbose = verbose)
 5: clustermq::Q(fx, x = seq_len(3), n_jobs = 1)

Downgrading to purrr 0.3.5 fixed it, but I do not think it is a bug in purrr.

Cause

Running this:

options(clustermq.scheduler = "local")
library(clustermq)
debugonce(clustermq:::work_chunk)
fx <- function(x) x * 2
Q(fx, x = seq_len(3), n_jobs = 1)

I entered the debugger and stepped to this line:

https://github.com/mschubert/clustermq/blob/68c03c85caa888b0b3d573f5332cb9bb7dde5665/R/work_chunk.r#L58

purrr_lookup[[rettype]] looks like this:

function (.l, .f, ...) 
{
    .f <- as_mapper(.f, ...)
    if (is.data.frame(.l)) {
        .l <- as.list(.l)
    }
    .Call(pmap_impl, environment(), ".l", ".f", "list")
}
<bytecode: 0x7f831c694270>
<environment: namespace:purrr>

whereas the actual purrr::pmap installed on my system looks like this:

> purrr::pmap
function (.l, .f, ..., .progress = FALSE) 
{
    pmap_("list", .l, .f, ..., .progress = .progress)
}
<bytecode: 0x7f8303db9c10>
<environment: namespace:purrr>

I suspect the clustermq CRAN binary ships with a copy of the body of each purrr function stored in purrr_lookup, captured when the binary was built. If the installed version of purrr disagrees with that copy, things break.
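To illustrate the suspected mechanism, here is a minimal sketch in base R. The names dep_fun and lookup are hypothetical stand-ins for purrr::pmap and purrr_lookup, not the actual clustermq code. It only shows that a list built from a function object keeps the old object after the original is replaced:

```r
# Hypothetical stand-in for a dependency's function (plays the role of purrr::pmap).
dep_fun = function(x) "old implementation"

# Eager capture: the list stores the function object as it exists right now,
# much like a top-level assignment evaluated when a package is built.
lookup = list(list = dep_fun)

# The dependency is "upgraded" after the lookup table was created.
dep_fun = function(x) "new implementation"

stale = lookup[["list"]](1)  # runs the copy captured earlier
fresh = dep_fun(1)           # runs the current definition
print(c(stale = stale, fresh = fresh))
```

In this sketch the stale copy merely returns an outdated string. In the real case the old body calls a C routine via .Call(), so a mismatch with the newly installed purrr could plausibly lead to a segfault rather than an R error.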

Proposal

I suggest replacing this:

https://github.com/mschubert/clustermq/blob/91b35873ac8603b6d3024be3887b0fb04d8cf09b/R/util.r#L89-L99

with this:

purrr_lookup = list(
    "list" = quote(purrr::pmap),
    "logical" = quote(purrr::pmap_lgl),
    "numeric" = quote(purrr::pmap_dbl),
    "integer" = quote(purrr::pmap_int),
    "character" = quote(purrr::pmap_chr),
    "lgl" = quote(purrr::pmap_lgl),
    "dbl" = quote(purrr::pmap_dbl),
    "int" = quote(purrr::pmap_int),
    "chr" = quote(purrr::pmap_chr)
)

and then instead of this:

https://github.com/mschubert/clustermq/blob/68c03c85caa888b0b3d573f5332cb9bb7dde5665/R/work_chunk.r#L58

maybe this:

re = stats::setNames(eval(purrr_lookup[[rettype]])(df, fwrap), df$` id `)
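As a rough check of the idea, the following sketch uses base::lapply in place of the purrr functions so that it runs without purrr installed. lookup_lazy and fx are illustrative names only. A quoted pkg::fun expression is stored as an unevaluated call, and eval() resolves it against whatever version of the package is loaded at call time:

```r
# Lazy lookup: store the unevaluated expression, not the function object.
# base::lapply stands in for purrr::pmap here (an assumption for illustration).
lookup_lazy = list(list = quote(base::lapply))

# The stored element is a call to `::`, not a function.
stored_is_call = is.call(lookup_lazy[["list"]])
stored_is_function = is.function(lookup_lazy[["list"]])

# eval() performs the namespace lookup when the job runs.
fx = function(x) x * 2
mapper = eval(lookup_lazy[["list"]])
res = mapper(seq_len(3), fx)
str(res)
```

Because the lookup happens at run time, the table never holds a function body from an older version of the dependency.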

I am having trouble installing clustermq from the source on my toolchain, but I could submit a PR anyway if you would like.

mschubert commented 1 year ago

Thanks for flagging this! I ran into a similar issue a couple of days ago.

I'm hoping I can fix this as a side-effect of the (almost complete) 0.9.0 release in the next couple of days.

mschubert commented 1 year ago

Fixed in https://github.com/mschubert/clustermq/commit/80e8f50d5673f1505b819d90da34e5687d2fd8fa and backported to CRAN in release 0.8.95.4.