mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

Worker API does not properly document requirements for `common_data` #241

Closed by wlandau 1 year ago

wlandau commented 3 years ago

I am trying to reconstruct an iteration of the worker API event loop outside targets to diagnose a different issue, and I am running into trouble. What am I doing wrong at "WORKER_ERROR: wrong field names for DO_SETUP"?

options(clustermq.scheduler = "multiprocess")
library(clustermq)
envir <- new.env(parent = emptyenv())
envir$global <- "value"
w <- workers(n_jobs = 1)
w$set_common_data(export = list(global = 123))
#> [1] "aiqzy"
x <- w$receive_data()
w$send_common_data()
x <- w$receive_data()
#> Error in w$receive_data(): 
#> WORKER_ERROR: wrong field names for DO_SETUP:
w$send_call(global)
x <- w$receive_data()
x$result
#> [1] "Error in eval(msg$expr, envir = msg$env) : object 'global' not found\n"
#> attr(,"class")
#> [1] "try-error"
#> attr(,"condition")
#> <simpleError in eval(msg$expr, envir = msg$env): object 'global' not found>

Created on 2021-03-24 by the reprex package (v1.0.0)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       macOS Catalina 10.15.7
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2021-03-24
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date       lib source
#>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.0.0)
#>  backports     1.2.1    2020-12-09 [1] CRAN (R 4.0.2)
#>  callr         3.5.1    2020-10-13 [1] CRAN (R 4.0.2)
#>  cli           2.3.1    2021-02-23 [1] CRAN (R 4.0.2)
#>  clustermq   * 0.8.95.1 2020-07-13 [1] CRAN (R 4.0.2)
#>  codetools     0.2-18   2020-11-04 [1] CRAN (R 4.0.2)
#>  crayon        1.4.1    2021-02-08 [1] CRAN (R 4.0.2)
#>  debugme       1.1.0    2017-10-22 [1] CRAN (R 4.0.2)
#>  digest        0.6.27   2020-10-24 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.1    2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate      0.14     2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.2    2021-01-15 [1] CRAN (R 4.0.2)
#>  fs            1.5.0    2020-07-31 [1] CRAN (R 4.0.2)
#>  glue          1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8      2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools     0.5.1.1  2021-01-22 [1] CRAN (R 4.0.2)
#>  knitr         1.31     2021-01-27 [1] CRAN (R 4.0.2)
#>  lifecycle     1.0.0    2021-02-15 [1] CRAN (R 4.0.3)
#>  magrittr      2.0.1    2020-11-17 [1] CRAN (R 4.0.2)
#>  pillar        1.5.1    2021-03-05 [1] CRAN (R 4.0.2)
#>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.0.0)
#>  processx      3.4.5    2020-11-30 [1] CRAN (R 4.0.2)
#>  ps            1.6.0    2021-02-28 [1] CRAN (R 4.0.3)
#>  purrr         0.3.4    2020-04-17 [1] CRAN (R 4.0.0)
#>  R6            2.5.0    2020-10-28 [1] CRAN (R 4.0.2)
#>  Rcpp          1.0.6    2021-01-15 [1] CRAN (R 4.0.2)
#>  reprex        1.0.0    2021-01-27 [1] CRAN (R 4.0.2)
#>  rlang         0.4.10   2020-12-30 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.7      2021-02-19 [1] CRAN (R 4.0.3)
#>  sessioninfo   1.1.1    2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.5.3    2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
#>  styler        1.3.2    2020-02-23 [1] CRAN (R 4.0.2)
#>  tibble        3.1.0    2021-02-25 [1] CRAN (R 4.0.3)
#>  utf8          1.2.1    2021-03-12 [1] CRAN (R 4.0.2)
#>  vctrs         0.3.6    2020-12-17 [1] CRAN (R 4.0.2)
#>  withr         2.4.1    2021-01-26 [1] CRAN (R 4.0.2)
#>  xfun          0.21     2021-02-10 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
```

mschubert commented 3 years ago

The common data expects the following fields: id, fun, const, export, pkgs, rettype, common_seed, token.

The worker checks whether all of these fields are present and raises an error if any are missing.

w$set_common_data() sets some of them implicitly, but not most.

So you need to provide these arguments. The following will work:

options(clustermq.scheduler = "multiprocess")
library(clustermq)
envir <- new.env(parent = emptyenv())
envir$global <- "value"
w <- workers(n_jobs = 1)
w$set_common_data(fun=identity, const=list(), pkgs=c(), common_seed=123, rettype="list",
                  export = list(global = 123)) # changed
x <- w$receive_data()
w$send_common_data()
x <- w$receive_data()

However, this is not clear from the error message, so this is a (minor) API and documentation bug.
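
To illustrate the kind of field check described above, here is a minimal sketch in base R. It is not clustermq's actual implementation: `missing_fields` and `check_common_data` are hypothetical helpers, the `id` and `token` values are placeholders, and the sketch ignores whatever w$set_common_data() fills in implicitly. It only shows how an error could name the missing fields instead of leaving them blank.

```r
# Field names the worker expects in the common data, as listed in this thread.
required_fields <- c("id", "fun", "const", "export", "pkgs",
                     "rettype", "common_seed", "token")

# Hypothetical helper (not part of clustermq): return the required
# field names that are absent from a named list.
missing_fields <- function(common_data, required = required_fields) {
  setdiff(required, names(common_data))
}

# Hypothetical helper: stop with a message that names the missing fields.
check_common_data <- function(common_data, required = required_fields) {
  missing <- missing_fields(common_data, required)
  if (length(missing) > 0) {
    stop("wrong field names for DO_SETUP, missing: ",
         paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Only `export` is supplied, as in the original reprex.
incomplete <- list(export = list(global = 123))

# All fields supplied; the `id` and `token` values are placeholders.
complete <- list(id = "placeholder_id", fun = identity, const = list(),
                 export = list(global = 123), pkgs = c(),
                 rettype = "list", common_seed = 123,
                 token = "placeholder_token")

print(missing_fields(incomplete))  # names of the absent fields
print(missing_fields(complete))    # empty character vector
```

Calling `check_common_data(incomplete)` would stop with an error that lists each missing field by name, which is the information the original error message left out.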

wlandau commented 3 years ago

Thanks, including those extra arguments does work.

I noticed it also worked without including id. Will id be required at some point? If so, I will update targets.

mschubert commented 3 years ago

That was a mistake, id should not be provided. Fixed above, thanks for pointing that out.

mschubert commented 1 year ago

This is no longer relevant with the v0.9 rewrite, because any objects can be added to the worker environment (and none are required).