wlandau / crew

A distributed worker launcher
https://wlandau.github.io/crew/

Feature request: Provide way to pass arguments to the R sessions used by workers #175

Closed: rpruim closed this issue 2 months ago

rpruim commented 2 months ago

Recent versions of R support the --max-connections argument. I was assisting a colleague who needed to increase the number of connections available to his workers when using {targets} and {crew}. I solved his issue by creating a new package, {crewargs}, available at https://github.com/rpruim/crewargs, which allows R session arguments to be passed to workers. The README for the package (also a vignette in the package) shows a simple example applied to --max-connections.

Before devoting any more time to this package, I'm wondering if this could just be folded into {crew}. It appears to be a relatively simple, non-breaking extension to the current functionality.

wlandau commented 2 months ago

I think this fits nicely with crew itself. Implemented in e62928e1f10fea46b3c6c370afdba208af9bac49. In launcher plugins, launch_worker() can now access user-supplied r_arguments through self$r_arguments or private$.r_arguments. The local launcher now does this.
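For readers following along, here is a minimal sketch of how this might look from the user side. It assumes that crew_controller_local() accepts an r_arguments argument and forwards it to each worker's R process, and that the installed R version recognizes --max-connections; the value 512 is arbitrary and chosen only for illustration.

```r
library(crew)

# Assumption: crew_controller_local() accepts `r_arguments` and passes the
# vector to the R process of each local worker. Supplying this argument
# replaces whatever default value the function has.
controller <- crew_controller_local(
  workers = 1,
  r_arguments = "--max-connections=512" # illustrative value
)
controller$start()

# Ask the worker for its own command line arguments to check what it received.
controller$push(command = commandArgs())
controller$wait()
task <- controller$pop()
worker_args <- task$result[[1]]
print(worker_args)

controller$terminate()
```

If the argument reached the worker, it should appear somewhere in the printed vector.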

uhkeller commented 1 month ago

I'm going to use this to work around an issue with local workers loading .RData files, which leads to longer startup times, higher memory usage, and unexpected behavior. I expect that using r_arguments = c("--no-save", "--no-restore") will solve the issue. Should this maybe be the default?

Here's a reprex showing the issue:

foo <- "bar"
save.image()
rm(foo)
library(crew)
controller <- crew_controller_local(workers = 1)
controller$start()
controller$push(command = { foo })
controller$wait()
task <- controller$pop()
task$result
#> [[1]]
#> [1] "bar"
controller$terminate()

Created on 2024-08-29 with reprex v2.1.1

I initially discovered this when using {targets}, where a target behaved differently depending on whether its deployment was "main" or "worker":

foo <- "bar"
save.image()
library(targets)
tar_script(
  {
    library(crew)
    tar_option_set(
      controller = crew_controller_local(workers = 1)
    )
    list(
      tar_target(bar_worker, foo),
      tar_target(bar_main, try(foo), deployment = "main")
    )
  },
  ask = FALSE
)
tar_make(reporter = "silent")
#> Error in eval(expr = expr, envir = envir) : object 'foo' not found
tar_read(bar_worker)
#> [1] "bar"
tar_read(bar_main)
#> [1] "Error in eval(expr = expr, envir = envir) : object 'foo' not found\n"
#> attr(,"class")
#> [1] "try-error"
#> attr(,"condition")
#> <simpleError in eval(expr = expr, envir = envir): object 'foo' not found>

Created on 2024-08-29 with reprex v2.1.1

wlandau commented 4 weeks ago

Those are good defaults. Added in 24e8fd840c99b55027bcf69ecfbd596a9e6d5182.
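On a crew version that predates this change, the same effect can presumably be had by passing the arguments explicitly. The sketch below is a variation on the first reprex above, not code from the commit; it assumes crew_controller_local() accepts r_arguments and runs in a temporary directory so that no real workspace file is overwritten.

```r
library(crew)

# Work in a scratch directory so save.image() does not touch a real .RData.
old_wd <- setwd(tempdir())

foo <- "bar"
save.image() # writes .RData into the scratch directory
rm(foo)

# Assumption: `r_arguments` is forwarded to the worker's R process.
controller <- crew_controller_local(
  workers = 1,
  r_arguments = c("--no-save", "--no-restore")
)
controller$start()

# With --no-restore, the worker should not load .RData, so `foo` should not
# be visible there.
controller$push(command = exists("foo"))
controller$wait()
task <- controller$pop()
foo_visible <- task$result[[1]]
print(foo_visible)

controller$terminate()
unlink(".RData")
setwd(old_wd)
```

Passing the arguments explicitly also keeps the behavior the same across crew versions, whatever the default happens to be.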