wlandau / crew

A distributed worker launcher
https://wlandau.github.io/crew/
Other
129 stars 4 forks source link

crew worker does not actually heed `reset_packages = FALSE`, `reset_globals = FALSE` etc #143

Closed yogat3ch closed 9 months ago

yogat3ch commented 9 months ago

Hi crew devs, I think I've encountered a bug based on what the documentation suggests for reset_packages & reset_globals arguments to crew_controller_local. Reprexes below

Prework

Description

Every task allocated to the worker resets the packages and globals, there is nothing that persists between tasks pushed to the worker.

Reproducible example for globals

# Start the crew
crew <-
  crew::crew_controller_local(
    "user",
    seconds_idle = 10,,
    local_log_directory = "crew_logs",
    reset_globals = FALSE
  )
crew$start()

crew$push(
  name = "add_the_data",
  {
    the_data

  },
  globals = list(the_data = "test")
)
# Pop till this completes
a <- crew$pop()

crew$push(
  name = "verify_the_data",
  the_data
)
# Pop till this completes
a <- crew$pop()

crew$push(
  name = "verify_the_data_again",
  the_data
)
# Pop till this completes
a <- crew$pop()

Expected result

the_data persists as a global on the worker between subsequent tasks allocated to the worker. However, though it will occasionally persist on one subsequent task verify_the_data, by the final task: verify_the_data_again The result will be NA because the global has disappeared.

The same is true for packages.

Reprex for packages

usethis::create_package("../testPackage", open = FALSE)
code <- "my_env <- new.env()
assign_var <- function(value) {
  if (!missing(value))
    my_env$obj <- value
  return(my_env$obj)
}"
write(code, "../testPackage/R/test.R")
devtools::build("../testPackage", "test_package.tar.gz")
devtools::install_local("test_package.tar.gz")
crew <-
  crew::crew_controller_local(
    "user",
    seconds_idle = 10,,
    local_log_directory = "crew_logs",
    reset_globals = FALSE
  )
crew$start()

crew$push(
  name = "assign_value",
  {
    testPackage:::assign_var(value = value)
  },
  data = list(value = "test")
)
# Pop till this completes
a <- crew$pop()

crew$push(
  name = "verify_the_value",
  testPackage:::assign_var()
)
# Pop till this completes
a <- crew$pop()
crew$push(
  name = "verify_the_value_again",
  testPackage:::assign_var()
)
# Pop till this completes
a <- crew$pop()

Expected result

The variable assigned to the value in the internal environment of the package persists between tasks at least once, but it returns NULL by the second task.

Diagnostic information

Diagnostic information ``` ─ Session info ─────────────────────────────────────────────────────────────── setting value version R version 4.3.0 (2023-04-21) os macOS 14.2.1 system aarch64, darwin20 ui RStudio language (EN) collate en_US.UTF-8 ctype en_US.UTF-8 tz America/New_York date 2024-01-25 rstudio 2023.12.0+369 Ocean Storm (desktop) pandoc NA ─ Packages ─────────────────────────────────────────────────────────────────── ! package * version date (UTC) lib source P attempt 0.3.1 2020-05-03 [?] CRAN (R 4.3.0) cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0) callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.0) VP cli 3.6.1 2023-12-11 [?] CRAN (R 4.3.1) (on disk 3.6.2) P config 0.3.1 2020-12-17 [?] CRAN (R 4.3.0) P crayon 1.5.2 2022-09-29 [?] CRAN (R 4.3.0) crew 0.8.0.9002 2024-01-25 [1] Github (wlandau/crew@f31eea1) P data.table 1.14.10 2023-12-08 [?] CRAN (R 4.3.1) devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.0) P digest 0.6.33 2023-07-07 [?] CRAN (R 4.3.0) DMDUAPI 2.7.0 2024-01-25 [1] local (/Users/stephenholsenbeck/Documents/R/Contributor_Repos/VirgaLabs/dmdu/dmduapi.tar.gz) ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0) VP fansi 1.0.4 2023-12-08 [?] CRAN (R 4.3.1) (on disk 1.0.6) fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0) P fs 1.6.3 2023-07-20 [?] CRAN (R 4.3.0) P getip 0.1-4 2023-12-10 [?] CRAN (R 4.3.1) VP glue 1.6.2 2024-01-09 [?] CRAN (R 4.3.1) (on disk 1.7.0) P golem 0.4.1 2023-06-05 [?] CRAN (R 4.3.0) P htmltools 0.5.7 2023-11-03 [?] CRAN (R 4.3.1) htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0) P httpuv 1.6.12 2023-10-23 [?] CRAN (R 4.3.1) later 1.3.1 2023-05-02 [1] CRAN (R 4.3.0) VP lifecycle 1.0.3 2023-11-07 [?] CRAN (R 4.3.1) (on disk 1.0.4) P magrittr 2.0.3 2022-03-30 [?] CRAN (R 4.3.0) memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0) mime 0.12 2021-09-28 [1] CRAN (R 4.3.0) miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0) P mirai 0.12.0 2024-01-12 [?] CRAN (R 4.3.1) P nanonext 0.12.0 2024-01-10 [?] CRAN (R 4.3.1) P pillar 1.9.0 2023-03-22 [?] CRAN (R 4.3.0) P pkgbuild 1.4.2 2023-06-26 [?] CRAN (R 4.3.0) P pkgconfig 2.0.3 2019-09-22 [?] CRAN (R 4.3.0) P pkgload 1.3.3 2023-09-22 [?] CRAN (R 4.3.1) VP plyr 1.8.8 2023-10-02 [?] CRAN (R 4.3.1) (on disk 1.8.9) prettyunits 1.2.0 2023-09-24 [1] CRAN (R 4.3.1) P processx 3.8.3 2023-12-10 [?] CRAN (R 4.3.1) P profvis 0.3.8 2023-05-02 [?] CRAN (R 4.3.0) promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.0) P ps 1.7.6 2024-01-18 [?] CRAN (R 4.3.1) P purrr 1.0.2 2023-08-10 [?] CRAN (R 4.3.0) P R6 2.5.1 2021-08-19 [?] CRAN (R 4.3.0) P Rcpp 1.0.11 2023-07-06 [?] CRAN (R 4.3.0) P remotes 2.4.2.1 2023-07-18 [?] CRAN (R 4.3.0) P renv 1.0.3 2023-09-19 [?] CRAN (R 4.3.1) VP rlang 1.1.1 2024-01-10 [?] CRAN (R 4.3.1) (on disk 1.1.3) P rstudioapi 0.15.0 2023-07-07 [?] CRAN (R 4.3.0) sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0) P shiny 1.8.0 2023-11-17 [?] CRAN (R 4.3.1) VP stringi 1.7.12 2023-12-11 [?] CRAN (R 4.3.1) (on disk 1.8.3) VP stringr 1.5.0 2023-11-14 [?] CRAN (R 4.3.1) (on disk 1.5.1) P tibble 3.2.1 2023-03-20 [?] CRAN (R 4.3.0) tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0) urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.0) usethis 2.2.2 2023-07-06 [1] CRAN (R 4.3.0) VP utf8 1.2.3 2023-10-22 [?] CRAN (R 4.3.1) (on disk 1.2.4) UU 1.48.0 2024-01-22 [1] Github (yogat3ch/UU@963e49e) P vctrs 0.6.5 2023-12-01 [?] CRAN (R 4.3.1) P virgaUtils 0.5.0 2023-05-13 [?] Github (Martin-McCoy/virgaUtils@3dbd357) xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0) P yaml 2.3.7 2023-01-23 [?] CRAN (R 4.3.0) [1] /Users/stephenholsenbeck/Library/Caches/org.R-project.R/R/renv/library/dmdu-4393acb2/R-4.3/aarch64-apple-darwin20 [2] /Users/stephenholsenbeck/Library/Caches/org.R-project.R/R/renv/sandbox/R-4.3/aarch64-apple-darwin20/ac5c2659 V ── Loaded and on-disk version mismatch. P ── Loaded and on-disk path mismatch. ────────────────────────────────────────────────────────────────────────────── ``` GithubSHA: f31eea18c3089f482e6bb3aee0c121956f83ab33 **Latest**
wlandau commented 9 months ago

I think reset_globals is working properly. If you set reset_globals = FALSE, you may still see some globals go away if your worker exits and then relaunches. The aggressive timeout of seconds_idle = 10 may have caused this on your end. Here is what I see:

crew <- crew::crew_controller_local(
  name = "user",
  workers = 1,
  local_log_directory = "crew_logs",
  reset_globals = FALSE
)
crew$start()

crew$push(
  name = "add_the_data",
  command = the_data,
  globals = list(the_data = "test")
)

crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA

crew$push(
  name = "verify_the_data",
  the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] "test"
b$error
#> [1] NA

crew$push(
  name = "verify_the_data_again",
  the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] "test"
c$error
#> [1] NA

Created on 2024-01-26 with reprex v2.1.0

And with reset_globals = TRUE, we rightly get the opposite result because the_data is cleaned up.

crew <- crew::crew_controller_local(
  name = "user",
  workers = 1,
  local_log_directory = "crew_logs",
  reset_globals = TRUE
)
crew$start()

crew$push(
  name = "add_the_data",
  command = the_data,
  globals = list(the_data = "test")
)

crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA

crew$push(
  name = "verify_the_data",
  the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] NA
b$error
#> [1] "object 'the_data' not found"

crew$push(
  name = "verify_the_data_again",
  the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] NA
c$error
#> [1] "object 'the_data' not found"

Created on 2024-01-26 with reprex v2.1.0

wlandau commented 9 months ago

The variable assigned to the value in the internal environment of the package persists between tasks at least once, but it returns NULL by the second task.

Neither crew nor its backend mirai can reset values hidden away inside package environments. To be safe in real-life workflows, each task should manually reset these niche variables, or the controller should impose tasks_max = 1 so each task gets a fresh worker.

yogat3ch commented 8 months ago

I think reset_globals is working properly. If you set reset_globals = FALSE, you may still see some globals go away if your worker exits and then relaunches. The aggressive timeout of seconds_idle = 10 may have caused this on your end. Here is what I see:

crew <- crew::crew_controller_local(
  name = "user",
  workers = 1,
  local_log_directory = "crew_logs",
  reset_globals = FALSE
)
crew$start()

crew$push(
  name = "add_the_data",
  command = the_data,
  globals = list(the_data = "test")
)

crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA

crew$push(
  name = "verify_the_data",
  the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] "test"
b$error
#> [1] NA

crew$push(
  name = "verify_the_data_again",
  the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] "test"
c$error
#> [1] NA

Created on 2024-01-26 with reprex v2.1.0

And with reset_globals = TRUE, we rightly get the opposite result because the_data is cleaned up.

crew <- crew::crew_controller_local(
  name = "user",
  workers = 1,
  local_log_directory = "crew_logs",
  reset_globals = TRUE
)
crew$start()

crew$push(
  name = "add_the_data",
  command = the_data,
  globals = list(the_data = "test")
)

crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA

crew$push(
  name = "verify_the_data",
  the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] NA
b$error
#> [1] "object 'the_data' not found"

crew$push(
  name = "verify_the_data_again",
  the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] NA
c$error
#> [1] "object 'the_data' not found"

Created on 2024-01-26 with reprex v2.1.0

Hi @wlandau, Thanks for the response and apologies for the delay, I was out on vacation for the entirety of February. I'm not sure I explained the issue well enough.

Neither crew nor its backend mirai can reset values hidden away inside package environments. To be safe in real-life workflows, each task should manually reset these niche variables, or the controller should impose tasks_max = 1 so each task gets a fresh worker.

The reprex demonstrates how it is indeed resetting the value in the package though, possibly by reloading the package?

The code written to the package in the reprex above creates an environment when the testPackage package loads. There's one function in the package that assigns a variable to that environment and recalls the variable from the environment if no new variables are passed into the function.

In the first task, we pass a variable into the crew task which is then assigned by that function to the environment internal to the package. In all but the second subsequent tasks, the variable assigned to that environment is inaccessible.

In your reprex, you pass the variable to each subsequent crew call, so of course that variable is available in each crew task.

If reset_packages is FALSE one would expect that the variable assigned to the package's internal environment in the first task would remain constant throughout all subsequent tasks, but the reprex demonstrates that that's not the case.

Does the issue make more sense now?

wlandau commented 8 months ago

The reprex demonstrates how it is indeed resetting the value in the package though, possibly by reloading the package?

As written originally in https://github.com/wlandau/crew/issues/143#issue-2101209197, your test with packages also uses seconds_idle = 10. As explained in https://github.com/wlandau/crew/issues/143#issuecomment-1912716885, this will cause the workers to shut themselves down if there is a 10-second period of idling with no tasks. When I removed seconds_idle = 10 in your packages test, every task returned "test" as expected.

yogat3ch commented 8 months ago

The reprex demonstrates how it is indeed resetting the value in the package though, possibly by reloading the package?

As written originally in #143 (comment), your test with packages also uses seconds_idle = 10. As explained in #143 (comment), this will cause the workers to shut themselves down if there is a 10-second period of idling with no tasks. When I removed seconds_idle = 10 in your packages test, every task returned "test" as expected.

@wlandau Ahh, the worker is closing out 10s after every task and that's why the packages refresh when loaded each time. Got it, thank you!