Closed yogat3ch closed 9 months ago
I think reset_globals
is working properly. If you set reset_globals = FALSE
, you may still see some globals go away if your worker exits and then relaunches. The aggressive timeout of seconds_idle = 10
may have caused this on your end. Here is what I see:
crew <- crew::crew_controller_local(
name = "user",
workers = 1,
local_log_directory = "crew_logs",
reset_globals = FALSE
)
crew$start()
crew$push(
name = "add_the_data",
command = the_data,
globals = list(the_data = "test")
)
crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA
crew$push(
name = "verify_the_data",
the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] "test"
b$error
#> [1] NA
crew$push(
name = "verify_the_data_again",
the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] "test"
c$error
#> [1] NA
Created on 2024-01-26 with reprex v2.1.0
And with reset_globals = TRUE
, we rightly get the opposite result because the_data
is cleaned up.
crew <- crew::crew_controller_local(
name = "user",
workers = 1,
local_log_directory = "crew_logs",
reset_globals = TRUE
)
crew$start()
crew$push(
name = "add_the_data",
command = the_data,
globals = list(the_data = "test")
)
crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA
crew$push(
name = "verify_the_data",
the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] NA
b$error
#> [1] "object 'the_data' not found"
crew$push(
name = "verify_the_data_again",
the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] NA
c$error
#> [1] "object 'the_data' not found"
Created on 2024-01-26 with reprex v2.1.0
The variable assigned to the value in the internal environment of the package persists between tasks at least once, but it returns NULL by the second task.
Neither crew
nor its backend mirai
can reset values hidden away inside package environments. To be safe in real-life workflows, each task should manually reset these niche variables, or the controller should impose tasks_max = 1
so each task gets a fresh worker.
I think
reset_globals
is working properly. If you setreset_globals = FALSE
, you may still see some globals go away if your worker exits and then relaunches. The aggressive timeout ofseconds_idle = 10
may have caused this on your end. Here is what I see:crew <- crew::crew_controller_local( name = "user", workers = 1, local_log_directory = "crew_logs", reset_globals = FALSE ) crew$start() crew$push( name = "add_the_data", command = the_data, globals = list(the_data = "test") ) crew$wait() a <- crew$pop() a$result[[1]] #> [1] "test" a$error #> [1] NA crew$push( name = "verify_the_data", the_data ) crew$wait() b <- crew$pop() b$result[[1]] #> [1] "test" b$error #> [1] NA crew$push( name = "verify_the_data_again", the_data ) crew$wait() c <- crew$pop() c$result[[1]] #> [1] "test" c$error #> [1] NA
Created on 2024-01-26 with reprex v2.1.0
And with
reset_globals = TRUE
, we rightly get the opposite result becausethe_data
is cleaned up.crew <- crew::crew_controller_local( name = "user", workers = 1, local_log_directory = "crew_logs", reset_globals = TRUE ) crew$start() crew$push( name = "add_the_data", command = the_data, globals = list(the_data = "test") ) crew$wait() a <- crew$pop() a$result[[1]] #> [1] "test" a$error #> [1] NA crew$push( name = "verify_the_data", the_data ) crew$wait() b <- crew$pop() b$result[[1]] #> [1] NA b$error #> [1] "object 'the_data' not found" crew$push( name = "verify_the_data_again", the_data ) crew$wait() c <- crew$pop() c$result[[1]] #> [1] NA c$error #> [1] "object 'the_data' not found"
Created on 2024-01-26 with reprex v2.1.0
Hi @wlandau, Thanks for the response and apologies for the delay, I was out on vacation for the entirety of February. I'm not sure I explained the issue well enough.
Neither crew nor its backend mirai can reset values hidden away inside package environments. To be safe in real-life workflows, each task should manually reset these niche variables, or the controller should impose tasks_max = 1 so each task gets a fresh worker.
The reprex demonstrates how it is indeed resetting the value in the package though, possibly by reloading the package?
The code written to the package in the reprex above creates an environment when the testPackage
package loads. There's one function in the package that assigns a variable to that environment and recalls the variable from the environment if no new variables are passed into the function.
In the first task, we pass a variable into the crew task which is then assigned by that function to the environment internal to the package. In all but the second subsequent tasks, the variable assigned to that environment is inaccessible.
In your reprex, you pass the variable to each subsequent crew
call, so of course that variable is available in each crew task.
If reset_packages
is FALSE
one would expect that the variable assigned to the package's internal environment in the first task would remain constant throughout all subsequent tasks, but the reprex demonstrates that that's not the case.
Does the issue make more sense now?
The reprex demonstrates how it is indeed resetting the value in the package though, possibly by reloading the package?
As written originally in https://github.com/wlandau/crew/issues/143#issue-2101209197, your test with packages also uses seconds_idle = 10
. As explained in https://github.com/wlandau/crew/issues/143#issuecomment-1912716885, this will cause the workers to shut themselves down if there is a 10-second period of idling with no tasks. When I removed seconds_idle = 10
in your packages test, every task returned "test"
as expected.
The reprex demonstrates how it is indeed resetting the value in the package though, possibly by reloading the package?
As written originally in #143 (comment), your test with packages also uses
seconds_idle = 10
. As explained in #143 (comment), this will cause the workers to shut themselves down if there is a 10-second period of idling with no tasks. When I removedseconds_idle = 10
in your packages test, every task returned"test"
as expected.
@wlandau Ahh, the worker is closing out 10s after every task and that's why the packages refresh when loaded each time. Got it, thank you!
Hi
crew
devs, I think I've encountered a bug based on what the documentation suggests forreset_packages
&reset_globals
arguments tocrew_controller_local
. Reprexes belowPrework
crew
package itself and not a user error, known limitation, or issue from another package thatcrew
depends on. For example, if you get errors runningtar_make_clustermq()
, try isolating the problem in a reproducible example that runsclustermq
and notcrew
. And for miscellaneous troubleshooting, please post to discussions instead of issues.Description
Every task allocated to the worker resets the packages and globals, there is nothing that persists between tasks pushed to the worker.
Reproducible example for globals
Expected result
the_data
persists as a global on the worker between subsequent tasks allocated to the worker. However, though it will occasionally persist on one subsequent taskverify_the_data
, by the final task:verify_the_data_again
The result will beNA
because the global has disappeared.The same is true for packages.
Reprex for packages
Expected result
The variable assigned to the value in the internal environment of the package persists between tasks at least once, but it returns
NULL
by the second task.Diagnostic information
Diagnostic information
``` ─ Session info ─────────────────────────────────────────────────────────────── setting value version R version 4.3.0 (2023-04-21) os macOS 14.2.1 system aarch64, darwin20 ui RStudio language (EN) collate en_US.UTF-8 ctype en_US.UTF-8 tz America/New_York date 2024-01-25 rstudio 2023.12.0+369 Ocean Storm (desktop) pandoc NA ─ Packages ─────────────────────────────────────────────────────────────────── ! package * version date (UTC) lib source P attempt 0.3.1 2020-05-03 [?] CRAN (R 4.3.0) cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0) callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.0) VP cli 3.6.1 2023-12-11 [?] CRAN (R 4.3.1) (on disk 3.6.2) P config 0.3.1 2020-12-17 [?] CRAN (R 4.3.0) P crayon 1.5.2 2022-09-29 [?] CRAN (R 4.3.0) crew 0.8.0.9002 2024-01-25 [1] Github (wlandau/crew@f31eea1) P data.table 1.14.10 2023-12-08 [?] CRAN (R 4.3.1) devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.0) P digest 0.6.33 2023-07-07 [?] CRAN (R 4.3.0) DMDUAPI 2.7.0 2024-01-25 [1] local (/Users/stephenholsenbeck/Documents/R/Contributor_Repos/VirgaLabs/dmdu/dmduapi.tar.gz) ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0) VP fansi 1.0.4 2023-12-08 [?] CRAN (R 4.3.1) (on disk 1.0.6) fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0) P fs 1.6.3 2023-07-20 [?] CRAN (R 4.3.0) P getip 0.1-4 2023-12-10 [?] CRAN (R 4.3.1) VP glue 1.6.2 2024-01-09 [?] CRAN (R 4.3.1) (on disk 1.7.0) P golem 0.4.1 2023-06-05 [?] CRAN (R 4.3.0) P htmltools 0.5.7 2023-11-03 [?] CRAN (R 4.3.1) htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0) P httpuv 1.6.12 2023-10-23 [?] CRAN (R 4.3.1) later 1.3.1 2023-05-02 [1] CRAN (R 4.3.0) VP lifecycle 1.0.3 2023-11-07 [?] CRAN (R 4.3.1) (on disk 1.0.4) P magrittr 2.0.3 2022-03-30 [?] CRAN (R 4.3.0) memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0) mime 0.12 2021-09-28 [1] CRAN (R 4.3.0) miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0) P mirai 0.12.0 2024-01-12 [?] CRAN (R 4.3.1) P nanonext 0.12.0 2024-01-10 [?] CRAN (R 4.3.1) P pillar 1.9.0 2023-03-22 [?] CRAN (R 4.3.0) P pkgbuild 1.4.2 2023-06-26 [?] CRAN (R 4.3.0) P pkgconfig 2.0.3 2019-09-22 [?] CRAN (R 4.3.0) P pkgload 1.3.3 2023-09-22 [?] CRAN (R 4.3.1) VP plyr 1.8.8 2023-10-02 [?] CRAN (R 4.3.1) (on disk 1.8.9) prettyunits 1.2.0 2023-09-24 [1] CRAN (R 4.3.1) P processx 3.8.3 2023-12-10 [?] CRAN (R 4.3.1) P profvis 0.3.8 2023-05-02 [?] CRAN (R 4.3.0) promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.0) P ps 1.7.6 2024-01-18 [?] CRAN (R 4.3.1) P purrr 1.0.2 2023-08-10 [?] CRAN (R 4.3.0) P R6 2.5.1 2021-08-19 [?] CRAN (R 4.3.0) P Rcpp 1.0.11 2023-07-06 [?] CRAN (R 4.3.0) P remotes 2.4.2.1 2023-07-18 [?] CRAN (R 4.3.0) P renv 1.0.3 2023-09-19 [?] CRAN (R 4.3.1) VP rlang 1.1.1 2024-01-10 [?] CRAN (R 4.3.1) (on disk 1.1.3) P rstudioapi 0.15.0 2023-07-07 [?] CRAN (R 4.3.0) sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0) P shiny 1.8.0 2023-11-17 [?] CRAN (R 4.3.1) VP stringi 1.7.12 2023-12-11 [?] CRAN (R 4.3.1) (on disk 1.8.3) VP stringr 1.5.0 2023-11-14 [?] CRAN (R 4.3.1) (on disk 1.5.1) P tibble 3.2.1 2023-03-20 [?] CRAN (R 4.3.0) tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0) urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.0) usethis 2.2.2 2023-07-06 [1] CRAN (R 4.3.0) VP utf8 1.2.3 2023-10-22 [?] CRAN (R 4.3.1) (on disk 1.2.4) UU 1.48.0 2024-01-22 [1] Github (yogat3ch/UU@963e49e) P vctrs 0.6.5 2023-12-01 [?] CRAN (R 4.3.1) P virgaUtils 0.5.0 2023-05-13 [?] Github (Martin-McCoy/virgaUtils@3dbd357) xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0) P yaml 2.3.7 2023-01-23 [?] CRAN (R 4.3.0) [1] /Users/stephenholsenbeck/Library/Caches/org.R-project.R/R/renv/library/dmdu-4393acb2/R-4.3/aarch64-apple-darwin20 [2] /Users/stephenholsenbeck/Library/Caches/org.R-project.R/R/renv/sandbox/R-4.3/aarch64-apple-darwin20/ac5c2659 V ── Loaded and on-disk version mismatch. P ── Loaded and on-disk path mismatch. ────────────────────────────────────────────────────────────────────────────── ``` GithubSHA: f31eea18c3089f482e6bb3aee0c121956f83ab33 **Latest**