wlandau / crew

A distributed worker launcher
https://wlandau.github.io/crew/

[macOS] Some errors with running tests #163

Closed barracuda156 closed 5 months ago

barracuda156 commented 5 months ago

(I am a maintainer of R packages in MacPorts, including R-crew. I am not claiming that the issues below are necessarily the result of bugs in crew; it is possible that the setup has some issues on our side. Any advice would be appreciated.)

Using crew 0.9.1 with R 4.3.3 and the MacPorts build environment.

There are two issues. The first one is with R-targets, but the log points specifically at crew:

── Failed tests ────────────────────────────────────────────────────────────────
Error ('test-tar_make.R:47:3'): tar_make() works with crew
<crew_error/crew/tar_condition_run/tar_condition_targets/rlang_error/error/condition>
Error: Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
    {crew} worker 1 launched 5 times in a row without completing any tasks. Either troubleshoot or raise launch_max above 5. Details: https://wlandau.github.io/crew/articles/risks.html#crashes
Last error traceback:
    stop(x)
    .handleSimpleError(function (condition)  {     state$error <- build_mess...
    h(simpleError(msg, call))
Backtrace:
    ▆
 1. └─targets::tar_make(...)
 2.   └─targets:::callr_outer(...)
 3.     ├─targets:::if_any(...)
 4.     └─targets:::callr_error(traced_condition = out, fun = fun)
 5.       └─targets::tar_throw_run(message, class = class(traced_condition$condition))
 6.         └─targets::tar_error(...)
 7.           └─rlang::abort(message = message, class = class, call = tar_empty_envir)

This is the only error I get in the R-targets test suite ([ FAIL 1 | WARN 0 | SKIP 512 | PASS 2940 ]).

Having encountered this, I then ran crew's own test suite and got errors in two places (one in the tests and several in the vignettes):

R version 4.3.3 (2024-02-29) -- "Angel Food Cake"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.0.0d2 (32-bit)

> library(testthat)
> library(crew)

Attaching package: 'crew'

The following object is masked from 'package:testthat':

    matches

> 
> test_check("crew")
[ FAIL 1 | WARN 0 | SKIP 85 | PASS 218 ]

══ Skipped tests (85) ══════════════════════════════════════════════════════════
• On CRAN (85): 'test-crew_async.R:7:3', 'test-crew_async.R:18:3',
  'test-crew_async.R:42:3', 'test-crew_client.R:22:3',
  'test-crew_client.R:107:3', 'test-crew_controller.R:2:3',
  'test-crew_controller.R:14:3', 'test-crew_controller.R:44:3',
  'test-crew_controller.R:80:3', 'test-crew_controller.R:100:3',
  'test-crew_controller.R:147:3', 'test-crew_controller.R:229:3',
  'test-crew_controller.R:270:3', 'test-crew_controller.R:287:3',
  'test-crew_controller.R:308:3', 'test-crew_controller.R:342:3',
  'test-crew_controller.R:370:3', 'test-crew_controller.R:403:3',
  'test-crew_controller_group.R:19:3', 'test-crew_controller_group.R:150:3',
  'test-crew_controller_group.R:169:3', 'test-crew_controller_group.R:191:3',
  'test-crew_controller_group.R:220:3', 'test-crew_controller_group.R:266:3',
  'test-crew_controller_group.R:303:3', 'test-crew_controller_group.R:340:3',
  'test-crew_controller_group.R:370:3', 'test-crew_controller_group.R:400:3',
  'test-crew_controller_group.R:446:3', 'test-crew_controller_group.R:487:3',
  'test-crew_controller_group.R:515:3', 'test-crew_controller_group.R:536:3',
  'test-crew_controller_group.R:585:3', 'test-crew_controller_group.R:631:3',
  'test-crew_controller_group.R:677:3', 'test-crew_controller_local.R:135:3',
  'test-crew_controller_local.R:201:3', 'test-crew_controller_local.R:245:3',
  'test-crew_controller_local.R:263:3', 'test-crew_controller_local.R:284:3',
  'test-crew_controller_local.R:329:3', 'test-crew_eval.R:37:3',
  'test-crew_eval.R:43:3', 'test-crew_eval.R:49:3', 'test-crew_eval.R:62:3',
  'test-crew_eval.R:70:3', 'test-crew_eval.R:98:3', 'test-crew_eval.R:126:3',
  'test-crew_eval.R:149:3', 'test-crew_eval.R:171:3',
  'test-crew_launcher.R:2:3', 'test-crew_launcher.R:8:3',
  'test-crew_launcher.R:24:3', 'test-crew_launcher.R:37:3',
  'test-crew_launcher.R:138:3', 'test-crew_launcher.R:170:3',
  'test-crew_launcher.R:206:3', 'test-crew_launcher.R:244:3',
  'test-crew_launcher.R:271:3', 'test-crew_launcher.R:306:3',
  'test-crew_launcher.R:335:3', 'test-crew_launcher.R:355:3',
  'test-crew_launcher_local.R:12:3', 'test-crew_launcher_local.R:27:3',
  'test-crew_launcher_local.R:41:3', 'test-crew_launcher_local.R:55:3',
  'test-crew_launcher_local.R:120:3', 'test-crew_launcher_local.R:178:3',
  'test-crew_launcher_local.R:249:3', 'test-crew_launcher_local.R:284:3',
  'test-crew_monitor_local.R:2:3', 'test-crew_monitor_local.R:26:3',
  'test-crew_monitor_local.R:42:3', 'test-crew_retry.R:2:3',
  'test-crew_retry.R:16:3', 'test-crew_retry.R:29:3', 'test-crew_retry.R:48:3',
  'test-crew_retry.R:67:3', 'test-crew_terminate_process.R:2:3',
  'test-crew_worker.R:2:3', 'test-plugins.R:2:3', 'test-plugins.R:107:3',
  'test-plugins.R:225:3', 'test-utils_files.R:2:3', 'test-utils_packages.R:2:3'

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test-crew_controller_local.R:62:3'): crew_controller_local() ────────
<crew_expire/crew_error/crew/rlang_error/error/condition>
Error: timed out after retrying for 10 seconds. 
Backtrace:
    ▆
 1. └─crew::crew_retry(...) at test-crew_controller_local.R:62:3
 2.   └─crew:::crew_expire(message)
 3.     └─crew:::crew_stop(...)
 4.       └─rlang::abort(message = message, class = class, call = emptyenv())

[ FAIL 1 | WARN 0 | SKIP 85 | PASS 218 ]
Error: Test failures
Execution halted
Errors in running code in vignettes:
when running code in ‘groups.Rmd’
  ...

> group$start()

> group$push(name = "my task", command = sqrt(4), controller = "semi-persistent")

> group$wait(controllers = "semi-persistent")

  When sourcing ‘groups.R’:
Error: {crew} worker 1 launched 5 times in a row without completing any tasks. Either troubleshoot or raise launch_max above 5. Details: https://wlandau.github.io/crew/articles/risks.html#crashes
Execution halted
when running code in ‘introduction.Rmd’
  ...
> task <- controller$pop()

> task
NULL

> controller$wait(mode = "all")

  When sourcing ‘introduction.R’:
Error: {crew} worker 1 launched 5 times in a row without completing any tasks. Either troubleshoot or raise launch_max above 5. Details: https://wlandau.github.io/crew/articles/risks.html#crashes
Execution halted
when running code in ‘plugins.Rmd’
  ...
> controller$start()

> controller$push(name = "get worker IP address and process ID", 
+     command = paste(getip::getip(type = "local"), ps::ps_pid()))

> controller$wait()

  When sourcing ‘plugins.R’:
Error: {crew} worker 1 launched 5 times in a row without completing any tasks. Either troubleshoot or raise launch_max above 5. Details: https://wlandau.github.io/crew/articles/risks.html#crashes
Execution halted

Apart from that single failure, the overall test results appear acceptable: [ FAIL 1 | WARN 0 | SKIP 85 | PASS 218 ].
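
The vignette failures all follow the same pattern (start a controller, push one task, wait), so they can probably be reduced to a small check outside the test suite. The following is only a sketch under assumptions, not something taken from the logs above: `run_minimal_crew_check()` is a hypothetical helper name, the value `10L` is arbitrary, and it assumes `crew_controller_local()` accepts the `launch_max` argument that the error message refers to.

```r
# Hypothetical helper: reproduce the start/push/wait pattern from the
# vignettes with a single local worker and a higher launch_max.
run_minimal_crew_check <- function(launch_max = 10L) {
  if (!requireNamespace("crew", quietly = TRUE)) {
    stop("The crew package is not installed.")
  }
  # Assumption: launch_max is an argument of crew_controller_local(),
  # as suggested by the error message ("raise launch_max above 5").
  controller <- crew::crew_controller_local(
    name = "minimal-check",
    workers = 1L,
    launch_max = launch_max
  )
  # Always shut the worker down, even if the task fails.
  on.exit(controller$terminate(), add = TRUE)
  controller$start()
  controller$push(name = "my task", command = sqrt(4))
  controller$wait(mode = "all")
  task <- controller$pop()
  # Return the popped task record (NULL if no task completed).
  task
}
```

Calling `run_minimal_crew_check()` in a plain R session should either return the popped task or fail with the same "launched N times in a row" error, which would narrow the problem down to worker startup rather than the test harness.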

@wlandau Could you suggest what may be causing this?