shikokuchuo / mirai

mirai - Minimalist Async Evaluation Framework for R
https://shikokuchuo.net/mirai/
GNU General Public License v3.0
193 stars 10 forks source link

Dispatcher times out with large renv lockfiles #69

Closed alexpiper closed 1 year ago

alexpiper commented 1 year ago

Hi,

I am having an issue with mirai::daemons function, where the dispatcher times out when loading larger renv lockfiles (i.e., fails when lockfile contains 260 packages), but runs successfully with a smaller renv lockfile (110 packages). The function works when renv is deactivated, and also works when dispatcher = FALSE. I first thought this was an issue with targets/crew (See https://github.com/ropensci/targets/discussions/1112#discussioncomment-6776212), but it seems to be related to the mirai::daemons function

This may be fixable by increasing the .timelimit variable?

> packageVersion("mirai")
[1] ‘0.9.1.9007’
> packageVersion("nanonext")
[1] ‘0.9.2.9022’

# Running daemons with dispatcher (Fails)
daemons(1, dispatcher = TRUE)
Error in launch_and_sync_daemon(sock = sock, urld, dots, n, urlc) : 
  sync with dispatcher or daemon failed after 5s

# Running daemons without dispatcher (works)
daemons(1, dispatcher = FALSE)
[1] 1

# Running daemons with dispatcher but renv deactivated (works)
renv::deactivate()
daemons(1, dispatcher = TRUE)
[1] 1

Any help with this would be greatly appreciated

shikokuchuo commented 1 year ago

Thanks for the investigation of the issue and report.

Yes, this is a potential problem with daemons() as there is an initial synchronisation with dispatcher when it is launched.

If R is configured to perform startup tasks that take longer than 5s, then the daemons() call will time out before the new dispatcher process has had the chance to run the code to sync back with host. This could be due to any startup configuration and not specific to renv.

In mirai 0.9.1.9009, daemons() now takes a numeric value for 'dispatcher' as well as TRUE/FALSE to allow setting the sync timeout in milliseconds. If you set it to say 10000L, can you confirm that it now works in your case?

alexpiper commented 1 year ago

Thanks for looking into this, increasing the sync timeout to 10000L fixes the issue!

shikokuchuo commented 1 year ago

As per https://github.com/ropensci/targets/discussions/1112#discussioncomment-6795593 I have re-implemented a solution through the 'dispatcher' argument allowing an NA value for a longer 20s timeout. This avoids the argument having multiple types, and potential trial and error on the user side with different values. 20s should be more than enough, otherwise there is likely something seriously wrong with the R configuration anyway.

shikokuchuo commented 1 year ago

@alexpiper thanks again for reporting this bug. Apologies for changing the solution yet again, but this should be the simplest of all - requiring absolutely no change on your side. The issue is now fixed internally with dispatcher no longer loading any R startup configurations (as it doesn't need them).

If you have time to install the latest development version shikokuchuo/mirai@3529241, it should hopefully now "just work", even in your crew/targets workflows without requiring any updates to those packages.

install.packages("mirai", repos = "shikokuchuo.r-universe.dev")
alexpiper commented 1 year ago

No problem, thanks for the quick fixes!

Confirming that mirai and the targets/crew workflow is working with the latest development version.