shikokuchuo / mirai

mirai - Minimalist Async Evaluation Framework for R
https://shikokuchuo.net/mirai/
GNU General Public License v3.0
193 stars 10 forks source link

How to use helper functions? #106

Closed jcheng5 closed 7 months ago

jcheng5 commented 7 months ago

I was expecting the following to work, but it gives the error 'miraiError' chr Error in func_a(): could not find function "func_a"

func_a <- function() {
  100
}

func_b <- function() {
  func_a()
}

library(mirai)
daemons(1)

m <- mirai({
  func_b()
}, func_b = func_b, func_a = func_a)

call_mirai(m)$data

Regardless of whether this particular incantation should work or not, is this a scenario you think of as within scope for mirai? I saw in ?mirai that you have some examples where you do mirai(source(...), local=TRUE), is that what you were envisioning people would do if they have code broken into functions but not in a package?

shikokuchuo commented 7 months ago

Hi Joe, this is the direct result of mirai doing this, as documented in mirai():

The expression '.expr' will be evaluated in a separate R process in a clean environment, which is not the global environment, consisting only of the named objects passed as '...' and/or the list supplied to '.args'.

The point of interest being - not the global environment. Hence it requires that func_a be passed as an argument of func_b, or else func_a explicitly assigned to .GlobalEnv in which case it will be picked up.

The reasoning behind this: (i) lowers the chance of accidental global variables left behind in .GlobalEnv if cleanup is not performed, and (ii) slightly opinionated take that this is better practice.

I do see the benefit to allowing this, and at one point I was erring towards having the ... arguments assigned to .Globalenv, but not the variables passed via .args - so as to be able to differentiate.

Do you think it would be useful to meet this expectation? @wlandau had a preference for the current behaviour for crew users, but they would be more of a niche bunch I guess.

shikokuchuo commented 7 months ago

I do see the benefit to allowing this, and at one point I was erring towards having the ... arguments assigned to .Globalenv, but not the variables passed via .args - so as to be able to differentiate.

I found the original discussion for this: #55 - I raised this myself nearly one year ago.

I've now proceeded to implement this in 647e65d: '...' arguments go to .GlobalEnv in the daemon process, '.args' arguments stay local. Best of both worlds. This is also very unlikely to be breaking (apart from as yet unidentified corner cases).

This means that the more casual usage such as your example above now 'just works'.

I've also updated the examples, as it is now possible to source() script files with or without local = TRUE.

@wlandau please note this change. All crew tests continue to pass, including the manual ones, but please check on your side as well and it may allow a more efficient mapping to your data and globals arguments.

wlandau commented 7 months ago

Would it be possible to add an alternative argument to specify local variables? I feel this is important for efficient memory management in crew.

controller$push() calls mirai::mirai() as below:

task <- mirai::mirai(
  .expr = expr_crew_eval,
  name = name,
  command = command,
  string = string,
  data = data,
  globals = globals,
  seed = seed,
  algorithm = algorithm,
  packages = packages,
  library = library,
  .timeout = .timeout,
  .compute = private$.client$name
)

crew's documentation currently promises users that the data argument is for local variables and only globals are assigned to the global environment.

wlandau commented 7 months ago

From https://github.com/shikokuchuo/mirai/discussions/55#discussioncomment-9059713, it should be enough to switch crew to using .args.

wlandau commented 7 months ago

Switched to .args in https://github.com/wlandau/crew/commit/bb75dd76b5697b9142778d1afe9f826028a9af99. All tests pass before and after using development nanonext/mirai.

HenrikBengtsson commented 7 months ago

Note that when using ... for this, you will not be able to use "globals" with names that are the same as the arguments of mirai::mirai(), i.e.

> names(formals(mirai::mirai))
[1] ".expr"    "..."      ".args"    ".timeout" ".compute"

Maybe something like:

Would it be possible to add an alternative argument to specify local variables?

with .locals and .globals would be better and would avoid the above name-clash problem.

shikokuchuo commented 7 months ago

Thanks Henrik, that’s noted. The ‘…’ interface has been there from day 1 and not about to change. The mitigants are the minimal nature of the interface and the '.' prefixes to the arguments.

The only reason the fix for this issue was so straightforward is because it is completely back-compatible.