shikokuchuo / mirai

mirai - Minimalist Async Evaluation Framework for R
https://shikokuchuo.net/mirai/
GNU General Public License v3.0
193 stars 10 forks source link

Support a character vector in `daemon(cleanup = )` #79

Closed krlmlr closed 1 year ago

krlmlr commented 1 year ago

Using an integer as an array of flags feels non-idiomatic. I understand that this gives the best performance when actually handling the mirais, but would you consider an interface like

daemons(..., cleanup = c("global_env", "unload", "options"), ...)

and translate internally?

Thanks for the great package, I've wanted this functionality for a long time.

shikokuchuo commented 1 year ago

Fantastic @krlmlr! I'm glad the package is useful for you.

The reason it's a bitmask is actually not for performance but due to wanting to make it a bit obscure (the relevant discussion was here: https://github.com/wlandau/crew/issues/65#issue-1658538992).

The goal was really to prevent casual users shooting themselves in the foot with, in particular, resetting options but not packages, which might then break package behaviour. Having said that, I find that I do not have such a strong opinion on this now.

Performance-wise, a set of logical flags would likely be optimal, but perhaps a bit verbose. I try to avoid a choice of character argument if I am going to have to use something like match.arg() rather than handling it in C.

Instead I would probably look to re-implement it as an integer argument - 0 - no cleanup, 1 - clean globalenv, 2 - clean globalenv + reset options + packages, 3 - everything + gc.

daemons(..., clean_level = 0L)

Let me know if you think the above would be better.

krlmlr commented 1 year ago

Thanks! It took me some time to understand why my (intentional) state changes were reverted: the documentation for the cleanup argument is in ?daemon which is only linked from ?dispatcher .

Have you considered @inheritDotParams ? I see how this might be overwhelming, not clear if it's a win here.

I like how you can choose which parts to clean up. (I still think mirai could, in addition or instead, report state changes: https://github.com/shikokuchuo/mirai/issues/80#issuecomment-1742912561.) My remark was about the use of an integer, where the idiomatic way in R would be a character vector. Yes, it's slow to translate, but translation is only needed once, at the front end. I agree that internals should use whatever is fastest.

shikokuchuo commented 1 year ago

Ah I see - yes I am in the middle of improving the docs actually. Thanks for pointing this out - I will at the very least make it clearer what ... params may be specified. I think I still prefer the ... pass through instead of adopting all the parameters at daemons()!

The character vector interface is superior - I am in agreement. The reason performance matters here is that {crew} can spin up very-short-lived mirai as part of auto-scaling.

There is possibly a middle way - which is offer both - have the character vector as default, but allow {crew} to still use the integer mask - the only additional evaluation would be an is.integer() for {crew} which would be negligible.

krlmlr commented 1 year ago

@inheritDotParams is a roxygen2 tag. The idea is to keep the behavior, but roxygen2 adds documentation for all arguments from the other functions. I can show in a PR if you like.

shikokuchuo commented 1 year ago

Got it, that's pretty neat. Thanks!

shikokuchuo commented 1 year ago

On this point, I seem to remember @wlandau you preferred having each of the options independent - 1. clean global env, 2. reset options, 3. reset packages, 4. gc. Rather than lump 2 and 3 together, which would avoid the resetting options potentially breaks packages issue. Am I right?

wlandau commented 1 year ago

Yes, I find it more idiomatic to have 4 separate arguments with syntactic names. The is what crew currently does: https://github.com/wlandau/crew/blob/5acd7fc75aa536f704cf1f278cec6a16057ddd9a/R/crew_controller_local.R#L34-L37. If most users should not be setting these arguments, that seems like a documentation concern to me.

shikokuchuo commented 1 year ago

Sure thanks.

My original idea was to have a few levels that would make sense in a character argument - something like: c("default", "none", "lite", "full"). But that wouldn't do if each is an option.

On my side, I prefer to keep 'cleanup' as a single argument rather than a block of logical arguments. Seems nothing to be done here, bar a better suggestion.

Also @krlmlr appreciate the @inheritdotparams suggestion, it's a very neat trick, but the sheer volume of options did turn out to be overwhelming on balance, so this was reverted.

shikokuchuo commented 1 year ago

Thinking on it further, the integer bitmask is not very friendly to implement an interface against, so I've made a compromise change, which is to have the options as a logical vector argument. Then at least it's trivial to construct.

I've kept it as one argument instead of many as I want to make sure users consider these options in their entirety rather than each one individually.

Thanks again @krlmlr for the suggestion.

krlmlr commented 1 year ago

Code is read more often than it is written. I imagine a logical vector to look like arg = c(TRUE, FALSE, TRUE, TRUE), and this is very difficult to understand with a quick glance unless you're very familiar with what arg does. Compare this with arg = c("some", "more", "options") .

shikokuchuo commented 1 year ago

Thanks, having a fresh pair of eyes is a great help. I'm going to simplify this by collapsing into a single TRUE/FALSE. I estimate that this will cover 95% of cases. I'm retaining the integer bitmask as an alternative where more granular control is required - and the documentation will be improved to make it much clearer how this operates.