shikokuchuo / mirai

mirai - Minimalist Async Evaluation Framework for R
https://shikokuchuo.net/mirai/
GNU General Public License v3.0
193 stars 10 forks source link

Support for mirai cancellation #112

Open jcheng5 opened 7 months ago

jcheng5 commented 7 months ago

Hi Charlie, while it was on my mind, I just wanted to jot down some thoughts regarding cancellation, in case it's a feature you're ever interested in pursuing (which would definitely be great for Shiny btw). Probably a lot of the below is not news to you, hope I don't offend you by stating what might be obvious to you--these are just my thoughts.

There are several parts to cancellation: 1) how you invoke it, 2) how it cancels the task, 3) how/when it's reported back to the controlling process.

Invocation

The obvious would be mirai$cancel(). The alternative is to use a cancellation token, where each mirai needs to be handed a cancellation-signaling object during construction. For starting/cancelling a single task, the cancel() method is easier; but for coordinating lots of tasks that are logically part of a single operation, the cancellation token approach is easier.

It might be more convenient to have with(cancellationToken, { ... }), i.e. the ability to install a default global cancellation token.

Cancellation

It's either cooperative (make the user's mirai expression explicitly check for interruption) or non-cooperative (send SIGINT/Ctrl+C). For R, probably the vast majority of the time, non-cooperative is fine? But it seems wrong not to make cooperative possible (or at least the ability for the user to say "during this critical section, don't let me be interrupted").

I guess the "pick reasonable defaults" approach would be for it to be non-cooperative by default and then have a way to override the default behavior to ask for cooperative instead. While the "be explicit" approach would be to not support cancellation unless the mirai() function is explicitly told whether the code is designed for cooperative cancellation, or ready for non-cooperative interruption.

Reporting back

Cancellation could be fire-and-forget (the mirai object acts as if interruption succeeded immediately, even if the task is still actually running), like posix signals.

Or it could join on the task, so the user can know whether it was actually interrupted or if it ran to completion before it had a chance to be interrupted. You could imagine the latter distinction mattering for side-effecty operations--like if you tried to cancel a bank transfer, it's important to know whether the cancellation succeeded. I personally have found in practice this seems to almost never matter--or at least, it's nice to know if the task has actually stopped but the distinction between "it stopped because you asked it to" vs. "it stopped on its own" rarely matters; in the bank transfer example you'd probably want to do a query to determine whether it succeeded or not, anyway.

shikokuchuo commented 3 months ago

@jcheng5 it's taken me a while to get to this. Inspiration oddly came from this thread started by Hadley: https://github.com/ropensci/targets/discussions/1310

nanonext and mirai are currently feature-frozen pre-release. But I've made a start in the dev branch of nanonext: https://github.com/shikokuchuo/nanonext/tree/dev

The implementation is likely to be a non-cooperative model. Due to the limitations of R, it would seem to require having to proactively check for cancellation itself from the main thread - and I don't see this fitting well with mirai's minimalist design ethos if user code needs to be invasively modified to do this. Otherwise there is no way to join it from other threads, apart from sending signals (interruption/termination) which is a somewhat blunt, if effective, instrument.

To give you a flavour, and to show what can be done now in the interim. For the specific use case where a new process is created for each session, instead of stop_mirai(), calling daemons() again will drop the existing connection, and cause the remote process to exit immediately by raising a signal upon itself. The below modifies your original gist: https://gist.github.com/jcheng5/1283baec96c05a65778d931a8b7c7314

library(shiny)
library(mirai)
library(promises)
library(bslib)

ui <- page_fluid(
  input_task_button("go", "Go"),
  actionButton("stop", "Stop"),
  textOutput("out")
)

server <- function(input, output, session) {

  session_id <- nanonext::random(4L)

  # create persistent process, specifying a unique compute profile for each session
  # supply a signal value to 'autoexit' e.g. SIGINT, SIGTERM etc.
  daemons(1, dispatcher = FALSE, autoexit = tools::SIGINT, .compute = session_id)
  task <- ExtendedTask$new(function()
    mirai(
      {
        Sys.sleep(5);
        "hello, world"
      },
      .compute = session_id
    )
  ) |> bind_task_button("go")

  observeEvent(input$go, {
    task$invoke()
  })

  observeEvent(input$stop, {
    # implicitly resets daemons, before re-creating
    daemons(1, dispatcher = FALSE, autoexit = tools::SIGINT, .compute = session_id)
  })

  output$out <- renderText({
    task$result()
  })

  observe({
    updateActionButton(session, "stop", disabled = !identical(task$status(), "running"))
  })
}

shinyApp(ui, server)
jcheng5 commented 3 months ago

Forgive me, I've "paged out" a lot of what I knew about nanonext/mirai's implementation at the time I filed this... but based on what I do remember, sounds good!

Due to the limitations of R, it would seem to require having to proactively check for cancellation itself from the main thread

Actually I guess you could consider R to already have cooperative cancellation at a low level, with R_CheckUserInterrupt (and base::suspendInterrupts(), which I didn't know about before today). So if you send SIGINT then you've basically got the right (cooperative) behavior, and you can send SIGTERM for a more aggressive non-cooperative cancel if necessary. (In Shiny Server we would send SIGINT, and after a few seconds, follow it with SIGTERM if necessary.)

shikokuchuo commented 3 months ago

I appreciate the 'paging out'! Yes, I use both in the nanonext/mirai codebase already so am familiar. My primary concern for mirai is that an 'eval' of the user code can become an 'atomic' operation if it goes straight down to C code that does not call R_CheckUserInterrupt() periodically.

Having said that, there are already calling handlers in place to catch an interrupt. For some of the possible implementations I've considered, there could be the opportunity at this point to differentiate between a cancellation interrupt (sent over the wire) vs. a user interrupt. I'd be fairly relaxed with adding complexity for this as it'd be on the error path rather than the hot path. That way, we could replicate something like the behaviour you describe above and call SIGTERM if SIGINT fails to trigger the (modified) handler.