rstudio / promises

A promise library for R
https://rstudio.github.io/promises
Other
201 stars 19 forks source link

Speed bump for rapid `future_promise()` submissions with limited workers #78

Closed schloerke closed 2 years ago

schloerke commented 2 years ago

Found while debugging https://github.com/rstudio/shiny/issues/2196

Reprex app:

library(shiny)
library(future)
# Use `promises::future_promise()` to allow other shiny users to be able to compute.
# Recommended to always use `promises::future_promise()` instead of `future::future()`
library(promises)
# Pick a plan that makes sense for your use case.
# For this example, using `multisession` and max workers = 3
future::plan("multisession", workers = 3)

# Temp work around: https://github.com/rstudio/shiny/issues/3574
options(shiny.deepstacktrace = FALSE)

ui <- fluidPage(
  sliderInput("slider", "Number of items", min = 100, max = 500, value = 500),
  plotOutput("plot")
)

server <- function(input, output, session) {
  output$plot <- renderPlot({
    req(input$slider)

    progress <- Progress$new(session)

    progress$set(message = 'Calculation in progress.')

    # Retrieve `input$slider` outside of `future`
    start <- Sys.time()
    prep <- NULL
    n <- input$slider
    seq_len(n) %>%
      lapply(function(i) {
        promises::future_promise({
          # Do heavy computations in a `future` worker
          data.frame(i = i, pid = Sys.getpid())
        }) %>%
          promises::finally(function() {
            if (is.null(prep)) {
              prep <<- Sys.time()
              message("Prep time: ", prep - start)
            }
            # Update progress bar in the main R session
            progress$inc(1/n, detail = paste('Computating...', i))
          })
      }) %>%
      # Wait for all promises to finish. Return single promise.
      promises::promise_all(.list = .) %>%
      promises::then(function(results) {
        # Extract result and plot in main R session
        plot(do.call(rbind, results))
      }) %>% promises::finally(function() {
        # Close the progress bar.
        # (An `on.exit()` would occur too early)
        progress$close()
        end <- Sys.time()
        message("Total time: ", end - start)
      })
  })
}

shinyApp(ui, server)
> # With PR
> shiny::runApp("~/parallel_progress/")
ℹ Loading promises

Listening on http://127.0.0.1:3598
Prep time: 1.88346886634827
Total time: 19.9987919330597
^C

> # With `main` branch
> shiny::runApp("~/parallel_progress/")
ℹ Loading promises

Listening on http://127.0.0.1:3598
Prep time: 3.13909196853638
Total time: 21.1707081794739
^C

In the demo app, 500 future_promise() submissions are created. Once the first 3 jobs have been submitted, the remaining 497 jobs are attempting to ask "Am I able to proceed?" / "Are there any future workers available?".

Once the workers run out, the answer should stay the same for the remainder of the computation tick. This PR reduces that question count by 496 (497 - 1).

The value will reset on the next tick.

While faster in total computation, I believe it could be even faster. I'd like to know what the current loop iteration is, i.e. loop tick #21. So that if the loop has a different tick number we could use that information, rather than relying on later::later(delay = 0) to execute. But if the reset is being done in the beginning of the work queue, then it might be fine.

wch commented 2 years ago

I think that "tick" isn't the right way to describe how {later} works. In Javascript, the word "tick" does make sense, because the event loop runs all the callbacks which have come due in the tick, then it releases control to the browser to run whatever non-JS browser stuff that needs to be run, and then it goes back to executing the event loop -- that's the next tick. If, in the first run of the event loop, any callbacks are scheduled with setTimeout(fn, 0), then those callbacks will run in the next tick.

With {later}, once it starts executing callbacks in the event loop, it just keeps executing them as long as there are any whose time has come due. It doesn't grab the set of currently-due callbacks, execute them, release control, and then start executing again.