Thanks Barret for posting to GitHub. I'd appreciate any advice on hosting a plumber API on a Windows server (based on my most recent post above) where it can leverage all available cores to allow parallel calls. Thanks for your help!
When running your example script within a machine learning (ML) workflow, the "status" endpoint stops responding immediately after one or two tries, because R is busy running the ML code once the 10-second delay has elapsed. I think this could be fixed if I moved my API into a more scalable hosting environment (instead of running just a single instance on one core).
I've found that staying within R as much as possible works the smoothest.
I would look into using `callr::r_bg()`. Assuming your ML process is in R, `callr` can launch a background process. If you save the result of `proc <- callr::r_bg(...)`, you can inspect it to see if the process has finished (`proc$is_alive()`) and get its result (`proc$get_result()`). The result could be a file or the actual R data returned from the model. I suggest writing the results to disk within the background process to make retrieval possible even if the R process dies.
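A minimal sketch of that lifecycle, assuming the model-fitting code can be wrapped in a self-contained function and the result is also written to an RDS file (the `lm()` call and `result.rds` are placeholders for the real ML code, not part of the original example):

```r
# launch the ML work in a background R process; the function runs in a
# fresh session, so everything it needs is passed in through `args`
proc <- callr::r_bg(
  func = function(train_data, out_file) {
    fit <- lm(mpg ~ wt, data = train_data)  # stand-in for the real ML code
    saveRDS(fit, out_file)                  # persist the result to disk
    fit                                     # also returned to the parent
  },
  args = list(train_data = mtcars, out_file = "result.rds")
)

# later, e.g. from a /status endpoint:
if (proc$is_alive()) {
  "still running"
} else {
  fit <- proc$get_result()  # or readRDS("result.rds") if the parent restarted
}
```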
Having a background R process per job could be a blessing and a curse, as you will launch an independent R process for each new job. This can be bad if you launch too many processes. However, you could add queueing logic to your API to prevent machine overload.
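For example, the launching endpoint could refuse new work once a certain number of background processes are still alive; `jobs`, `max_jobs`, and `run_model` below are illustrative names, not part of the original example:

```r
jobs <- list()   # id -> callr process handle
max_jobs <- 4    # illustrative cap on concurrent background jobs

#' @get /begin
function(req, res) {
  running <- sum(vapply(jobs, function(p) p$is_alive(), logical(1)))
  if (running >= max_jobs) {
    res$status <- 503   # too busy; ask the caller to retry later
    return("server busy, please try again later")
  }
  id <- as.character(length(jobs) + 1)                  # trivial id scheme
  jobs[[id]] <<- callr::r_bg(run_model, args = list())  # run_model is a placeholder
  id
}
```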
Do you have any advice on how to implement the R-based wrapper for Windows?
@shrektan will have more advice here, but I strongly recommend running your plumber instance within Docker. While plumber may work on Windows, we do not actively support it.
Hosting plumber on Windows by using the R session directly is OK, but remember that `fork` is not supported on Windows, so for parallel computing, copying R objects is inevitable. As for scaling up, I'm sure there are ways to do it, but that's beyond my knowledge; for me, it's limited to "async/parallel computing on the local machine".
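For instance, with the base `parallel` package, Windows cannot use fork-based workers, so shared objects have to be copied into each worker of a PSOCK cluster; a small illustration (not plumber-specific):

```r
library(parallel)

big_data <- mtcars   # stands in for a large in-memory object

# fork-based workers (Linux/macOS only) share the parent's memory:
# res <- mclapply(1:4, function(i) nrow(big_data), mc.cores = 4)

# on Windows, a PSOCK cluster is used instead and objects must be copied over:
cl <- makeCluster(2)
clusterExport(cl, "big_data")   # explicit copy into every worker
res <- parLapply(cl, 1:4, function(i) nrow(big_data))
stopCluster(cl)
```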
Docker is natively supported on Windows 10. It may be a better option because Docker containers are easier to manage and scale. In addition, `fork` is supported on Linux. With all the mature tools in the Docker ecosystem, I think it's easier to find whatever you need. Moreover, it's important to be able to duplicate the deployment environment, which is not easy to do directly on Windows, while Docker was born for exactly this.
I don't have enough experience personally with `callr` and `plumber` in long-running processes, so this might be just a matter of "test it and see".
Using `callr::r_bg` seems reasonable. What happens if/when the main R process is interrupted?
Use case: if an API endpoint accepts some unique ID as a parameter (for which only one execution should occur at a time), then we can store that ID within your `work_queue` list. Subsequent calls with the same ID can then self-determine that they don't need to start a new job, optionally redirecting to a "status" endpoint as you've already demonstrated above.
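Roughly like this, reusing the `work_queue` idea from the example further down; the `/begin/<id>` route and the 303 redirect are assumptions for illustration:

```r
work_queue <- list()   # id -> NULL while running, the result once finished

#' @get /begin/<id>
function(id, res) {
  if (id %in% names(work_queue)) {
    # a job with this ID already exists: don't start another one,
    # just send the caller to its status endpoint
    res$status <- 303
    res$setHeader("Location", paste0("/status/", id))
    return(id)
  }
  work_queue[id] <<- list(NULL)   # mark the ID as "in progress"
  # ... launch the actual work here (later::later(), callr::r_bg(), etc.)
  res$status <- 202
  res$setHeader("Location", paste0("/status/", id))
  id
}
```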
Then, if the main R process crashes or is restarted (e.g., an updated deployment), my guess is that the child processes would either be interrupted (killed) or orphaned (output goes nowhere).
I'd think that the "orphaned" outcome depends on what the background process does. If it works by side effect (e.g., inserting data into a database), then it will likely do its thing, but nothing is notified on exit unless/until somebody checks whether the side effect is done (e.g., queries the database). However, if in the meantime another caller hits the API with that same ID, then ... it is started again.
Any thoughts on external IPC? I can see utility in filesystem- or NoSQL-based (Redis?) operations, where we might still be able to use `callr::r_bg` but its information is actually stored elsewhere ... in which case it might be possible for different endpoints to all "know" about that process.
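As a rough filesystem-based sketch (a shared `jobs/` directory is an assumption here; Redis or a database would follow the same pattern): the background process writes its result to a file keyed by the job ID, and any endpoint, in any R process, can look it up:

```r
jobs_dir <- "jobs"   # directory visible to every process involved (assumption)
dir.create(jobs_dir, showWarnings = FALSE)

# launching side: the background process owns the result file for this ID
callr::r_bg(
  function(id, jobs_dir) {
    Sys.sleep(10)   # stand-in for the real work
    saveRDS(list(done = TRUE, value = mtcars),
            file.path(jobs_dir, paste0(id, ".rds")))
  },
  args = list(id = "abc123", jobs_dir = jobs_dir)
)

# any endpoint, even one running in a different R process, can check on it
status_of <- function(id) {
  path <- file.path(jobs_dir, paste0(id, ".rds"))
  if (!file.exists(path)) return(list(done = FALSE))
  readRDS(path)
}
```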
Thoughts?
I believe the original intent was to turn a long-running process into something that can be inspected for a status and a result. The trick is to offload the work somewhere other than the main R thread.
There are definitely many different approaches and considerations to be aware of when offloading processing to somewhere other than the main R thread. (Similar to the communication issues that can occur with distributed databases as compared to accessing a `data.frame()`.)
Going to close the issue for now, as `plumber` will not implement a general solution for this.
Copying an email thread here to continue the discussion:
@mftokic - Oct 7, 2019
@schloerke
@mftokic
@schloerke
`tokic.R`
```r
# run in R session: plumber::plumb("tokic.R")$run(port = 12345)
# visit in browser: 127.0.0.1:12345/begin

work_queue <- list()

#' @get /begin
function(req, res) {
  # get unique id
  while (
    (id <- paste0(sample(letters, 8, replace = TRUE), collapse = "")) %in% names(work_queue)
  ) {
    TRUE
  }

  # initiate work in separate thread
  work_queue[id] <<- list(NULL)
  later::later(
    function() {
      idx <- sample(1:3, 1)
      work_queue[[id]] <<- list(iris, mtcars, Titanic)[[idx]]
    },
    10 # wait 10 seconds
  )

  # redirect to status
  res$status <- 202
  res$setHeader("Location", paste0("/status/", id))
  # res$setHeader("retry-after", 2) # didn't know if it was seconds or milliseconds
  id
}

#' Poll the work queue
#'
#' @html
#' @get /status/<id>
```
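The status handler is cut off above; a minimal sketch of how it might continue, assuming it simply returns the stored result once the `later::later()` callback has filled in `work_queue[[id]]` (the HTML strings and status codes here are illustrative):

```r
function(id, res) {
  if (!(id %in% names(work_queue))) {
    res$status <- 404
    return("<p>Unknown job id.</p>")
  }
  result <- work_queue[[id]]
  if (is.null(result)) {
    res$status <- 202   # accepted, still working
    return("<p>Still working, check back shortly.</p>")
  }
  # finished: show which dataset the fake "job" produced
  paste0("<pre>", paste(capture.output(str(result)), collapse = "\n"), "</pre>")
}
```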