stefan-m-lenz / JuliaConnectoR

A functionally oriented interface for calling Julia from R

Use preloaded Julia packages into future_promise #9

Closed bakaburg1 closed 3 years ago

bakaburg1 commented 3 years ago

Hello,

We are building a Shiny dashboard that uses Julia for some heavy lifting. We wanted to make it even less blocking (especially for the multi-user experience) by running the Julia code in a future using the future_promise() function. The idea is to load the packages at dashboard startup and then just use them in the future. Here is some dummy code:

library(shiny)
library(JuliaConnectoR)
library(future)
library(promises)
library(dplyr)

plan(multisession)

JuliaConnectoR:::stopJulia()

Sys.setenv(JULIA_NUM_THREADS = parallel::detectCores())

Base <- juliaImport('Base')
DF <- juliaImport('DataFrames')

onStop(function() JuliaConnectoR:::killJulia())

ui <- fluidPage(

    # App title ----
    titlePanel("Hello Shiny!"),
    actionButton("go", "Go!"),
    tableOutput("table")
)

server <- function(input, output) {

    data <- reactiveVal()

    observeEvent(input$go, {
        print('Triggered')

        future_promise({
            library(dplyr)
            start = Sys.time()

            res <- DF$DataFrame(a = Base$rand(), b = Base$rand()) %>% as.data.frame()

            list(
                start = start,
                res = res
            )
        }, globals = list(Base = Base, DF = DF)) %...>% {
            print('Finished')
            print(Sys.time() - .$start)
            data(.$res)
        }
    })

    output$table <- renderTable({
        data()
    })

}

shinyApp(ui, server)

Unfortunately, Julia starts again inside the future and then returns an error: DataFrames not defined. I can get around the error by running DF <- juliaImport('DataFrames') inside the future, but then the extra overhead of Julia starting and loading the package makes the future useless. Is there a way to have one main Julia session with its objects and make them available in the external R session created by the future?

bakaburg1 commented 3 years ago

An alternative I was thinking about is to use Julia's async capabilities, but I wasn't able to make Shiny react to changes in the status of scheduled tasks.

stefan-m-lenz commented 3 years ago

That is an interesting problem. The future_promise starts a new R process, which in turn starts a new Julia process. With the JuliaConnectoR, it is possible to connect to an already running Julia server by specifying its address in the environment variable JULIACONNECTOR_SERVER, e.g.

Sys.setenv(JULIACONNECTOR_SERVER="localhost:11981")

The server can be started as in https://github.com/stefan-m-lenz/JuliaConnectoR/blob/master/inst/Julia/main.jl, with the additional argument keeprunning = true passed to the RConnector.serve function.

However, the Julia server that comes with the JuliaConnectoR is currently not fully equipped to handle multiple R clients. Connecting multiple R clients to one Julia process might work, but there are some global variables that can cause problems. With some changes it could be made to work. It probably suffices to remove the global variables in https://github.com/stefan-m-lenz/JuliaConnectoR/blob/master/inst/Julia/sharing.jl and put them into the CommunicatoR object.

bakaburg1 commented 3 years ago

Thank you! I'll try it out and let you know. In the meantime, any idea how to make a Julia async task observable by a Shiny observer? That would also solve the problem and may be less complex. (It also seems like Julia async tasks are faster to start than promises.)

bakaburg1 commented 3 years ago

Simply forcing the server location via R triggers an error: [image]

I am not sure how I should interact with the Julia code in the files you referenced. Can I do something from the R session directly, or do I need to edit the package?

bakaburg1 commented 3 years ago

OK, I saw that all exported functions in your package use JuliaConnectoR:::ensureJuliaConnection() to connect to the server. This function checks whether the global pkgLocal$con is non-null before setting up the connection. I tried to modify the global directly, but I get the following error: [image]

Would the problem be solved by adding a method to pass the existing connection and port between R sessions to fill the new pkgLocal env? I'll try editing the package.

stefan-m-lenz commented 3 years ago

You can't share the connection object between different processes, so you can't solve the problem on the R side. You need to make the Julia part multiclient-ready. The foundation for this is already there. In order to use the JULIACONNECTOR_SERVER variable, you need to start the Julia process with the argument keeprunning = true. Otherwise the Julia process is terminated after the connection to the process is closed. You can look into the main.jl file to see how the Julia server is started, and you need to add the argument keeprunning to the call to RConnector.serve. Afterwards you can check what you need to change in Julia. The difficulty is that you then need to make sure that every access to the global scope is thread-safe.

stefan-m-lenz commented 3 years ago

I thought about taking the last steps to make the JuliaConnectoR Julia server multiclient-ready. But I don't think this is really a promising approach, and there is no real use case besides faster testing and easier debugging, for which I implemented this feature. The problem is that if all your R processes talk to one JuliaConnectoR server, the performance can't be scaled. The number of concurrent jobs can only be increased up to the number of threads on the computer where the server runs. If you want to design an application that serves multiple clients at once, it is better to have one Julia process per client and to start this Julia process at a point where the additional loading time blocks the user as little as possible. You can load the packages asynchronously in Julia, via a function that does the package loading with @async. You only need to make sure later that the loading has finished before you use the package, e.g. DataFrames. This way you can achieve what you want without modifying the JuliaConnectoR package.
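
A minimal sketch of the asynchronous loading described above, seen from the R side. The name pkg_loading is made up for illustration, and it is an assumption (not verified here) that the loading task makes progress while the Julia process is waiting for the next request from R:

```r
library(JuliaConnectoR)

# Start loading DataFrames in a Julia task and return to R right away.
# `pkg_loading` is a hypothetical global name in Julia's Main module.
# `@eval` is used because `using` has to run at top level.
juliaEval("pkg_loading = @async @eval using DataFrames; nothing")

# ... the rest of the application startup can happen here ...

# Before the first use, block until the loading task has finished.
juliaEval("wait(pkg_loading)")

# The package is loaded by now, so importing it should be quick.
DF <- juliaImport("DataFrames")
df <- as.data.frame(DF$DataFrame(a = c(1, 2), b = c(3, 4)))
```

If the loading fails inside the task, the error should only surface at the wait call, so that is the place to handle it.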

stefan-m-lenz commented 3 years ago

I'll close this issue, as I have just released version 1.0.0, which allows multiple R processes to connect to one Julia server. This makes it possible to call a Julia function in a future without having to start a new Julia process.

library(JuliaConnectoR)
library(future)
port <- startJuliaServer()
plan(multisession)
f1 <- future({
   juliaEval("begin sleep(2); 1; end") # uses the Julia server running at the port
})

f2 <- future({
   juliaEval("begin sleep(2); 2; end") # uses the Julia server running at the port
})

value(f1) # can be verified by the output here
value(f2)
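
Applied to the Shiny case from the original question, this could look roughly like the sketch below. It rests on a few assumptions: that the worker processes created by plan(multisession) find the server started by startJuliaServer() in the same way as the futures above, and that a package loaded once in the shared Julia process stays loaded for later clients. The import is done inside the future because connection-bound objects cannot be shared between R processes.

```r
library(shiny)
library(JuliaConnectoR)
library(future)
library(promises)

# Start the shared Julia server before the workers are created.
port <- startJuliaServer()
juliaEval("using DataFrames; nothing")  # preload once at app startup
plan(multisession)

ui <- fluidPage(
    actionButton("go", "Go!"),
    tableOutput("table")
)

server <- function(input, output) {
    data <- reactiveVal()

    observeEvent(input$go, {
        future_promise({
            # Runs in a worker R process, which is assumed to connect
            # to the already running Julia server.
            library(JuliaConnectoR)
            DF <- juliaImport("DataFrames")
            as.data.frame(
                DF$DataFrame(a = juliaCall("rand"), b = juliaCall("rand"))
            )
        }) %...>% data()
    })

    output$table <- renderTable(data())
}

app <- shinyApp(ui, server)
```

In an interactive session, runApp(app) starts the dashboard.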