johnmchambers / XRJulia

XR-style Interface to Julia (from "Extending R")
33 stars 2 forks source link

Calling Julia functions in R very slow due to overhead of juliaGet() #12

Open woodwards opened 6 years ago

woodwards commented 6 years ago

I am testing calibration in R of a model in Julia. It works, but having to use juliaGet() for each model call makes it extremely slow. Is there a faster way to do it, where the Julia functions act like native R functions without the overhead?

My Julia module is

# linmod_module.jl
module linmod_module

export linmod, linmodvec, loglikelihood

function linmod(x, a, b)
  y = a + b * x
  return y
end

function linmodvec(datax, a, b)
  y = linmod(datax, a, b)
  return y
end

function loglikelihood(datay, datax, a, b, error)
  datan = size(datax, 1)
  y = linmod(datax, a, b)
  z = (datay - y) / error
  ll = - 0.5 * datan * log(2 * pi * error * error) - 0.5 * sum(z .* z);
  return ll
end

end

My R Script is

# try BayesianTools with Julia model

library(XRJulia)
library(BayesianTools)
# library(coda)

# make some data
x_data <- seq(0, 1, length.out=1000)
y_data <- 0.5+0.5*x_data+rnorm(length(x_data),0,0.1)
model_se <- 0.1

# find Julia functions
findJulia(test=TRUE)
ev <- RJulia() # create Julia evaluator
ev$AddToPath(getwd()) # find module linmod

# test Julia model
linmod <- JuliaFunction("linmod", module="linmod_module")
test <- linmod(0.5, 0.5, 0.5)

# test vectorised Julia model
linmodvec <- JuliaFunction("linmodvec", module="linmod_module")
test <- linmodvec(x_data, 0.5, 0.5)
juliaGet(test) # but this is very slow

# test Julia loglikelihood
loglikelihood <- JuliaFunction("loglikelihood", module="linmod_module")
test <- loglikelihood(y_data, x_data, 0.5, 0.5, model_se)

# BayesianTools setup
bt_names <- c("a", "b")
bt_prior <- createUniformPrior(c(0,0), c(1,1))
bt_likelihood <- function(params){
  loglikelihood(y_data, x_data, params[1], params[2], model_se)
}
bt_parallel <- FALSE
bt_setup <- createBayesianSetup(likelihood=bt_likelihood,
                                prior=bt_prior,
                                parallel=bt_parallel,
                                names=bt_names)
nBurnin <- 900
nSample <- 900
nTotal <-  nSample + nBurnin
nChains <- 3
nInternal <- 3
bt_settings <- list(startValue=nInternal, # DREAMzs internal chains
                    iterations=nTotal/nChains, # iterations per external chain
                    nrChains=nChains, 
                    burnin=nBurnin/nChains+1, # discard these ones
                    parallel=bt_parallel,
                    message=TRUE) 
bt_out <- runMCMC(bayesianSetup=bt_setup, 
                  sampler = "DREAMzs", 
                  settings=bt_settings)

suppressWarnings(summary(bt_out))
tracePlot(bt_out)
correlationPlot(bt_out)

bt_predict <- function(params){
  juliaGet(linmodvec(x_data, params[1], params[2]))
}

bt_error <- function(mean, params){
  return(rnorm(length(mean), mean=mean, sd=model_se))
}

bt_samples <- getSample(bt_out, start=2)
summary(bt_samples)

# this throws an error
suppressMessages({
  plotTimeSeriesResults(sampler=bt_samples,
                      model=bt_predict,
                      observed=y_data,
                      error=bt_error,
                      plotResiduals=TRUE,
                      main="Residual Analysis")
})
johnmchambers commented 6 years ago

If I understand your R script, it's not juliaGet() per se, but the fact that you are running MCMC (which is not fast in R anyway) with a 2-way interface with data conversion between R and Julia on each call for the likelihood.

Unfortunately, this is just the opposite of what I recommend (in Extending R) for good interface computation: give the other language a substantial computation to do.

vh-d commented 6 years ago

Also I doubt that computation of this particular likelihood function can be significantly improved by "outsourcing" it to a different language than R. All the code is vectorized and carried out by highly optimized fortran code called from R. So even using faster interface such as Rcpp would not help too much I am afraid.

woodwards commented 6 years ago

Thank you gentlemen.

The context is that I have a large biological model in ACSLX which we want to rewrite, since ACSLX is deprecated, and I am looking at options. We want to run this model both from R and also as a "DLL" within another, even larger model. I am looking at the readability, support, communication, speed, of a number of languages, including Fortran, C++, Python, Julia.

This is just a test script that I am learning with. It runs like lightning when I do exactly the same with Rcpp. Maybe this is because Rcpp does a more efficient job of passing the arrays? So are you saying that XRJulia is just relatively slow at passing data, maybe because some conversion is needed?

vh-d commented 6 years ago

The context is that I have a large biological model in ACSLX which we want to rewrite, since ACSLX is deprecated, and I am looking at options. We want to run this model both from R and also as a "DLL" within another, even larger model. I am looking at the readability, support, communication, speed, of a number of languages, including Fortran, C++, Python, Julia.

I don't know ACSLX but it looks a bit like what Modelica is used for these days. It is open source and has quite some momentum I think. There is also a Modia.jl (alternative to Modelica written by the creator of Modelica on top of Julia). In general, I think Julia is a great candidate but you have to write a lot of code yourself until the ecosystem of packages covers your field.

This is just a test script that I am learning with. It runs like lightning when I do exactly the same with Rcpp. Maybe this is because Rcpp does a more efficient job of passing the arrays? So are you saying that XRJulia is just relatively slow at passing data, maybe because some conversion is needed?

Yes, Rcpp uses R's C interface so it does not need to copy data very often. XRJulia starts Julia as a separate process and exchange data converted to/from JSON via sockets.

My point in the previous comment was that I expect Rcpp and native R solution to run at very similar speeds. It is probably the MCMC part that is the bottleneck.

woodwards commented 6 years ago

Yes ASCLX is a bit like Modelica. I had a brief look into Modelica, but it looked a bit niche, I couldn't find much in the way of forums etc,.

johnmchambers commented 6 years ago

Vaclav's summary is very likely the point. Rcpp has two speed advantages over XRJulia in terms of general interfaces (e.g section 12.3 of Extending R): it's at the subroutine level not the language level, so doesn't have to work with an expression; and it's embedded rather than working over connections so doesn't have to format and then parse data every time. In this case, it's pretty certain to be the second aspect that matters.

On top of that, the Rcpp implementers have done a lot of work on the C++ side to mesh data structures. It seems to work well for lots of applications, judging by the number of packages that use it.

woodwards commented 6 years ago

Thank you for your help!