palday / JellyMe4.jl

RCall support for MixedModels.jl and lme4
MIT License
12 stars 2 forks source link

JellyMe4 breaks MixedModels.jl when used through JuliaCall #51

Closed john-b-edwards closed 3 years ago

john-b-edwards commented 3 years ago

I apologize for any ignorance on my part, as I'm far more well versed in R than I am in Julia. Been trying to come up with a way to run some large mixed models in Julia from R. I've noticed that when I load JellyMe4.jl, it breaks the fitting of mixed models using MixedModels.jl

dyn.load('C:\\Users\\edwar\\AppData\\Local\\Programs\\Julia-1.6.1\\bin\\libopenlibm.DLL')
julia.dir = JuliaCall::julia_setup(JULIA_HOME = 'C:\\Users\\edwar\\AppData\\Local\\Programs\\Julia-1.6.1\\bin')
#> Julia version 1.6.1 at location C:\Users\edwar\AppData\Local\Programs\JULIA-~1.1\bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.
julia.dir$library("MixedModels")
julia.dir$library("RCall")
julia.dir$library("DataFrames")
julia.dir$library("StatsModels")
julia.dir$library("JellyMe4")
julia.dir$assign("dat",nlme::Machines)
julia.dir$eval("my_mod = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat)")
#> Error: Error happens in Julia.
#> ArgumentError: You must use a tuple with the data -- (MixedModel, DataFrame) -- not just a model
#> Stacktrace:
#>  [1] sexp(#unused#::Type{RCall.RClass{:merMod}}, x::LinearMixedModel{Float64})
#>    @ JellyMe4 C:\Users\edwar\.julia\packages\JellyMe4\rvLx5\src\merMod.jl:17
#>  [2] sexp(s::LinearMixedModel{Float64})
#>    @ RCall C:\Users\edwar\.julia\packages\RCall\3mHXJ\src\convert\default.jl:214
#>  [3] docall(call1::Ptr{Nothing})
#>    @ Main.JuliaCall C:\Users\edwar\Documents\R\win-library\4.1\JuliaCall\julia\setup.jl:185

Created on 2021-06-18 by the reprex package (v2.0.0)

If I call julia.dir$library("JellyMe4") after fitting the model, then the code behaves correctly until I try to fit the mixed model again, when the same error pops up.

dyn.load('C:\\Users\\edwar\\AppData\\Local\\Programs\\Julia-1.6.1\\bin\\libopenlibm.DLL')
julia.dir = JuliaCall::julia_setup(JULIA_HOME = 'C:\\Users\\edwar\\AppData\\Local\\Programs\\Julia-1.6.1\\bin')
#> Julia version 1.6.1 at location C:\Users\edwar\AppData\Local\Programs\JULIA-~1.1\bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.
julia.dir$library("MixedModels")
julia.dir$library("RCall")
julia.dir$library("DataFrames")
julia.dir$library("StatsModels")
julia.dir$assign("dat",nlme::Machines)
julia.dir$eval("my_mod = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat)")
#> Julia Object of type LinearMixedModel{Float64}.
#> Linear mixed model fit by maximum likelihood
#>  score ~ 1 + Machine + (1 + Machine | Worker)
#>    logLik   -2 logLik     AIC       AICc        BIC    
#>   -108.2089   216.4178   236.4178   241.5341   256.3077
#> 
#> Variance components:
#>             Column    Variance Std.Dev.   Corr.
#> Worker   (Intercept)  13.817604 3.717204
#>          Machine: B   28.684965 5.355835 +0.49
#>          Machine: C   11.240978 3.352757 -0.36 +0.30
#> Residual               0.924634 0.961579
#>  Number of obs: 54; levels of grouping factors: 6
#> 
#>   Fixed-effects parameters:
#> --------------------------------------------------
#>                 Coef.  Std. Error      z  Pr(>|z|)
#> --------------------------------------------------
#> (Intercept)  52.3556      1.53437  34.12    <1e-99
#> Machine: B    7.96667     2.20988   3.61    0.0003
#> Machine: C   13.9167      1.40579   9.90    <1e-22
#> --------------------------------------------------
julia.dir$library("JellyMe4")
model = julia.dir$eval("robject(:lmerMod, Tuple([my_mod,dat]))", need_return="R")
print(model)
#> Linear mixed model fit by maximum likelihood  ['lmerMod']
#> Formula: score ~ 1 + Machine + (1 + Machine | Worker)
#>    Data: jellyme4_data
#>       AIC       BIC    logLik  deviance  df.resid 
#>  236.4178  256.3077 -108.2089  216.4178        44 
#> Random effects:
#>  Groups   Name        Std.Dev. Corr       
#>  Worker   (Intercept) 3.7172              
#>           MachineB    5.3558    0.49      
#>           MachineC    3.3528   -0.36  0.30
#>  Residual             0.9616              
#> Number of obs: 54, groups:  Worker, 6
#> Fixed Effects:
#> (Intercept)     MachineB     MachineC  
#>      52.356        7.967       13.917  
#> optimizer (LN_BOBYQA) convergence code: 5 (fit with MixedModels.jl) ; 0 optimizer warnings; 0 lme4 warnings
julia.dir$eval("my_mod = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat)")
#> Error: Error happens in Julia.
#> ArgumentError: You must use a tuple with the data -- (MixedModel, DataFrame) -- not just a model
#> Stacktrace:
#>  [1] sexp(#unused#::Type{RCall.RClass{:merMod}}, x::LinearMixedModel{Float64})
#>    @ JellyMe4 C:\Users\edwar\.julia\packages\JellyMe4\rvLx5\src\merMod.jl:17
#>  [2] sexp(s::LinearMixedModel{Float64})
#>    @ RCall C:\Users\edwar\.julia\packages\RCall\3mHXJ\src\convert\default.jl:214
#>  [3] docall(call1::Ptr{Nothing})
#>    @ Main.JuliaCall C:\Users\edwar\Documents\R\win-library\4.1\JuliaCall\julia\setup.jl:185

Created on 2021-06-18 by the reprex package (v2.0.0)

What's unusual to me is that the Julia-equivalent code to me seems to work just fine, ex.

julia> using MixedModels, RCall, DataFrames, StatsModels

julia> machines = rcopy(R"nlme::Machines")
54×3 DataFrame
 Row │ Worker  Machine  score
     │ Cat…    Cat…     Float64
─────┼──────────────────────────
   1 │ 1       A           52.0
   2 │ 1       A           52.8
   3 │ 1       A           53.1
   4 │ 2       A           51.8
   5 │ 2       A           52.8
   6 │ 2       A           53.1
   7 │ 3       A           60.0
   8 │ 3       A           60.2
   9 │ 3       A           58.4
  10 │ 4       A           51.1
  11 │ 4       A           52.3
  ⋮  │   ⋮        ⋮        ⋮
  45 │ 3       C           71.0
  46 │ 4       C           64.1
  47 │ 4       C           66.2
  48 │ 4       C           64.0
  49 │ 5       C           72.1
  50 │ 5       C           72.0
  51 │ 5       C           71.1
  52 │ 6       C           62.0
  53 │ 6       C           61.4
  54 │ 6       C           60.5
                 33 rows omitted

julia> m = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)), machines)
Linear mixed model fit by maximum likelihood
 score ~ 1 + Machine + (1 + Machine | Worker)
   logLik   -2 logLik     AIC       AICc        BIC
  -108.2089   216.4178   236.4178   241.5341   256.3077

Variance components:
            Column    Variance Std.Dev.   Corr.
Worker   (Intercept)  13.817604 3.717204
         Machine: B   28.684965 5.355835 +0.49
         Machine: C   11.240978 3.352757 -0.36 +0.30
Residual               0.924634 0.961579
 Number of obs: 54; levels of grouping factors: 6

  Fixed-effects parameters:
──────────────────────────────────────────────────
                Coef.  Std. Error      z  Pr(>|z|)
──────────────────────────────────────────────────
(Intercept)  52.3556      1.53437  34.12    <1e-99
Machine: B    7.96667     2.20988   3.61    0.0003
Machine: C   13.9167      1.40579   9.90    <1e-22
──────────────────────────────────────────────────

This very well might be an RCall bug, but given that this specific interaction occurs between JellyMe4.jl and MixedModels.jl, I thought to post this here first and see if I could get any insight into why this is happening.

palday commented 3 years ago

Let's go through this step by step. :smile:

RCall works its magic by creating a mechanism for defining conversions between Julia types and R type and then using that mechanism for a few common R/Julia types (data frames, vectors, etc.). This is relatively well tested code and I doubt that we're hitting a bug there.

JuliaCall starts a Julia process in the background, then loads RCall within that Julia process and let's RCall do all the heavy lifting. This is a bit of a Rube Goldberg machine, but I think it's gotten less fragile in the last year or so.

JellyMe4 uses the RCall mechanism to define some useful conversions between MixedModels.jl and lme4. However, there are deep philosophical differences between languages that rear their ugly head and so the conversion isn't as automagic as a lot of things using RCall are. In particular, lme4 models keep a copy of the data frame used to generate the model matrices (well, technically a reference that becomes a copy in certain circumstances, but we'll keep it simple here), but MixedModels.jl does not keep a copy of the data frame, instead keeping only the model matrices. This means that you can convert an lme4 model to a MixedModels.jl model because you can just discard the extra copy of the data, but a MixedModels.jl model is not sufficient to immediately create an lme4 model: you still need a copy of the data frame in question. Currently, this is done by passing a Julia tuple of the model and the data frame.

This brings us to your line / error:

julia.dir$eval("my_mod = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat)")
#> Error: Error happens in Julia.
#> ArgumentError: You must use a tuple with the data -- (MixedModel, DataFrame) -- not just a model

Notice the Julia error message: it's telling you what went wrong. To get a copy of that model back into R, you need to do something like:

julia.dir$eval("my_mod = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat); (my_mod, dat)")

The bit before the semicolon fits and assigns the model to my_mod, the second part creates the required tuple. The last value (here: the tuple) is the return value and the one JuliaCall converts via RCall.

If you don't care about manipulating the model in Julia and thus don't need to assign it to a variable, you could fit the model and construct the tuple in one step:

 julia.dir$eval("(fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat), , dat)")

Now, there is one piece of the puzzle remaining: why does it seem to work before JellyMe4 is loaded? Well, it's only kinda working: there is no defined conversion, so JuliaCall just calls the show method in Julia to get the Julia output and shows that in R. After you've loaded JellyMe4, there is a hook defined that says, "hey, you want to convert this model to R, but there isn't enough info to do so properly, so let's error and tell you what you need to do!".

You got this right with this line:

julia.dir$eval("robject(:lmerMod, Tuple([my_mod,dat]))", need_return="R")

But when you tried fitting the model again, you only get the model and not the tuple back and so you get an error.

Hope that helps!

PS: You can simplify that line as well: julia.dir$eval("(my_mod,dat)").

john-b-edwards commented 3 years ago

I appreciate the step-by-step response! Unfortunately, even when adjusting my code with your suggested alterations, I get what appears to be the same error:

dyn.load('C:\\Users\\edwar\\AppData\\Local\\Programs\\Julia-1.6.1\\bin\\libopenlibm.DLL')
julia.dir = JuliaCall::julia_setup(JULIA_HOME = 'C:\\Users\\edwar\\AppData\\Local\\Programs\\Julia-1.6.1\\bin')
#> Julia version 1.6.1 at location C:\Users\edwar\AppData\Local\Programs\JULIA-~1.1\bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.
julia.dir$library("MixedModels")
julia.dir$library("RCall")
julia.dir$library("DataFrames")
julia.dir$library("StatsModels")
julia.dir$library("JellyMe4")
julia.dir$assign("dat",nlme::Machines)
julia.dir$eval("my_mod = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat); (my_mod, dat)")
#> Error: Error happens in Julia.
#> ArgumentError: You must use a tuple with the data -- (MixedModel, DataFrame) -- not just a model
#> Stacktrace:
#>  [1] sexp(#unused#::Type{RCall.RClass{:merMod}}, x::LinearMixedModel{Float64})
#>    @ JellyMe4 C:\Users\edwar\.julia\packages\JellyMe4\rvLx5\src\merMod.jl:17
#>  [2] sexp(s::LinearMixedModel{Float64})
#>    @ RCall C:\Users\edwar\.julia\packages\RCall\3mHXJ\src\convert\default.jl:214
#>  [3] setindex!(s::Ptr{VecSxp}, value::LinearMixedModel{Float64}, key::Int64)
#>    @ RCall C:\Users\edwar\.julia\packages\RCall\3mHXJ\src\methods.jl:188
#>  [4] sexp(#unused#::Type{RCall.RClass{:list}}, a::Vector{Any})
#>    @ RCall C:\Users\edwar\.julia\packages\RCall\3mHXJ\src\convert\base.jl:295
#>  [5] sexp
#>    @ C:\Users\edwar\.julia\packages\RCall\3mHXJ\src\convert\default.jl:214 [inlined]
#>  [6] sexp(x::Tuple{LinearMixedModel{Float64}, DataFrame})
#>    @ Main.JuliaCall C:\Users\edwar\Documents\R\win-library\4.1\JuliaCall\julia\convert.jl:8
#>  [7] docall(call1::Ptr{Nothing})
#>    @ Main.JuliaCall C:\Users\edwar\Documents\R\win-library\4.1\JuliaCall\julia\setup.jl:185

Calling this line:

 julia.dir$eval("(fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat), dat)")

Throws the same error as well (I believe there was a typo with an extra comma? I get a parse error with the original line).

I hope you can forgive my confusion with the discussion of the show method -- I don't quite understand why that would result in an error within R, using JuliaCall, versus native Julia. Both sessions have identical packages loaded, and I would assume that in theory, the same hook in JellyMe4 would load in both the R and Julia sessions -- but the R session throws an error whereas identical code in Julia does not. Is there something I am missing there?

I am also not entirely sure that the issue comes from the conversion of MixedModels.jl -> lme4 -- I have no issue with conversion when loading JellyMe4.jl after fitting the model. The issue appears to result from the fitting itself, which occurs before I would need to zip the model and the tuple together -- specifically, I believe the line where the error occurs is:

my_mod = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat)

Appreciate your assistance with this! Hopefully picking up some Julia knowledge as I go lol

palday commented 3 years ago

I hope you can forgive my confusion with the discussion of the show method -- I don't quite understand why that would result in an error within R, using JuliaCall, versus native Julia. Both sessions have identical packages loaded, and I would assume that in theory, the same hook in JellyMe4 would load in both the R and Julia sessions -- but the R session throws an error whereas identical code in Julia does not. Is there something I am missing there?

There's a bit of automagic going wrong. Julia always displays Julia objects based on show methods. (There are default methods, but package authors can define custom show methods for their types, which we have done for the types in MixedModels.jl.) So no problem there in native Julia. When using JuliaCall, JuliaCall first tries to use RCall for converting things. If no conversion for a particular type is defined, then JuliaCall just captures the show output from Julia and displays that in R, so no error. JellyMe4 is designed to provide an RCall conversion for MixedModels.jl, but for Julia -> R, the model alone doesn't suffice, so the conversion of the model alone is defined as an error. So JuliaCall checks to see if a conversion is defined, finds one, but then it "discovers" that the conversion is just an error. (The motivation for that design choice was that JellyMe4 has been loaded to do that conversion, so if a user tries to do it, it's better to give a helpful error than to just fail silently and capture the Julia show output.)

I am also not entirely sure that the issue comes from the conversion of MixedModels.jl -> lme4 -- I have no issue with conversion when loading JellyMe4.jl after fitting the model.

So this gives me an idea. The fitting isn't the problem, it's the conversion. When you fit first, you get the Julia output and so no error, and then you can convert freely afterwards. Maybe you can still split up the fitting and converting steps.

I haven't touched JuliaCall in a while ... but a quick look at the docs suggests a solution:

Fit first with:

julia.dir$eval("my_mod = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat)", need_return="Julia") # prevent conversion via RCall

or

julia.dir$command("my_mod = fit(MixedModel, @formula(score ~ 1 + Machine + (1 + Machine|Worker)),dat)") # don't bother getting the return value

And then do the conversion:

julia.dir$eval("(my_mod,dat)")
john-b-edwards commented 3 years ago

That worked perfectly! Thank you so much.