s3alfisc / wildboottestjlr

Fast Wild Cluster Bootstrap Inference for OLS/IV in R based on WildBootTests.jl & JuliaConnectoR
Other
0 stars 0 forks source link

JuliaConnectoR #15

Closed droodman closed 2 years ago

droodman commented 2 years ago

Possibly useful??? https://youtu.be/ObYDHi_jJXk https://arxiv.org/ftp/arxiv/papers/2005/2005.06334.pdf#page=4

s3alfisc commented 2 years ago

I'll take a look at it this weekend!

s3alfisc commented 2 years ago

First quick feedback: JuliaConnectoR looks very promising, in particular regarding setup time, which is < 10s vs 40-60s with JuliaCall.

How to connect R to Julia via JuliaConnectoR:

Run

library(usethis)
usethis::edit_r_environ()

This will open your .renviron file. In the file, type in the path to your Julia installation, and save

JULIA_BINDIR="C:/Users/alexa/AppData/Local/Programs/Julia-1.6.3/bin"

Now you should be able to do something along these lines:

library(JuliaConnectoR)

# load WildBootTests.jl
WildBootTests <- juliaImport("WildBootTests")

N <- 1000
k <- 3
predexog <- matrix(rnorm(N*k), N, k)
resp <- as.vector(predexog  %*% matrix(rnorm(k),k, 1) + rnorm(N))
clustid <- as.matrix(sample(1:10, N, TRUE))

# use WildBootTests directly from R, e.g. the wildboottest() function
WildBootTests$wildboottest(resp, 
                           predexog, 
                           clustid)

Here, I am not yet sure how to pass H_0.

But note that the 'connection' step to Julia via juliaImport() is much faster than withJuliaCall.

Working with special types in Julia while passing R objects to a Julia function - e.g. wildboottest(), could work via juliaLet():

juliaLet('WildBootTests.wildboottest(
              Float32, 
              ([0, 1, 0], [0]);
              resp, 
              predexog, 
              clustid)', 
         resp = resp, 
         predexog = predexog, 
         clustid = clustid)

This is much more elegant than using JuliaCall::julia_assign() and JuliaCall::julia_eval(), but unfortunately, this also does not work... I will investigate this further.

Right now, I am not sure how I would pass the constraints tuple H_0= [R, beta0] to R, as R does not know UTF-8 variable names. I could pass the tuple as a character (lots of pasting involved).

Another potential fix could be to rename H_{0} simply to H_0. Then I could pass it to wildboottest() as the other function arguments based on R objects.

Last, here's a working example:

juliaLet('println(x); print(y)', x = 1, y = 2) 
# 1.0
# 2.0
s3alfisc commented 2 years ago

Instead of renaming H₀, you could also allow the user of wildboottest() to either input a tuple via H₀, or to input R and r via separate function arguments.

droodman commented 2 years ago

Well, maybe I'm misunderstanding, but H_0 is a positional argument so it shouldn't/can't be passed by name....

But this issue could apply to the model constraints, R₁

But I just tried it in R and it seemed OK:

> R₁ <- 2
> R₁
[1] 2

Update: I see juliaLet has trouble with Unicode. So I have dropped the use of subscripts. See below.

Separately, I have dropped the use of tuples in wildboottest(). It seemed they were doing more harm than good. See the revised examples in the README.md. Now you can just do wildboottest(R, r; pred= ....)

droodman commented 2 years ago

Here is my best attempt so far. It is giving the same error I got through your JuliaCall interface, which I need to figure out. But that it got this far is promising:

library(JuliaConnectoR)
juliaEval(r'(push!(LOAD_PATH, raw"D:\OneDrive\Documents\Macros\WildBootTests.jl"))')
juliaEval('using WildBootTests')
juliaLet('wildboottest([0 1 0], [0]; resp, predexog, clustid)', resp = resp, predexog = predexog, clustid = clustid)
Error: Evaluation in Julia failed.
Original Julia error message:
MethodError: no method matching subsetview(::VectorizationBase.StridedPointer{Float32, 2, 1, 0, (1, 2), Tuple{Static.StaticInt{4}, Int64}, Tuple{Static.StaticInt{1}, Static.StaticInt{1}}}, ::Static.StaticInt{2}, ::CartesianIndex{1})
Closest candidates are:
  subsetview(::VectorizationBase.StridedPointer{T, N, C, B, R, X, O}, ::Static.StaticInt{I}, !Matched::Integer) where {T, N, C, B, R, X, O, I} at C:\Users\drood\.julia\packages\LoopVectorization\hg50K\src\vectorizationbase_compat\subsetview.jl:2
Stacktrace:
  [1] panelsum!(dest::Matrix{Float32}, X::Matrix{Float32}, wt::Vector{Float32}, info::Vector{UnitRange{Int64}})
    @ WildBootTests D:\OneDrive\Documents\Macros\WildBootTest.jl\src\utilities.jl:239
  [2] panelsum
    @ D:\OneDrive\Documents\Macros\WildBootTest.jl\src\utilities.jl:288 [inlined]
  [3] panelsum2(X1::Matrix{Float32}, X2::Matrix{Float32}, wt::Vector{Float32}, info::Vector{UnitRange{Int64}})
    @ WildBoot```

Note that this uses the new syntax without tuples for H0 and H1.

droodman commented 2 years ago

Aha! The above example is now working for me. I switched from Julia 1.6.2 to 1.70-rc2 in JuliaConnectoR. I don't know if there's really a problem with 1.6.2 or if it's just a complication arising from having two versions of Julia installed at once. (I think the error message is coming from the LoopVectorization library, which is this amazing package written by someone who I think is outside the core Julia team that does a great job of optimizing loops for modern CPUs.)

droodman commented 2 years ago

And I have removed all subscripts from the wildboottest() parameter list. Model constraints are now specified with R1 and r1.

s3alfisc commented 2 years ago

This works for me under Julia 1.6.3:

library(JuliaConnectoR)

N <- 1000
k <- 3
predexog <- matrix(rnorm(N*k), N, k)
resp <- as.vector(predexog  %*% matrix(rnorm(k),k, 1) + rnorm(N))
clustid <- as.matrix(sample(1:10, N, TRUE))

juliaImport("WildBootTests")
juliaLet('WildBootTests.wildboottest([0 1 0], [0]; resp, predexog, clustid)', resp = resp, predexog = predexog, clustid = clustid)
s3alfisc commented 2 years ago

One drawback: I currently don't know how to get the estimation results from wildboottest() back to R after running juliaLet().

s3alfisc commented 2 years ago

Another drawback of JuliaConnectoR: compilation of Julia functions takes much more time. Running the wild bootstrap via juliaLet for the first time takes around 150s of compilation, vs around 60 with JuliaCall. I'm not sure what's going on under the hood of these functions - maybe a large share of the work that is happening in JuliaCall::julia_setup() does not happen in JuliaConnectoR::juliaImport() but actually takes place in JuliaConnectoR::juliaLet()?

s3alfisc commented 2 years ago

Here is a benchmark:

library(pracma) 

run_juliaconnector <- function(){
  library(JuliaConnectoR)
  juliaEval(r'(push!(LOAD_PATH, raw"C:\Users\alexa\.julia\packages\WildBootTests.jl"))')
  juliaEval('using WildBootTests')
  juliaLet('wildboottest([0 1 0], [0]; resp, predexog, clustid)', resp = resp, predexog = predexog, clustid = clustid)
}

run_juliacall <- function(){
  library(JuliaCall)
  julia_setup("C:/Users/alexa/AppData/Local/Programs/Julia-1.6.3/bin")
  JuliaCall::julia_library("WildBootTests")
  JuliaCall::julia_assign("R", c(0, 1, 0))
  JuliaCall::julia_assign("r", 0)
  JuliaCall::julia_assign("predexog", predexog)
  JuliaCall::julia_assign("resp", resp)
  JuliaCall::julia_assign("clustid", clustid)
  julia_eval("R")
  JuliaCall::julia_eval("boot_res = WildBootTests.wildboottest(
                                    [0 1 0], 
                                    [0];
                                    resp,
                                    predexog,
                                    clustid)")
}

N <- 1000
k <- 3
predexog <- matrix(rnorm(N*k), N, k)
resp <- as.vector(predexog  %*% matrix(rnorm(k),k, 1) + rnorm(N))
clustid <- as.matrix(sample(1:10, N, TRUE))

tic()
run_juliaconnector()
toc()
# elapsed time is 201.940000 seconds 

tic()
run_juliacall()
toc()
# elapsed time is 208.500000 seconds

So all in all, 200s vs 208 seconds and no speed advantage for JuliaConnectoR.

s3alfisc commented 2 years ago

How to return Julia function output via JuliaConnectoR:

foo <- juliaFun("function foo(a,b) a+b, a*b end")
res <- foo(2, 3)
class(res)
juliaGet(res)
droodman commented 2 years ago

So the speeds are about the same? That makes sense. I thought maybe JuliaConnectoR would be easier to work with, and maybe is more actively maintained.

I found a couple methods to programmatically extract the results through JuliaConnectoR:

Setup:

> juliaEval(r'(push!(LOAD_PATH, raw"D:\OneDrive\Documents\Macros\WildBootTests.jl"))')
[1] "@"                                                 "@v#.#"                                            
[3] "@stdlib"                                           "D:\\OneDrive\\Documents\\Macros\\WildBootTests.jl"
[5] "D:\\OneDrive\\Documents\\Macros\\WildBootTests.jl"
> juliaEval('using WildBootTests')
> 
> N <- 1000
> k <- 3
> predexog <- matrix(rnorm(N*k), N, k)
> resp <- as.vector(predexog  %*% matrix(rnorm(k),k, 1) + rnorm(N))
> clustid <- as.matrix(sample(1:10, N, TRUE))

The next line creates an R object called test that proxies for a Julia object, and apparently the object also exists in the Julia environment with the same name: > test <- juliaLet('wildboottest([0 1 0], [0]; resp, predexog, clustid)', resp = resp, predexog = predexog, clustid = clustid) Extract fields from the R object:

> test$stat
[1] -22.02282
attr(,"JLTYPE")
[1] "Float32"
> test$p
[1] 0
attr(,"JLTYPE")
[1] "Float32"
> test$CI
           [,1]       [,2]
[1,] -0.8966959 -0.7268232
attr(,"JLTYPE")
[1] "Float32"

You can see all the fields in the return object at the top of https://github.com/droodman/WildBootTests.jl/blob/master/src/interface.jl.

Or you can make the WildBootTests module look like an R object and then use the Julia interface functions I wrote. This is probably the better way since it uses the interface rather than directly accessing the results object, whose definition might change:

> WildBootTests <- juliaImport("WildBootTests")
Warning message:
Some names could not be expressed in the native encoding.
(Details see output of printing the returned object.) 
> WildBootTests$teststat(test)
[1] -22.02282
attr(,"JLTYPE")
[1] "Float32"
> WildBootTests$p(test)
[1] 0
attr(,"JLTYPE")
[1] "Float32"
> WildBootTests$CI(test)
           [,1]       [,2]
[1,] -0.8966959 -0.7268232
attr(,"JLTYPE")
[1] "Float32"
s3alfisc commented 2 years ago

I really wonder how you figured out the syntax of WildBootTests$teststat(test) - I spent quite some time investigating yesterday and did not figure it out... But really cool that it works!

I will migrate the workflow towards JuliaConnectoR then, as it is nicer to program with and seems to have some additional advantages. I agree that it makes sense that it's not faster - on the other hand, I have now so often made the experience that there are free lunches from using better software that I had somehow hoped that it would.

The code will look somehow along these lines:

library(JuliaConnectoR)

juliaEval(r'(push!(LOAD_PATH, raw"C:\Users\alexa\.julia\packages\WildBootTests.jl"))')
juliaEval('using WildBootTests')
juliaEval('using Random')

WildBootTests <- JuliaConnectoR::juliaImport("WildBootTests")

N <- 1000
k <- 3
predexog <- matrix(rnorm(N*k), N, k)
resp <- as.vector(predexog  %*% matrix(rnorm(k),k, 1) + rnorm(N))
clustid <- as.matrix(sample(1:10, N, TRUE))

# pass all arguments as function values, except for enumerated types and R & r
R <- c(0, 1, 0)
r <- 0
floattype <- "Float32"
type <- "rademacher"
ptype <- "equal_tailed"

paste_enumerated_args <- function(type = "rademacher", ptype = "two-tailed", rng = NULL){

  enumerated_type <- 
    switch(type,
         rademacher = "WildBootTests.rademacher",
         mammen = "WildBootTests.mammen",
         norm = "WildBootTests.normal",
         webb = "WildBootTests.webb",
         enumerated_type
         )

  enumerated_ptype <- 
  switch(ptype,
         "two-tailed" = "WildBootTests.symmetric",
         "equal_tailed" = "WildBootTests.equaltail",
         ">" = "WildBootTests.upper",
         "<" = "WildBootTests.lower",
         enumerated_ptype
  )

  if(is.null(rng)){
    rng <- "Random.MersenneTwister()"
  } else{ 
    rng <- paste0("Random.MersenneTwister(", rng, ")") 
  }

  paste0("auxwttype = ",enumerated_type, 
         ", ptype = ", enumerated_ptype, 
         ", rng = ", rng)

} 

positional_args <- paste0(floattype,",","[", paste0(R, collapse = " "),"],", "[", r, "]")
named_args <- 'resp, predexog, clustid'
enumerated_args <- paste_enumerated_args(type = "rademacher", ptype = "two-tailed", rng = 231235123)

juliaLet_char <- paste0('wildboottest(', positional_args,';', named_args ,',', enumerated_args, ")")

# can I pass positional arguments via juliaLet?
pracma::tic()
wildboottest_res <- juliaLet(juliaLet_char, 
                             resp = resp, predexog = predexog, clustid = clustid)
pracma::toc()

p_val <- WildBootTests$p(wildboottest_res)
conf_int <- WildBootTests$CI(wildboottest_res)
t_stat <- WildBootTests$teststat(wildboottest_res)
p_val; conf_int; t_stat

One thing I have yet to figure out is how to pass R when it is a matrix. The issue is that juliaLet() does not allow to pass positional arguments by name (which makes sense), so R and r need to be passed in a string / character. This is straightforward for vectors (they only need to be flattend) but not for matrices.

droodman commented 2 years ago

Oh, I don't think you should have to convert R and r to strings. That would be bad on principle since you can lose precision by converting numbers to strings. I think my example was just bad. Here are the new lines to make a better example:

r <- c(0)
R <- matrix(c(0,1,0), nrow=1)
juliaLet('wildboottest(R, r; resp, predexog, clustid)', R=R, r=r, resp=resp, predexog=predexog, clustid=clustid)

Note: JuliaConnectoR converts 1-element R vectors to Julia scalars. So to make this example work you either need to change "R, r" to "R, [r]" or install the very latest version of WildBootTests.jl, which accepts r as a scalar.

I am also wondering whether the juliaImport() approach could be used instead of juliaLet(), but I haven't figured it out yet. Here is a simple example of calling a Julia function with optional keyword arguments:

Base <- juliaImport('Base')
Base$sum(matrix(rnorm(4), 2, 2), dims=2L)

The optional dims argument tells sum() to sum across the second dimension. It is important that the value, 2, be stored as an integer rather than float, which is why it is written "2L".

Could wildboottest() be called in the same way? Is there a way in R to write a program to construct a list of optional arguments which is then passed to an R function? If the main obstacle is having to pass enumerated datatype values like WildBootTests.webb, then I can easily make wildboottest() accept strings.

By the way, I have submitted the Julia package for registration. This is my first time doing this, so I'm not sure if it will be accepted in hours, days, or weeks, or if I will run into a problem. Current status: https://github.com/JuliaRegistries/General/pull/49487

droodman commented 2 years ago

Oh this works fine!

WildBootTests <- juliaImport("WildBootTests")
> test <- WildBootTests$wildboottest(R, r, resp=resp, predexog=predexog, clustid=clustid)
> WildBootTests$CI(test)
           [,1]       [,2]
[1,] -0.6413519 -0.4712203
attr(,"JLTYPE")
[1] "Float32"

One subtlety to explain. In Julia, you can put a ; or , before the optional keyword arguments in a function call. So I can do sum(X, dims=1) or sum(X; dims=1). However, if I want to use the syntactic sugar for arguments and values having the same name, I need to use ;. So dims=1; sum(X, dims) will cause an error but dims=1; sum(X; dims) will work. But I assume there is nothing comparable in R, so I can't use the syntactic sugar when calling from R. That is why in the above example, the positional arguments R and r are included with R, r while the keyword arguments are included with resp=resp, predexog=predexog, clustid=clustid.

droodman commented 2 years ago

Even better! do.call(WildBootTests$wildboottest, list(R, r, resp=resp, predexog=predexog, clustid=clustid)) That list can be programmatically constructed.

s3alfisc commented 2 years ago

I checked out the latter approach - the do.call approach ultimately fails again because of the enumerate types and T = Float32. I have tried to pass T, ptype, auxwttype, rng as 'quotes' to wildboottest(), but this fails just as passing them as characters - JuliaConnectorR simply does not now how to translate these objects from R to Julia.

create_list <- function(floattype, ptype, auxwttype, rng){

  library(rlang)

  floattype <- enexpr(floattype)
  ptype <- enexpr(ptype)
  auxwttype <- enexpr(auxwttype)
  rng <- enexpr(rng)

  list(expr(!!floattype), 
       R, 
       r,
       resp = resp, 
       predexog = predexog, 
       clustid = clustid,
       ptype = expr(!!ptype), 
       auxwttype = expr(!!auxwttype), 
       rng = expr(!!rng))

}

float <- expr(Float32)
ptype <- expr(WildBootTests.symmetric)
auxwttype <- expr(WildBootTests.rademacher)
rng <- expr(Random.MersenneTwister())

eval_list <- create_list(!!float, !!ptype, !!auxwttype, !!rng)
eval_list

library(JuliaConnectoR)

juliaEval(r'(push!(LOAD_PATH, raw"C:\Users\alexa\.julia\packages\WildBootTests.jl"))')
juliaEval('using WildBootTests')
juliaEval('using Random')

WildBootTests <- JuliaConnectoR::juliaImport("WildBootTests")

N <- 1000
k <- 3
predexog <- matrix(rnorm(N*k), N, k)
resp <- as.vector(predexog  %*% matrix(rnorm(k),k, 1) + rnorm(N))
clustid <- as.matrix(sample(1:10, N, TRUE))

# pass all arguments as function values, except for enumerated types and R & r
R <- c(0, 1, 0)
r <- 0

eval_list
class(eval_list)
eval(eval_list)

do.call(WildBootTests$wildboottest, eval_list) 
# Error in juliaCall(name, ...) : object 'Float32' not found

If I run do.call, I get an error because R does not know Float32, and the same will happen with the enumerated types.

The same error should occur with WildBootTests$wildboottest(Float32, R, r, resp=resp, predexog=predexog, clustid=clustid).

Btw, with "input as character", I meant the following. I did not literally change the type of R from numeric to character, but instead fed

R <- c(0, 1, 0)
paste(R, collapse = " ")
# "0 1 0"

into juliaGet(), which is a character / string of length 1, which should be evaluated as numeric vector / array in Julia.

droodman commented 2 years ago

But on the Julia side, I should pretty easily be able to make it accept those strings.

s3alfisc commented 2 years ago

Yes, if you allow T & all enumerated types to be passed as characters, do.call will work.

droodman commented 2 years ago

Done:

> do.call(WildBootTests$wildboottest, list("Float32", R, r, resp=resp, predexog=predexog, clustid=clustid, auxwttype="webb"))

p  = 0.000
CI = Float32[-0.64126426 -0.47884554]

A remaining challenge is the "rng" argument... I wasn't able to get rng's to work through juliaImport(). But, rather remarkably, they work through juliaEval():

rng <- juliaEval("using StableRNGs; StableRNG(1321)")
do.call(WildBootTests$wildboottest, list("Float32", R, r, resp=resp, predexog=predexog, clustid=clustid, rng=rng))

rng <- juliaEval("Random.MersenneTwister(224)")
do.call(WildBootTests$wildboottest, list("Float32", R, r, resp=resp, predexog=predexog, clustid=clustid, rng=rng))