stefan-m-lenz / JuliaConnectoR

A functionally oriented interface for calling Julia from R

dataframe support? #1

Closed KnutJaegersberg closed 4 years ago

KnutJaegersberg commented 4 years ago

Do you plan to support transferring data frames from R to Julia? It's a natural data structure to use in R.

stefan-m-lenz commented 4 years ago

Thanks for the interest - you're the first one opening an issue!

I am not sure about this. For the next release, which will be very soon, hopefully, I do not plan to include data frames. What kept me from including them:

stefan-m-lenz commented 4 years ago

I thought about it again some more after writing my answer yesterday. I think it will not be much effort to implement it and have it as an optional dependency. I have an idea how to do it and I'm going to pursue it. And you are right, it is kind of natural to be able to translate data frames.

But I would still be interested in your examples for an application, if you would like to share it.

KnutJaegersberg commented 4 years ago

General data wrangling and exploratory data analysis with Julia packages are one use case; those are data-frame-centric in Julia (though, as an R user, I don't come here for those). For Julia's ML libraries, you mostly need to transform into arrays. I'm primarily an R user, and from my perspective it would be great to leverage Julia's ecosystem from within R with typical R syntax and concepts. I find R much more convenient UX-wise, and I am convinced that, in principle, as with reticulate and Python, it is not mandatory to wrangle with every bit of Julia syntax to leverage the ecosystem. Maybe it's not realistic, but it would be awesome to have a Julia wrapper in R exposing Julia's rich and scalable ML ecosystem to R syntax in an R-centric way, avoiding copying data whenever possible.
As a vision, an application I would like to run in the long term based on JuliaConnectoR is JuliaDB and OnlineStats from within R, with syntax as close to typical R syntax as possible. Ideally, in the very long term, with a translation layer from JuliaDBMeta to dplyr. The same goes for using Flux.jl and the rest of Julia's ML ecosystem, which is taking off, with automatic translation from R data frames to Julia arrays as required by those libraries. I would like to use the brevity of R code on a Julia backend for ML projects of arbitrary size. I think you may have opened the door to a long-term community development trajectory for enabling this degree of integration.

stefan-m-lenz commented 4 years ago

Thank you for this explanation. The data frame support didn't make it into the latest release (0.3) because I have to think about it some more. The focus of this release was to make it ready to be usable with machine learning packages like Flux. For Flux, data frames are not needed. To call functions expecting matrices, you can transform your data with e.g. as.matrix. Also, copying data is less of a problem if the methods used in Julia do the heavy lifting and are not called too often.
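As a minimal sketch of the as.matrix step mentioned above (the data frame and its column names are hypothetical and only serve as an illustration), the numeric columns can be selected first, since as.matrix on a data frame with mixed column types yields a character matrix:

```r
# Hypothetical example data; the column names are illustrative only.
df <- data.frame(x = c(1, 2, 3),
                 y = c(4, 5, 6),
                 label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# Keep only the numeric columns: as.matrix on a data frame with
# mixed column types would produce a character matrix instead.
numeric_cols <- vapply(df, is.numeric, logical(1))
m <- as.matrix(df[, numeric_cols, drop = FALSE])

# m is a numeric matrix with one column per numeric variable
# and can be passed to a Julia function that expects a matrix.
```

The resulting matrix could then be handed to a Julia function, for example via juliaCall, assuming a running Julia session.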

But you make a very good point. I am also thinking about how it is possible to avoid copying data as much as possible, and also how to support data frame like structures. I think this is closely related because in a usage scenario like the advanced data wrangling one needs to operate on tables, which you don't want to copy most of the time. So my plan is to target this together in another release.

stefan-m-lenz commented 4 years ago

@KnutJaegersberg If you would like to try it out: Data frame support is now included via Tables.jl!

Here is the example from the documentation of as.data.frame.JuliaProxy:

juliaEval('import Pkg; Pkg.add("JuliaDB")')
juliaImport("JuliaDB")

mydf <- data.frame(x = c(1, 2, 3),
                   y = c("a", "b", "c"),
                   z = c(TRUE, FALSE, NA),
                   stringsAsFactors = FALSE)

# create a table in Julia, e.g. via JuliaDB
mytbl <- JuliaDB.table(mydf)

# this table can, e.g., be queried and
# the result can be translated to an R data frame
seltbl <- JuliaDB.select(mytbl, juliaExpr("(:x, :y)"))[1:2]

# translate selection of Julia table into R data frame
as.data.frame(seltbl)

KnutJaegersberg commented 4 years ago

very handy. Cool thanks a lot for implementing that!!!

arnold-c commented 2 years ago

I'm very new to Julia (and therefore JuliaConnectoR), so I'm not sure if this is a bug or expected behavior. It seems that as.data.frame() is very particular about the kind of Tables.jl table it supports. For example, I've had to set Julia DataFrame columns to Float64 and String types to avoid errors about being unable to convert type Any. Similarly, it appears to matter how the table is created. See below for a reprex.

# This doesn't work
test_table1 <- juliaLet('
    table(x, z)',
    x = 1:10,
    z = 21:30
)
as.data.frame(test_table1)

# This works
test_table2 <- juliaEval('table(DataFrame(x = 1:10, y = 21:30))')
as.data.frame(test_table2)

r$> test_table1
<Julia object of type IndexedTable{StructArrays.StructVector{Tuple{Int64, Int64}, Tuple{Vector{Int64}, Vector{Int64}}, Int64}}>
Table with 10 rows, 2 columns:

r$> test_table2
<Julia object of type IndexedTable{StructArrays.StructVector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}, NamedTuple{(:x, :y), Tuple{Vector{Int64}, Vector{Int64}}}, Int64}}>
Table with 10 rows, 2 columns:

Apologies if I'm missing something due to my lack of experience with Julia, and thanks for your work with this package - it's great to be able to use R when it's necessary.

stefan-m-lenz commented 2 years ago

@arnold-c I created a new issue. It is better to separate issues because all people involved in an old issue will be notified when there is a new comment or any other changes such as a reopening of the issue. This issue here is already closed because it was about adding data frame support to the JuliaConnectoR in general. This is now a more specific topic.