Open vnijs opened 1 year ago
+1 for this one. Maybe add an `option()` argument to choose between pandas and polars.
Just to make sure I understand the request correctly.
We could implement the `py_to_r` method for polars data frames. This means that whenever a Python function called by reticulate returned a polars data frame, it would be converted into an R data frame. This is the same behavior as we have for pandas. Users can opt out by passing `convert = FALSE` when importing the module.
For example, if we implemented `py_to_r` for polars data frames, calling something like the code below would return an R data frame, whereas it currently returns a pointer to a Polars data frame:
polars <- reticulate::import("polars")
df <- polars$dataframe$DataFrame(data = list(
hello = 1:5
))
df
To be fair, you can get an R data.frame pretty easily by doing:
df$to_pandas()
which will trigger the `py_to_r` method for pandas data frames.
We could also add an option to the `r_to_py` data frame method, so R data frames get converted into polars data frames when cast to Python objects.
Is that what you are suggesting? I don't have strong feelings about either option. However, if we add `py_to_r` for polars data frames, it will be a potential breaking change, as users might already be relying on the fact that polars data frames aren't automatically cast into R objects.
Yes, I am for an automatic `py_to_r()`. And a parameter should definitely be available for users.
@OmarAshkar, do you have an example of a use case where automatic conversion is much nicer than calling `.to_pandas()`?
I'm leaning towards not implementing this in reticulate, as casting is a simple one-liner and it's probably going to be a breaking change for some users.
@dfalbel Thanks for taking a look at this. What exactly would break? The fact that folks focusing on polars could remove steps in their work? If there are any breaks, I assume they would be quite happy about things being made simpler. It would definitely make writing tests for python/polars to be executed through reticulate much easier.
I think what @dfalbel is suggesting is that users likely have existing workflows where they are expecting polars data frames to not eagerly convert to R data frames (similar to how TensorFlow tensors don't automatically convert to R arrays, even when `convert = TRUE`).
The most minimal change I can think of that won't break existing workflows would be to add an `as.data.frame.<polars-df>` method, which could simply be `as_r_value(x$to_pandas())`. This would make `as_tibble()` work as well.
We can also add a `[.<polars-df>` method, to make missing axes more ergonomic.
E.g., make `py$df[2, ]` equivalent to `df[2, :]` in Python.
Today, if you want to pass a Python `:` to `[`, that can be done (admittedly, not very ergonomically) like this:
bt <- import_builtins()
bt$slice(NULL)
for example
py$df[2, bt$slice(NULL)]
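On the Python side, a bare `:` inside square brackets is shorthand for `slice(None)`, which is why `bt$slice(NULL)` can stand in for it above. A minimal pure-Python sketch of that equivalence follows; the `KeyProbe` class is hypothetical and exists only to show which key the indexing operator receives:

```python
class KeyProbe:
    """Hypothetical helper: returns whatever key is passed to []."""

    def __getitem__(self, key):
        return key


probe = KeyProbe()

# Indexing with `2, :` passes a tuple whose second element is slice(None).
key_from_colon = probe[2, :]
# Building the slice explicitly yields the same key.
key_from_slice = probe[2, slice(None)]

print(key_from_colon)
print(key_from_colon == key_from_slice)
```

Any object that accepts `df[2, :]` in Python should therefore accept the explicit slice object as well.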
The current version of reticulate brings slice support to `[` and `[<-` (added in #1432).
This now works:
## slice a NumPy array
x <- np_array(array(1:64, c(4, 4, 4)))

# R expression | Python expression
# ------------ | -----------------
x[0]           # x[0]
x[, 0]         # x[:, 0]
x[, , 0]       # x[:, :, 0]
x[NA:2]        # x[:2]
x[`:2`]        # x[:2]
x[2:NA]        # x[2:]
x[`2:`]        # x[2:]
x[NA:NA:2]     # x[::2]
x[`::2`]       # x[::2]
x[1:3:2]       # x[1:3:2]
x[`1:3:2`]     # x[1:3:2]
See `?py_get_item` for examples.
The same syntax should work for Polars DataFrames.
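For reference, each Python slicing expression such as `x[:2]` or `x[1:3:2]` is shorthand for an explicit `slice(start, stop, step)` object, with omitted bounds represented by `None`. The sketch below demonstrates this on a plain Python list rather than a NumPy array or Polars DataFrame, so that it does not assume any package is installed; the variable names are illustrative only:

```python
values = list(range(8))  # [0, 1, 2, 3, 4, 5, 6, 7]

# Each explicit slice object mirrors one shorthand form.
head = values[slice(None, 2)]               # same as values[:2]
tail = values[slice(2, None)]               # same as values[2:]
every_other = values[slice(None, None, 2)]  # same as values[::2]
stepped = values[slice(1, 3, 2)]            # same as values[1:3:2]

print(head == values[:2])
print(tail == values[2:])
print(every_other == values[::2])
print(stepped == values[1:3:2])
```

Note that indexing here is 0-based, as in Python.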
Would love to see this as well!
I have been using Polars in Python and it is a wonderful, fast, DataFrame library for Python and Rust. There even seems to be work on creating R-bindings for polars as well (https://github.com/pola-rs/r-polars).
I use reticulate a lot in shiny apps and it would be great if reticulate could also support the Polars DataFrame format, at least in terms of being able to convert a Polars DataFrame to an R data.frame. Since polars is based on Arrow, I hope this may be possible.
Below is an example of what currently happens when using reticulate with a polars data frame.