vnijs opened this issue 1 year ago.
+1 for this one. Maybe add an option() argument to choose between pandas and polars.
Just to make sure I understand the request correctly.
We could implement the py_to_r method for polars data frames. This means that whenever a Python function called through reticulate returns a polars data frame, it would be converted into an R data frame. This is the same behavior we have for pandas. Users can opt out by passing convert = FALSE when importing the module.
For example, if we implemented py_to_r for polars data frames, calling something like the code below would return an R data frame, whereas it currently returns a pointer to a Polars data frame.
polars <- reticulate::import("polars")
df <- polars$dataframe$DataFrame(data = list(
hello = 1:5
))
df
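A minimal sketch of what such a py_to_r method could look like, written as user code rather than inside reticulate. It assumes reticulate reports the polars class as polars.dataframe.frame.DataFrame (the module path may differ between polars versions, so check class(df) first) and that pandas and pyarrow are installed, since polars' to_pandas() relies on them. It simply routes the conversion through pandas so that reticulate's existing pandas converter does the actual work.

```r
library(reticulate)

# Sketch: automatic conversion of polars DataFrames to R data frames.
# Assumption: the S3 class name below matches what class(df) reports
# for your polars version.
py_to_r.polars.dataframe.frame.DataFrame <- function(x) {
  # Delegate to polars' own pandas export (needs pandas + pyarrow).
  out <- x$to_pandas()
  # Depending on x's convert flag, `out` is either already an R data frame
  # (converted by reticulate's pandas method) or still a pandas object.
  if (inherits(out, "python.builtin.object")) {
    out <- py_to_r(out)
  }
  out
}

# With the method defined, a module imported with convert = TRUE
# now hands back an R data frame instead of a polars pointer.
polars <- import("polars")
df <- polars$DataFrame(list(hello = 1:5))
str(df)
```

Defining the method this way also shows where the breaking change would come from: once it exists, every convert = TRUE import of polars silently returns R data frames.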
To be fair, you can get an R data.frame pretty easily by doing:
df$to_pandas()
which triggers the py_to_r method for pandas data frames.
We could also add an option to the r_to_py data frame method so that R data frames are converted into polars data frames when passed to Python.
Is that what you are suggesting? I don't have strong feelings about either option. However, adding py_to_r for polars data frames would be a potentially breaking change, since users might already rely on polars data frames not being automatically converted into R objects.
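Until an r_to_py option like that exists, an R data frame can be turned into a polars data frame by going through pandas. The sketch below assumes polars and pandas (and possibly pyarrow) are installed; polars.from_pandas() is polars' own conversion function, and convert = FALSE keeps the result as a Python object so it is not converted back.

```r
library(reticulate)

# Keep results as Python objects so the polars frame stays in Python.
pl <- import("polars", convert = FALSE)

# r_to_py() turns the R data frame into a pandas DataFrame,
# which polars.from_pandas() then converts into a polars DataFrame.
pdf <- pl$from_pandas(r_to_py(mtcars))
print(pdf)
```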
Yes, I am in favor of an automatic py_to_r(), and there should definitely be a parameter that lets users control it.
@OmarAshkar, do you have an example of a use case where automatic conversion is much nicer than calling .to_pandas()?
I'm leaning towards not implementing this in reticulate, as casting is a simple one-liner and it would probably be a breaking change for some users.
@dfalbel Thanks for taking a look at this. What exactly would break? The fact that folks focusing on polars could remove steps in their work? If there are any breaks, I assume they would be quite happy about things being made simpler. It would definitely make writing tests for python/polars to be executed through reticulate much easier.
I think what @dfalbel is suggesting is that users likely have existing workflows where they expect polars data frames not to be eagerly converted to R data frames (similar to how TensorFlow tensors are not automatically converted to R arrays, even when convert = TRUE).
The most minimal change I can think of that won't break existing workflows would be to add an as.data.frame.<polars-df> method, which could simply be as_r_value(x$to_pandas()). This would make as_tibble() work as well.
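A sketch of that non-breaking alternative: polars objects stay unconverted, but as.data.frame() learns how to handle them. It assumes the class name polars.dataframe.frame.DataFrame (verify with class(x) for your polars version) and that pandas and pyarrow are installed; it uses py_to_r() rather than as_r_value() for the final conversion.

```r
library(reticulate)

# Assumed class name; check class(x) on a polars DataFrame first.
as.data.frame.polars.dataframe.frame.DataFrame <- function(x, ...) {
  out <- x$to_pandas()  # requires pandas + pyarrow on the Python side
  # With convert = TRUE reticulate already returns an R data frame here;
  # with convert = FALSE we get a pandas object and convert it explicitly.
  if (inherits(out, "python.builtin.object")) {
    out <- py_to_r(out)
  }
  as.data.frame(out, ...)
}

# The polars object itself is left alone; conversion happens only on request.
pl <- import("polars", convert = FALSE)
df <- pl$DataFrame(list(hello = 1:5))
r_df <- as.data.frame(df)
str(r_df)
```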
We could also add a [.<polars-df> method to make missing axes more ergonomic, e.g., making py$df[2, ] equivalent to df[2, :] in Python.
Today, if you want to pass a Python : to [, you can do so (admittedly, not very ergonomically) like this:
bt <- import_builtins()
bt$slice(NULL)
For example:
py$df[2, bt$slice(NULL)]
The current version of reticulate brings slice support to [ and [<- (added in #1432). This now works:
library(reticulate)

## slice a NumPy array
x <- np_array(array(1:64, c(4, 4, 4)))

# R expression | Python expression
# ------------ | -----------------
x[0]           # x[0]
x[, 0]         # x[:, 0]
x[, , 0]       # x[:, :, 0]
x[NA:2]        # x[:2]
x[`:2`]        # x[:2]
x[2:NA]        # x[2:]
x[`2:`]        # x[2:]
x[NA:NA:2]     # x[::2]
x[`::2`]       # x[::2]
x[1:3:2]       # x[1:3:2]
x[`1:3:2`]     # x[1:3:2]
See ?py_get_item for examples.
The same syntax should work for Polars DataFrames.
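A sketch of the same slice syntax applied to a polars DataFrame, assuming a reticulate version that includes #1432 and a Python environment with polars installed. Importing with convert = FALSE keeps every result as a polars object; on the Python side, polars reads df[:2] as the first two rows and df[:, 0] as the first column.

```r
library(reticulate)

pl <- import("polars", convert = FALSE)
df <- pl$DataFrame(list(a = 1:5, b = 6:10))

# R expression   # Python expression
first_two <- df[NA:2]   # df[:2]    first two rows
col_a     <- df[, 0]    # df[:, 0]  first column, as a polars Series
every_2nd <- df[`::2`]  # df[::2]   every other row
print(first_two)
```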
Would love to see this as well!
What's the status on this? Currently, when using polars in Quarto with revealjs, rendering is terrible. Is it possible to pre-process everything and call to_pandas without showing that step?
CC @cderv, do you have any thoughts about ☝🏻 ?
I have been using Polars in Python, and it is a wonderful, fast DataFrame library for Python and Rust. There even seems to be work on creating R bindings for polars as well (https://github.com/pola-rs/r-polars).
I use reticulate a lot in Shiny apps, and it would be great if reticulate could also support the Polars DataFrame format, at least in terms of being able to convert a Polars DataFrame to an R data.frame. Since polars is based on Arrow, I hope this may be possible.
Below is an example of what currently happens when using reticulate with a polars DataFrame.