pola-rs / r-polars

Bring polars to R
https://pola-rs.github.io/r-polars/

Do we have a way to create `object` and `struct` with classic R functions? #1012

Closed etiennebacher closed 2 months ago

etiennebacher commented 2 months ago

I don't think we have a way to create object and struct from our standard c() and list() but maybe I'm missing something?

It would be good to have a small table in the docs to show the equivalent (if any) of those:

>>> pl.Series(values=[1])
shape: (1,)
Series: '' [i64]
[
        1
]

> pl$Series(values = 1)
polars Series: shape: (1,)
Series: '' [f64]
[
    1.0
]

>>> pl.Series(values=[[1]])
shape: (1,)
Series: '' [list[i64]]
[
        [1]
]

> pl$Series(values = list(1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
    [1.0]
]

>>> pl.Series(values=[{1}])
shape: (1,)
Series: '' [o][object]
[
        {1}
]

???

>>> pl.Series(values=[{"a": 1}])
shape: (1,)
Series: '' [struct[1]]
[
        {1}
]

???

eitsupi commented 2 months ago

Are you looking for pl$Series(values = data.frame(a = 1))?

eitsupi commented 2 months ago

IIUC, the object type is Python-specific, not a real Apache Arrow type (so we don't support it).

etiennebacher commented 2 months ago

> Are you looking for pl$Series(values = data.frame(a = 1))?

This gives the same result as passing a list:

> pl$Series(values = data.frame(a = 1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
    [1.0]
]

> pl$Series(values = list(a = 1))
polars Series: shape: (1,)
Series: '' [list[f64]]
[
    [1.0]
]

eitsupi commented 2 months ago

Oh, sorry. This is the one. https://github.com/pola-rs/r-polars/blob/3c0d0ec62a86da7fd66ef1afc53df590f384452f/R/as_polars.R#L367-L371

eitsupi commented 2 months ago

Can we close this now that #1015 has been merged? As I commented, the Object type is for storing Python objects, so I don't see the point in supporting it here. (Since R's list can contain a variety of things, we can always use a base R data.frame if we want to store something that is not supported by Apache Arrow.)

etiennebacher commented 2 months ago

> As I commented, the Object type is for storing Python objects, so I don't see the point in supporting it here.

That's worth mentioning in the docs, I think. I'll add that in #1014 and close this issue with that PR.

etiennebacher commented 2 months ago

Actually, it's hard to construct a Struct Series:

>>> pl.Series([{"a": 1, "b": ["x", "y"]}, {"a": 2, "b": ["z"]}])
shape: (2,)
Series: '' [struct[2]]
[
        {1,["x", "y"]}
        {2,["z"]}
]

as_polars_series(
  data.frame(a = 1:2, b = list(c("x", "y"), "z"))
)

polars Series: shape: (2,)
Series: '' [struct[3]]
[
    {1,"x","z"}
    {2,"y","z"}
]

And it doesn't work for DataFrame:

pl$DataFrame(
  data.frame(a = 1)
)

shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ f64 │
╞═════╡
│ 1.0 │
└─────┘

Maybe we should say that we can't reliably create a Struct from scratch and point towards $to_struct() instead.

eitsupi commented 2 months ago

> Actually it's hard to construct Struct for Series:

We should use the I() function to create a list type column with data.frame(). Or, we can use tibble::tibble() or data.table::data.table() instead.

polars::as_polars_series(
  data.frame(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[3]]
#> [
#>  {1,"x","z"}
#>  {2,"y","z"}
#> ]

polars::as_polars_series(
  data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#>  {1,["x", "y"]}
#>  {2,["z"]}
#> ]

polars::as_polars_series(
  tibble::tibble(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#>  {1,["x", "y"]}
#>  {2,["z"]}
#> ]

polars::as_polars_series(
  data.table::data.table(a = 1:2, b = list(c("x", "y"), "z"))
)
#> polars Series: shape: (2,)
#> Series: '' [struct[2]]
#> [
#>  {1,["x", "y"]}
#>  {2,["z"]}
#> ]

Created on 2024-04-10 with reprex v2.1.0
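
For what it's worth, the struct[3] result above seems to come from base R rather than from polars: without I(), data.frame() spreads the list over several columns and recycles the shorter element. A minimal base R sketch of that difference, with no polars involved (the names df_plain and df_asis are only illustrative):

```r
# Without I(): data.frame() treats each list element as its own column
# and recycles the length-1 element "z" to match the two rows.
df_plain <- data.frame(a = 1:2, b = list(c("x", "y"), "z"))

# With I(): the list is kept as-is, as a single list column named "b".
df_asis <- data.frame(a = 1:2, b = I(list(c("x", "y"), "z")))

ncol(df_plain)      # 3 columns: a plus one column per list element
ncol(df_asis)       # 2 columns: a and the list column b
is.list(df_asis$b)  # TRUE
lengths(df_asis$b)  # 2 1, one entry per row
```

This matches the field counts in the Series printed above: three fields without I(), two with it.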

> And it doesn't work for DataFrame:

pl$DataFrame() works like as_polars_df() when it receives a data.frame. (I think this behavior is worth removing because I find it confusing, but the point is that data.frame() works the same way, and in Python, polars.DataFrame.__init__() will convert a pandas.DataFrame to a polars.DataFrame, so this is consistent behavior.)

polars::pl$DataFrame(data.frame(a = 1))
#> shape: (1, 1)
#> ┌─────┐
#> │ a   │
#> │ --- │
#> │ f64 │
#> ╞═════╡
#> │ 1.0 │
#> └─────┘
polars::pl$DataFrame(a = data.frame(a = 1))
#> shape: (1, 1)
#> ┌───────────┐
#> │ a         │
#> │ ---       │
#> │ struct[1] │
#> ╞═══════════╡
#> │ {1.0}     │
#> └───────────┘

Created on 2024-04-10 with reprex v2.1.0