pola-rs / r-polars

Polars R binding
https://pola-rs.github.io/r-polars/

Add `as_polars_expr` #835

Open eitsupi opened 9 months ago

eitsupi commented 9 months ago

Related to #599, #570, #715

etiennebacher commented 9 months ago

So basically you'd want to do the conversion to expressions in R rather than in Rust? Like the parse_* functions in Python? https://github.com/pola-rs/polars/blob/740e740d9ce3678ea061d5cb4c2bc94892838383/py-polars/polars/utils/_parse_expr_input.py

That would be a massive change internally. On the R side we also have wrap_e(), which is used in some places. Whether we do the conversion to Expr on the R side or on the Rust side, I agree it would be good to harmonize this. This is also related to #422.

eitsupi commented 9 months ago

For example, casting something like the Then class to Expr is currently handled by a match arm on the Rust side, so support for new classes cannot be added from outside this Rust crate. https://github.com/pola-rs/r-polars/blob/a65ae54593ca495e568a48bcbc842d8b44ae0761/src/rust/src/lazy/dsl.rs#L164-L166

I think this should be defined in a generic function on the R side. However, this would be a rather extensive change, so we did not make that change in #836.
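
To illustrate why a generic on the R side is more extensible than a match arm, here is a minimal base R sketch. It does not use r-polars at all: every `toy_` name is a hypothetical stand-in invented for this example, and treating a string as a column name is only an assumption of the sketch.

```r
# Hypothetical sketch: names with a `toy_` prefix are illustrative, not r-polars API.

# Minimal stand-in for an expression object.
toy_expr <- function(repr) structure(list(repr = repr), class = "toy_expr")

# The generic: each class supplies its own conversion as an S3 method.
toy_as_expr <- function(x, ...) UseMethod("toy_as_expr")

# An expression is returned unchanged.
toy_as_expr.toy_expr <- function(x, ...) x

# A character string is treated as a column name (an assumption of this sketch).
toy_as_expr.character <- function(x, ...) toy_expr(paste0("col(", x, ")"))

# A class such as Then is supported by adding a method; the generic itself
# does not change, so another package could add its own classes the same way.
toy_then <- function(value) structure(list(value = value), class = "toy_then")
toy_as_expr.toy_then <- function(x, ...) {
  toy_expr(paste0("when(...).then(", x$value, ")"))
}

res_col <- toy_as_expr("a")
res_then <- toy_as_expr(toy_then("b"))
```

With a match arm, the list of accepted classes is closed inside the crate; with S3 dispatch, supporting a new class only requires defining one more method.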

eitsupi commented 2 months ago

Seeing that scalar values were starting to get special treatment upstream, I completely rewrote as_polars_expr() in the next branch (d83ceed22018cdd76fd8348cea23eb14674c5049).

Since there is no scalar class in R, it seems that <Expr>$first() must be called to convert the R object to a scalar value, but only when its length (strictly speaking, the length of the Series created from the R object) is 1. cf. https://github.com/pola-rs/polars/issues/18686#issuecomment-2344246684

This change basically removes the need to describe the conversion from R to Polars in each of the two S3 methods: if only as_polars_series() is defined for a class, pl$lit() should also work properly.
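
The delegation and the length-1 rule described above can be sketched roughly as follows in base R. This is not the code from the commit: all `toy_` names are hypothetical stand-ins (`toy_as_series()` for as_polars_series(), `toy_lit()` for pl$lit()), and the `scalar` flag only imitates the effect of calling `$first()`.

```r
# Hypothetical sketch in base R: `toy_` names are illustrative stand-ins.

# Stand-in for as_polars_series(): the only generic a class needs to implement.
toy_as_series <- function(x, ...) UseMethod("toy_as_series")
toy_as_series.default <- function(x, ...) {
  structure(list(values = as.vector(x)), class = "toy_series")
}

# A custom time-of-day class defines only the Series conversion.
toy_time <- function(x = character()) structure(x, class = "toy_time")
toy_as_series.toy_time <- function(x, ...) {
  structure(list(values = unclass(x), dtype = "time"), class = "toy_series")
}

# Stand-in for pl$lit(): build a Series first, then flag the result as a
# scalar (the analogue of calling $first()) only when its length is exactly 1.
toy_lit <- function(x, ...) {
  s <- toy_as_series(x, ...)
  n <- length(s$values)
  structure(list(series = s, scalar = (n == 1L)), class = "toy_expr")
}

lit0 <- toy_lit(toy_time())
lit1 <- toy_lit(toy_time("12:34:56"))
lit2 <- toy_lit(toy_time(c("12:34:56", "23:45:56")))
```

Only `lit1` ends up flagged as a scalar; the empty and the length-2 inputs stay plain Series, which mirrors the three cases in the reprex.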

```r
library(neopolars)

lit0 <- pl$lit(hms::hms())
lit1 <- pl$lit(hms::as_hms("12:34:56"))
lit2 <- pl$lit(hms::as_hms(c("12:34:56", "23:45:56")))

lit0
#> Series[literal]

lit1
#> Series[literal].first()

lit2
#> Series[literal]

pl$select(lit0)
#> shape: (0, 1)
#> ┌─────────┐
#> │ literal │
#> │ ---     │
#> │ time    │
#> ╞═════════╡
#> └─────────┘

pl$select(lit1)
#> shape: (1, 1)
#> ┌──────────┐
#> │ literal  │
#> │ ---      │
#> │ time     │
#> ╞══════════╡
#> │ 12:34:56 │
#> └──────────┘

pl$select(lit2)
#> shape: (2, 1)
#> ┌──────────┐
#> │ literal  │
#> │ ---      │
#> │ time     │
#> ╞══════════╡
#> │ 12:34:56 │
#> │ 23:45:56 │
#> └──────────┘
```

Created on 2024-09-12 with reprex v2.1.1

eitsupi commented 2 months ago

Finally, an object can now be converted to a Scalar as long as as_polars_series() is defined for its class. https://github.com/pola-rs/r-polars/commit/dd88f5467f6064da24c6f896bf8fa3457e19ef44

```r
library(neopolars)

lit0 <- pl$lit(hms::hms())
lit1 <- pl$lit(hms::as_hms("12:34:56"))
lit2 <- pl$lit(hms::as_hms(c("12:34:56", "23:45:56")))

lit0
#> Series[literal]

lit1
#> 12:34:56

lit2
#> Series[literal]

pl$select(lit0)
#> shape: (0, 1)
#> ┌─────────┐
#> │ literal │
#> │ ---     │
#> │ time    │
#> ╞═════════╡
#> └─────────┘

pl$select(lit1)
#> shape: (1, 1)
#> ┌──────────┐
#> │ literal  │
#> │ ---      │
#> │ time     │
#> ╞══════════╡
#> │ 12:34:56 │
#> └──────────┘

pl$select(lit2)
#> shape: (2, 1)
#> ┌──────────┐
#> │ literal  │
#> │ ---      │
#> │ time     │
#> ╞══════════╡
#> │ 12:34:56 │
#> │ 23:45:56 │
#> └──────────┘
```

Created on 2024-09-22 with reprex v2.1.1