pola-rs / r-polars

Bring polars to R
https://pola-rs.github.io/r-polars/
Other
415 stars 36 forks source link

feat!: update built-in Rust Polars to 2023-04-20 #183

Closed eitsupi closed 1 year ago

eitsupi commented 1 year ago

I have not yet been able to correct the errors by breaking changes.

If anyone wants to work on this, you can push it to this branch.

The upstream includes various breaking changes. Two in particular that affect many areas are

TODO

vincentarelbundock commented 1 year ago

You almost certainly know this, so I'm leaving this here for reference: The ErrString::Owned errors are related to the upstream error refactor here: https://github.com/pola-rs/polars/pull/7362

eitsupi commented 1 year ago

You almost certainly know this, so I'm leaving this here for reference: The ErrString::Owned errors are related to the upstream error refactor here: pola-rs/polars#7362

Yes, I have updated the comment and added links to them.

sorhawell commented 1 year ago

Sometimes in the past when bumping rust-polars, it can become quite messy with 10-20 hotfixes here and there. The breaking changes are fortunately easy to catch when compiling. Non breaking changes can be hard to notices and update the code base for. Maybe polars supports some new join strategy, but r-polars will not allow before someone files an issue.

eitsupi commented 1 year ago

Updating the errors seems to be out of my hands 🤷

sorhawell commented 1 year ago

I fixed all compiler errors. Still missing: update tests + evaluate rust-polars changes and file some issues.

vincentarelbundock commented 1 year ago

I fixed a few errors but am unlikely to have more time in the near future. For convenience, here's a list of the errors I still see:

Error (test-expr_arr.R:4:3): arr$lengths
Error: in Series.new: SchemaMismatch(ErrString("invalid series dtype: expected List, got f64"))

Error (test-expr_arr.R:396:3): concat
Error: in Series.new: InvalidOperation(ErrString("new series from rtype ExternalPtr is not supported (yet)"))

Error (test-expr_arr.R:454:3): eval
Error in self$with_columns(expr): attempt to apply non-function

Error (test-expr_datetime.R:660:3): dt$replace_time_zone
Error in .pr$Expr$dt_replace_time_zone(self, valid_tz, use_earliest): expected a logical scalar

Error (test-expr.R:15:3): expression boolean operators
Error: in $collect(): when calling $collect() on LazyFrame: ComputeError(ErrString("The name: 'literal' passed to LazyFrame.with_columns is duplicate\n\nError originated just after this operation:\nDF []; PROJECT */0 COLUMNS; SELECTION: "None""))

Failure (test-expr.R:1512:3): hash + reinterpret
!all(hash_values1 == hash_values2) is not TRUE

Failure (test-expr.R:2176:3): extend_expr
pl$select(pl$lit(1)$extend_expr(5, -1)) did not throw the expected error.

Error (test-expr.R:2339:3): concat_list
Error in df$with_columns(lapply(0:2, function(i) pl$col("A")$shift(i)$alias(paste0("A_lag_", i))))$select(pl$concat_list(lapply(2:0, function(i) pl$col(paste0("A_lag_", i))))$alias("A_rolling")): attempt to apply non-function

Error (test-groupby.R:14:5): groupby print .name=POLARS_FMT_MAX_ROWS, .value=2
Error in trace_back(top = getOption("testthat_topenv"), bottom = trace_env): Can't find bottom on the call tree.
eitsupi commented 1 year ago

TODO

eitsupi commented 1 year ago

Remain test failures

sorhawell commented 1 year ago

I have been looking at Error (test-expr_arr.R:4:3): arr$lengths it fails because polars list-type have been reworked a lot. It is a bigger change, but finally R to series conversion could become more simple.

sorhawell commented 1 year ago

I got away with only a little change in the end. I was using a casting trick (empty float64 -> List) before to create an empty Series of inner datatype List or List. It seems that way of casting has been deprecated since. I finally found a ListBinaryChunkedBuilder to produce Series containing empty lists.

eitsupi commented 1 year ago

Ok, now all checks on the CI are green!

sorhawell commented 1 year ago

@eitsupi oh more edits, flag me when ready :)

eitsupi commented 1 year ago

I would like to eventually version up to the latest commit at this time. Thoughts?

sorhawell commented 1 year ago

I'm puzzled about how the "NA" / NA conversion error started. I fixed it in this PR 9d93a08-9e6137f , is not related to bumping rust-polars. It is now also an issue in join_asof PR. It must be some new crate or rextendr triggering it. The issue is extendr has an known tricky as_str() method which will convert R NA_character into &str "NA", which is not very desirable.

I will delay with updating other PRs until this fix has been merged.

eitsupi commented 1 year ago

I'm puzzled about how the "NA" / NA conversion error started. I fixed it in this PR 9e6137f , is not related to bumping rust-polars.

As I commented on https://github.com/pola-rs/r-polars/pull/190#issuecomment-1531183180, I think this is due to the new release of waldo.

sorhawell commented 1 year ago

As I commented on #190 (comment), I think this is due to the new release of waldo.

Nice then I have had a bug in these conversion forever, and now waldo sees it.

sorhawell commented 1 year ago

I would like to eventually version up to the latest commit at this time. Thoughts?

like bumping to polars 0.6.0? I aggree

oh you mean merge in main? I aggree also :)

maybe bumping polars can be delayed to after this merge

eitsupi commented 1 year ago

maybe bumping polars can be delayed to after this merge

Indeed, I was hoping to upgrade to today's version since rust-polars is currently 2 weeks old in this PR, but given that there are bugs that will be fixed by this PR, it seems better to merge now.

eitsupi commented 1 year ago

I think I have completed all the work and can merge if there are no problems with CI.

eitsupi commented 1 year ago

Can I merge this?

sorhawell commented 1 year ago

Can I merge this?

yeah

eitsupi commented 1 year ago

Thanks all!

vincentarelbundock commented 1 year ago

This is very cool! I think there are some new methods in the latest version. When I find time, I'll try to make a list and see which ones I'm able to implement.