tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

dplyr 1.1.1 compatibility #146

Closed DavisVaughan closed 2 months ago

DavisVaughan commented 1 year ago

In dplyr 1.1.1, auto_copy() now throws a different and much improved error message. A multidplyr test was matching the old message text directly instead of using a snapshot test, so it fails against the new wording. See https://github.com/tidyverse/dplyr/pull/6800

pf <- partition(data.frame(x = 1:6), default_cluster())
#> Initialising default cluster of size 2
df <- data.frame(x = 1:3, y = 3:1)

# Before
left_join(pf, df)
#> Error in `auto_copy()` at multidplyr/R/dplyr-dual.R:20:2:
#> ! `x` and `y` must share the same src.
#> ℹ set `copy` = TRUE (may be slow).

# After
left_join(pf, df)
#> Error in `auto_copy()` at multidplyr/R/dplyr-dual.R:20:2:
#> ! `x` and `y` must share the same source.
#> ℹ `x` is a <multidplyr_party_df> object.
#> ℹ `y` is a <data.frame> object.
#> ℹ Set `copy = TRUE` if `y` can be copied to the same source as `x` (may be slow).

DavisVaughan commented 1 year ago

I've added https://github.com/tidyverse/dplyr/commit/8ee46e6723b58ee8fb4538e2f6195bcd811f0fc1 to revert part of the auto_copy() error message changes so that we avoid a multidplyr failure.

It's probably still good to merge this, though, for future robustness.
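
For reference, a snapshot test for this case might look roughly like the sketch below. It is only an illustration, not necessarily what this PR changes: the file name and test description are made up, and it assumes testthat 3rd edition is available. Because `expect_snapshot()` records the full error output in a snapshot file, and by default is not verified on CRAN (`cran = FALSE`), a future rewording of the `auto_copy()` message shows up as a snapshot diff to review rather than as a hard failure on a hard-coded string.

```r
# tests/testthat/test-dplyr-dual.R  (hypothetical file name)
library(testthat)
library(dplyr)
library(multidplyr)

test_that("joining a party_df with a local data frame errors", {
  local_edition(3)  # snapshot expectations require testthat 3e

  # Same setup as the reprex above
  pf <- partition(data.frame(x = 1:6), default_cluster())
  df <- data.frame(x = 1:3, y = 3:1)

  # Record whatever error dplyr's auto_copy() currently throws,
  # instead of matching a fixed message string
  expect_snapshot(left_join(pf, df), error = TRUE)
})
```

If dplyr changes the message again, reviewing the diff and running `testthat::snapshot_accept()` should be enough to update the recorded output.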