voltrondata / substrait-r

An R Interface to the 'Substrait' Cross-Language Serialization for Relational Algebra

Add `substrait_join()` #225

Closed paleolimbot closed 1 year ago

paleolimbot commented 2 years ago

Ok, this mostly works, with some limitations:

library(substrait, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

cities <- tibble::tibble(
  city = c("Halifax", "Lancaster", "Chicago"),
  country = c("Canada", "United Kingdom", "United States")
)

countries <- tibble::tibble(
  country = c("United States", "Canada", "United Kingdom", "Morocco"),
  continent = c("North America", "North America", "Europe", "Africa")
)

cities |> 
  duckdb_substrait_compiler() |> 
  inner_join(countries) |> 
  arrange(city) |> 
  collect()
#> # A tibble: 3 × 3
#>   city      country        continent    
#>   <chr>     <chr>          <chr>        
#> 1 Chicago   United States  North America
#> 2 Halifax   Canada         North America
#> 3 Lancaster United Kingdom Europe

Created on 2023-03-06 with reprex v2.0.2

The limitations are:

I think this is ready for initial review, though (and I'll open tickets for the limitations when I've investigated a bit more).

paleolimbot commented 1 year ago

Ok, I got semi joins to work (because they are just an inner join + emit) and opened https://github.com/apache/arrow/issues/34484 to see whether it's an Arrow problem or, more likely, me not understanding JoinRel.
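
To illustrate the "inner join + emit" idea, here is a minimal sketch in plain dplyr on ordinary tibbles. It is not the code from this PR and does not build a Substrait plan. It assumes that `select()` down to the left-hand columns is a fair stand-in for a Substrait emit, and that the join key is unique on the right-hand side (otherwise the inner join would duplicate rows and the two results would differ).

```r
library(dplyr, warn.conflicts = FALSE)

cities <- tibble::tibble(
  city = c("Halifax", "Lancaster", "Chicago"),
  country = c("Canada", "United Kingdom", "United States")
)

countries <- tibble::tibble(
  country = c("United States", "Canada", "United Kingdom", "Morocco"),
  continent = c("North America", "North America", "Europe", "Africa")
)

# Reference result: dplyr's own semi join keeps only the columns of `cities`
via_semi <- semi_join(cities, countries, by = "country")

# Same rows via an inner join, then keeping only the left-hand columns
# (the select() plays the role of the emit; this is an analogy, not the PR's code)
via_inner <- cities |>
  inner_join(countries, by = "country") |>
  select(all_of(names(cities)))

# Compare the two results as plain data frames
same_result <- isTRUE(all.equal(as.data.frame(via_semi), as.data.frame(via_inner)))
```

With the unique `country` key in `countries`, both approaches should keep the same three cities and only the `city` and `country` columns.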