voltrondata / substrait-r

An R Interface to the 'Substrait' Cross-Language Serialization for Relational Algebra

Implement n() in Arrow bindings #227

Closed thisisnic closed 1 year ago

thisisnic commented 1 year ago

Fixes #219. However, this is not mergeable yet because of what I think is a bug in the Arrow consumer, which I have reported as ARROW-18403.

library(arrow)
library(substrait)
library(dplyr)

data <- tibble::tibble(
  x = 1:10,
  grp = c(rep(c("a", "b"), each = 4), NA, NA)
)

data
#> # A tibble: 10 × 2
#>        x grp  
#>    <int> <chr>
#>  1     1 a    
#>  2     2 a    
#>  3     3 a    
#>  4     4 a    
#>  5     5 b    
#>  6     6 b    
#>  7     7 b    
#>  8     8 b    
#>  9     9 <NA> 
#> 10    10 <NA>

data %>%
  arrow_substrait_compiler() %>%
  group_by(grp) %>%
  summarise(total_group_members = n(x)) %>%
  collect()
#> # A tibble: 3 × 2
#>     grp total_group_members
#>   <int> <chr>              
#> 1     4 a                  
#> 2     4 b                  
#> 3     2 <NA>

data %>%
  arrow_substrait_compiler() %>%
  group_by(grp) %>%
  summarise(total_group_members = n(grp)) %>%
  collect()
#> # A tibble: 3 × 2
#>     grp total_group_members
#>   <int> <chr>              
#> 1     4 a                  
#> 2     4 b                  
#> 3     0 <NA>

data %>%
  arrow_substrait_compiler() %>%
  summarise(total_group_members = n(x)) %>%
  collect()
#> # A tibble: 1 × 1
#>   total_group_members
#>                 <int>
#> 1                  10

data %>%
  arrow_substrait_compiler() %>%
  summarise(total_group_members = n(grp)) %>%
  collect()
#> # A tibble: 1 × 1
#>   total_group_members
#>                 <int>
#> 1                   8
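
For comparison, here is a minimal sketch of the same counts in plain dplyr, without Substrait. It assumes that the one-argument `n(col)` in this PR is meant to count the non-missing values of `col`, which is what the results above suggest (2 versus 0 for the `NA` group, 10 versus 8 ungrouped). The column names `n_rows`, `n_non_na_x` and `n_non_na_grp` are made up for illustration.

```r
library(dplyr)

# Same toy data as in the reprex above
data <- tibble::tibble(
  x = 1:10,
  grp = c(rep(c("a", "b"), each = 4), NA, NA)
)

# In plain dplyr, n() takes no arguments and counts rows per group;
# sum(!is.na(col)) counts the non-missing values of a column.
expected <- data %>%
  group_by(grp) %>%
  summarise(
    n_rows = n(),                     # rows in each group
    n_non_na_x = sum(!is.na(x)),      # assumed analogue of n(x)
    n_non_na_grp = sum(!is.na(grp))   # assumed analogue of n(grp)
  )

expected
```

Under that assumption, `n_non_na_x` should line up with the grouped `n(x)` counts above and `n_non_na_grp` with the grouped `n(grp)` counts, with `grp` holding the group labels.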