jonocarroll / DFplyr

A `DataFrame` (`S4Vectors`) backend for `dplyr`
GNU General Public License v3.0

Test bplyr integration #9

Open jonocarroll opened 4 years ago

jonocarroll commented 4 years ago

https://github.com/yonicd/bplyr

Appears to work for mutate and filter, even when processing an S4 column:

library(S4Vectors)
m <- mtcars[, c("cyl", "hp", "am", "gear", "disp")]
d <- as(m, "DataFrame")
d$gr <- GenomicRanges::GRanges("chrY", IRanges::IRanges(1:32, width=10))
d$gr2 <- GenomicRanges::GRanges("chrX", IRanges::IRanges(1:32, width = 10))
d$nl <- IRanges::NumericList(lapply(d$gear, function(n) round(rnorm(n), 2)))
d
#> DataFrame with 32 rows and 8 columns
#>                         cyl        hp        am      gear      disp
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>
#> Mazda RX4                 6       110         1         4       160
#> Mazda RX4 Wag             6       110         1         4       160
#> Datsun 710                4        93         1         4       108
#> Hornet 4 Drive            6       110         0         3       258
#> Hornet Sportabout         8       175         0         3       360
#> ...                     ...       ...       ...       ...       ...
#> Lotus Europa              4       113         1         5      95.1
#> Ford Pantera L            8       264         1         5       351
#> Ferrari Dino              6       175         1         5       145
#> Maserati Bora             8       335         1         5       301
#> Volvo 142E                4       109         1         4       121
#>                           gr        gr2                   nl
#>                    <GRanges>  <GRanges>        <NumericList>
#> Mazda RX4          chrY:1-10  chrX:1-10 -0.26,0.22,-1.33,...
#> Mazda RX4 Wag      chrY:2-11  chrX:2-11    0.35,0.67,2.5,...
#> Datsun 710         chrY:3-12  chrX:3-12 0.47,-0.76,-1.91,...
#> Hornet 4 Drive     chrY:4-13  chrX:4-13     -2.78,-1.82,0.81
#> Hornet Sportabout  chrY:5-14  chrX:5-14      0.03,-1.51,1.01
#> ...                      ...        ...                  ...
#> Lotus Europa      chrY:28-37 chrX:28-37  0.29,1.11,-0.13,...
#> Ford Pantera L    chrY:29-38 chrX:29-38   1.9,-1.43,-0.6,...
#> Ferrari Dino      chrY:30-39 chrX:30-39  0.76,0.28,-0.16,...
#> Maserati Bora     chrY:31-40 chrX:31-40  -0.14,0.96,1.52,...
#> Volvo 142E        chrY:32-41 chrX:32-41 -0.49,0.54,-1.55,...

mutateDF <- function(.data, ...) {

  # Capture each named argument as a bare expression
  # (quo_squash() replaces the deprecated quo_expr())
  FNS <- lapply(rlang::quos(...), rlang::quo_squash)

  # Build one "name <- expression" assignment per argument
  EXPRS <- lapply(names(FNS), function(x) {
    sprintf('%s <- %s', x, paste(deparse(FNS[[x]]), collapse = '\n'))
  })

  # Evaluate the assignments with the columns of .data in scope
  within(.data, eval(parse(text = paste0(unlist(EXPRS), collapse = '\n'))))

}
mutateDF(d, nl2 = 2 * nl)
#> DataFrame with 32 rows and 9 columns
#>                         cyl        hp        am      gear      disp
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>
#> Mazda RX4                 6       110         1         4       160
#> Mazda RX4 Wag             6       110         1         4       160
#> Datsun 710                4        93         1         4       108
#> Hornet 4 Drive            6       110         0         3       258
#> Hornet Sportabout         8       175         0         3       360
#> ...                     ...       ...       ...       ...       ...
#> Lotus Europa              4       113         1         5      95.1
#> Ford Pantera L            8       264         1         5       351
#> Ferrari Dino              6       175         1         5       145
#> Maserati Bora             8       335         1         5       301
#> Volvo 142E                4       109         1         4       121
#>                           gr        gr2                   nl
#>                    <GRanges>  <GRanges>        <NumericList>
#> Mazda RX4          chrY:1-10  chrX:1-10 -0.26,0.22,-1.33,...
#> Mazda RX4 Wag      chrY:2-11  chrX:2-11    0.35,0.67,2.5,...
#> Datsun 710         chrY:3-12  chrX:3-12 0.47,-0.76,-1.91,...
#> Hornet 4 Drive     chrY:4-13  chrX:4-13     -2.78,-1.82,0.81
#> Hornet Sportabout  chrY:5-14  chrX:5-14      0.03,-1.51,1.01
#> ...                      ...        ...                  ...
#> Lotus Europa      chrY:28-37 chrX:28-37  0.29,1.11,-0.13,...
#> Ford Pantera L    chrY:29-38 chrX:29-38   1.9,-1.43,-0.6,...
#> Ferrari Dino      chrY:30-39 chrX:30-39  0.76,0.28,-0.16,...
#> Maserati Bora     chrY:31-40 chrX:31-40  -0.14,0.96,1.52,...
#> Volvo 142E        chrY:32-41 chrX:32-41 -0.49,0.54,-1.55,...
#>                                    nl2
#>                          <NumericList>
#> Mazda RX4         -0.52,0.44,-2.66,...
#> Mazda RX4 Wag           0.7,1.34,5,...
#> Datsun 710        0.94,-1.52,-3.82,...
#> Hornet 4 Drive        -5.56,-3.64,1.62
#> Hornet Sportabout      0.06,-3.02,2.02
#> ...                                ...
#> Lotus Europa       0.58,2.22,-0.26,...
#> Ford Pantera L      3.8,-2.86,-1.2,...
#> Ferrari Dino       1.52,0.56,-0.32,...
#> Maserati Bora      -0.28,1.92,3.04,...
#> Volvo 142E         -0.98,1.08,-3.1,...

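filterDF <- function(.data, ...) {
  # Pass the captured condition to subset(), which sees the columns of .data
  # (quo_squash() replaces the deprecated quo_expr())
  subset(.data, {
    eval(rlang::quo_squash(rlang::quo(...)))
  })
}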
filterDF(d, lengths(nl) == 5)
#> DataFrame with 5 rows and 8 columns
#>                      cyl        hp        am      gear      disp
#>                <numeric> <numeric> <numeric> <numeric> <numeric>
#> Porsche 914-2          4        91         1         5     120.3
#> Lotus Europa           4       113         1         5      95.1
#> Ford Pantera L         8       264         1         5       351
#> Ferrari Dino           6       175         1         5       145
#> Maserati Bora          8       335         1         5       301
#>                        gr        gr2                  nl
#>                 <GRanges>  <GRanges>       <NumericList>
#> Porsche 914-2  chrY:27-36 chrX:27-36  0.27,0.77,0.38,...
#> Lotus Europa   chrY:28-37 chrX:28-37 0.29,1.11,-0.13,...
#> Ford Pantera L chrY:29-38 chrX:29-38  1.9,-1.43,-0.6,...
#> Ferrari Dino   chrY:30-39 chrX:30-39 0.76,0.28,-0.16,...
#> Maserati Bora  chrY:31-40 chrX:31-40 -0.14,0.96,1.52,...

Created on 2020-01-29 by the reprex package (v0.3.0)

(with dispatch, of course).

Calling the b_mutate methods internally doesn't seem to work, but maybe I'm doing something wrong. Interested in collaborating, @yonicd?

yonicd commented 4 years ago

I’ll take a look on my end

jonocarroll commented 4 years ago

Progress... https://github.com/jonocarroll/DFplyr/tree/bplyr_integration

The README renders in the current form (including S4 columns). I haven't finished, but I found a lot of edge cases and have dealt with them.

yonicd commented 4 years ago

Looks better! A few questions (probably me not grokking):

Are you importing dplyr?

Aren’t the function names causing namespace conflicts?

If you are using base underneath, why would the user want to install dplyr?

jonocarroll commented 4 years ago

I only import the generics; without those there's no dispatch. You reclassed everything and wrote new generics, but writing a method for a new class is 'supposed' to be the way to extend a generic. Plus, this way mutate works whether you pass it a data.frame or a DataFrame. My original idea was to use the tbl methods under the hood, but there are glaring issues with that.

I could write new generics, but those would mask dplyr's and break it whenever dplyr is also attached.
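
For illustration, here is a minimal base R sketch of extending an existing generic with a method for a new class. Everything in it is a stand-in so that it runs without dplyr or S4Vectors: the `mutate` generic and `mutate.data.frame` are simplified placeholders for what dplyr provides, and `toy_df` is a hypothetical class playing the role of `DataFrame`. It is not the DFplyr implementation.

```r
# Stand-in for the generic that would normally be imported from dplyr
# (assumption: defined here only so the sketch runs on its own).
mutate <- function(.data, ...) UseMethod("mutate")

# Simplified stand-in for the existing data.frame method.
mutate.data.frame <- function(.data, ...) {
  # Recover the unevaluated, named expressions passed through `...`
  exprs <- as.list(substitute(list(...)))[-1]
  for (nm in names(exprs)) {
    # Evaluate each expression with the columns of .data in scope
    .data[[nm]] <- eval(exprs[[nm]], .data, parent.frame())
  }
  .data
}

# Hypothetical new class wrapping a data.frame (plays the role of DataFrame).
toy_df <- function(x) structure(list(data = x), class = "toy_df")

# Extending the generic: add one method for the new class.
# No new generic is defined, so nothing is masked.
mutate.toy_df <- function(.data, ...) {
  .data$data <- mutate(.data$data, ...)
  .data
}

df <- data.frame(hp = c(110, 93))
res_df  <- mutate(df, hp2 = hp * 2)          # dispatches to mutate.data.frame
res_toy <- mutate(toy_df(df), hp2 = hp * 2)  # dispatches to mutate.toy_df
```

The same call works for both inputs because dispatch picks the method from the class of the first argument, which is the point of importing only the generic.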

yonicd commented 4 years ago

OK. The original noplyr was like that but still caused tons of namespace problems. I’ll look more closely at how you did it to figure out what I did wrong there. Cheers ;)