sa-lee / plyexperiment

A fluent interface to SummarizedExperiment

Extension to MAE and eSet plus select_col #4

Closed: jonocarroll closed this 5 years ago

jonocarroll commented 5 years ago

closes #3

sa-lee commented 5 years ago

Looks great, thanks @jonocarroll!

I really need to get around to just implementing the dplyr verbs on DataFrame. That could really open up a lot of possibilities and simplify the code base for plyranges too. Would you be up for helping on that?

jonocarroll commented 5 years ago

Absolutely. Would it be as simple as

library(S4Vectors)
library(dplyr)

## promote mutate to an S4 generic; the function supplied here
## (a call through to dplyr::mutate) becomes the default method
setGeneric("mutate", function(.data, ...) dplyr::mutate(.data, ...))
#> [1] "mutate"

## create a DataFrame with list-column
mtcars_DF <- mtcars %>% 
  mutate(listcol = lapply(seq_len(nrow(.)), function(x) mtcars)) %>% 
  as("DataFrame")
mtcars_DF
#> DataFrame with 32 rows and 12 columns
#>           mpg       cyl      disp        hp      drat        wt      qsec
#>     <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 1        21.0         6       160       110      3.90     2.620     16.46
#> 2        21.0         6       160       110      3.90     2.875     17.02
#> 3        22.8         4       108        93      3.85     2.320     18.61
#> 4        21.4         6       258       110      3.08     3.215     19.44
#> 5        18.7         8       360       175      3.15     3.440     17.02
#> ...       ...       ...       ...       ...       ...       ...       ...
#> 28       30.4         4      95.1       113      3.77     1.513      16.9
#> 29       15.8         8     351.0       264      4.22     3.170      14.5
#> 30       19.7         6     145.0       175      3.62     2.770      15.5
#> 31       15.0         8     301.0       335      3.54     3.570      14.6
#> 32       21.4         4     121.0       109      4.11     2.780      18.6
#>            vs        am      gear      carb        listcol
#>     <numeric> <numeric> <numeric> <numeric>         <list>
#> 1           0         1         4         4 21,21,22.8,...
#> 2           0         1         4         4 21,21,22.8,...
#> 3           1         1         4         1 21,21,22.8,...
#> 4           1         0         3         1 21,21,22.8,...
#> 5           0         0         3         2 21,21,22.8,...
#> ...       ...       ...       ...       ...            ...
#> 28          1         1         5         2 21,21,22.8,...
#> 29          0         1         5         4 21,21,22.8,...
#> 30          0         1         5         6 21,21,22.8,...
#> 31          0         1         5         8 21,21,22.8,...
#> 32          1         1         4         2 21,21,22.8,...

## DataFrame method: convert to a tibble, mutate with dplyr, convert back to a DataFrame
mutate_DF <- function(.data, ...) {
  suppressWarnings(dplyr::as_tibble(.data)) %>% 
    dplyr::mutate(...) %>% 
    as("DataFrame")
}
methods::setMethod("mutate", signature("DataFrame"), mutate_DF)
#> [1] "mutate"

## DataFrame input
mutate(mtcars_DF, hp_to_wt = hp/wt)
#> DataFrame with 32 rows and 13 columns
#>           mpg       cyl      disp        hp      drat        wt      qsec
#>     <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 1        21.0         6       160       110      3.90     2.620     16.46
#> 2        21.0         6       160       110      3.90     2.875     17.02
#> 3        22.8         4       108        93      3.85     2.320     18.61
#> 4        21.4         6       258       110      3.08     3.215     19.44
#> 5        18.7         8       360       175      3.15     3.440     17.02
#> ...       ...       ...       ...       ...       ...       ...       ...
#> 28       30.4         4      95.1       113      3.77     1.513      16.9
#> 29       15.8         8     351.0       264      4.22     3.170      14.5
#> 30       19.7         6     145.0       175      3.62     2.770      15.5
#> 31       15.0         8     301.0       335      3.54     3.570      14.6
#> 32       21.4         4     121.0       109      4.11     2.780      18.6
#>            vs        am      gear      carb        listcol  hp_to_wt
#>     <numeric> <numeric> <numeric> <numeric>         <list> <numeric>
#> 1           0         1         4         4 21,21,22.8,...  41.98473
#> 2           0         1         4         4 21,21,22.8,...  38.26087
#> 3           1         1         4         1 21,21,22.8,...  40.08621
#> 4           1         0         3         1 21,21,22.8,...  34.21462
#> 5           0         0         3         2 21,21,22.8,...  50.87209
#> ...       ...       ...       ...       ...            ...       ...
#> 28          1         1         5         2 21,21,22.8,...  74.68605
#> 29          0         1         5         4 21,21,22.8,...  83.28076
#> 30          0         1         5         6 21,21,22.8,...  63.17690
#> 31          0         1         5         8 21,21,22.8,...  93.83754
#> 32          1         1         4         2 21,21,22.8,...  39.20863

## data.frame input
mutate(mtcars, hp_to_wt = hp/wt) %>% head
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb hp_to_wt
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 41.98473
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 38.26087
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 40.08621
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 34.21462
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 50.87209
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 30.34682

?

If so, I'd be happy to generalise that to work with any of the dplyr verbs and submit a PR to plyranges. CC: @lawremi & @gmbecker who likely have significant insights into achieving this.
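
For illustration, here is a minimal sketch of how that generalisation might look, assuming the same convert, apply, convert-back pattern holds for other verbs. The helper name `lift_to_DataFrame` is hypothetical and not part of dplyr, S4Vectors, or plyranges. It uses `as.data.frame()` rather than `as_tibble()` for the round trip, so list-columns are not handled here.

```r
library(S4Vectors)
library(dplyr)

## Hypothetical helper (not part of dplyr, S4Vectors, or plyranges):
## wrap a data-frame verb so that it accepts and returns a DataFrame.
lift_to_DataFrame <- function(verb) {
  force(verb)
  function(.data, ...) {
    out <- verb(as.data.frame(.data), ...)
    as(out, "DataFrame")
  }
}

## One S4 generic per verb. Anything that is not a DataFrame falls
## through to the original dplyr function via the "ANY" method.
setGeneric("filter", function(.data, ...) standardGeneric("filter"))
setMethod("filter", "ANY", function(.data, ...) dplyr::filter(.data, ...))
setMethod("filter", "DataFrame", lift_to_DataFrame(dplyr::filter))

setGeneric("select", function(.data, ...) standardGeneric("select"))
setMethod("select", "ANY", function(.data, ...) dplyr::select(.data, ...))
setMethod("select", "DataFrame", lift_to_DataFrame(dplyr::select))

## DataFrame in, DataFrame out
mtcars_DF <- as(mtcars, "DataFrame")
res <- select(filter(mtcars_DF, cyl == 4), mpg, hp)

## plain data.frame input still goes straight to dplyr
res_df <- filter(mtcars, cyl == 4)
```

Each extra verb costs three lines here. Whether the round trip through a data.frame is acceptable for larger objects, or for columns that hold S4 vectors, is an open question this sketch does not try to answer.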