jonocarroll closed 5 years ago
Looks great, thanks @jonocarroll!
I really need to get around to just implementing the dplyr verbs on DataFrame. That could really open up a lot of possibilities and simplify the code base for plyranges too. Would you be up for helping on that?
Absolutely. Would it be as simple as
library(S4Vectors)
library(dplyr)
## extend mutate to S4 generic
setGeneric("mutate", function(.data, ...) dplyr::mutate(.data, ...))
#> [1] "mutate"
## create a DataFrame with list-column
mtcars_DF <- mtcars %>%
  mutate(listcol = lapply(seq_len(nrow(.)), function(x) mtcars)) %>%
  as("DataFrame")
mtcars_DF
#> DataFrame with 32 rows and 12 columns
#> mpg cyl disp hp drat wt qsec
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 1 21.0 6 160 110 3.90 2.620 16.46
#> 2 21.0 6 160 110 3.90 2.875 17.02
#> 3 22.8 4 108 93 3.85 2.320 18.61
#> 4 21.4 6 258 110 3.08 3.215 19.44
#> 5 18.7 8 360 175 3.15 3.440 17.02
#> ... ... ... ... ... ... ... ...
#> 28 30.4 4 95.1 113 3.77 1.513 16.9
#> 29 15.8 8 351.0 264 4.22 3.170 14.5
#> 30 19.7 6 145.0 175 3.62 2.770 15.5
#> 31 15.0 8 301.0 335 3.54 3.570 14.6
#> 32 21.4 4 121.0 109 4.11 2.780 18.6
#> vs am gear carb listcol
#> <numeric> <numeric> <numeric> <numeric> <list>
#> 1 0 1 4 4 21,21,22.8,...
#> 2 0 1 4 4 21,21,22.8,...
#> 3 1 1 4 1 21,21,22.8,...
#> 4 1 0 3 1 21,21,22.8,...
#> 5 0 0 3 2 21,21,22.8,...
#> ... ... ... ... ... ...
#> 28 1 1 5 2 21,21,22.8,...
#> 29 0 1 5 4 21,21,22.8,...
#> 30 0 1 5 6 21,21,22.8,...
#> 31 0 1 5 8 21,21,22.8,...
#> 32 1 1 4 2 21,21,22.8,...
## conversion of DataFrame to dplyr-compatible, mutate, and return
mutate_DF <- function(.data, ...) {
  suppressWarnings(dplyr::as_tibble(.data)) %>%
    dplyr::mutate(...) %>%
    as("DataFrame")
}
methods::setMethod("mutate", signature("DataFrame"), mutate_DF)
#> [1] "mutate"
## DataFrame input
mutate(mtcars_DF, hp_to_wt = hp/wt)
#> DataFrame with 32 rows and 13 columns
#> mpg cyl disp hp drat wt qsec
#> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 1 21.0 6 160 110 3.90 2.620 16.46
#> 2 21.0 6 160 110 3.90 2.875 17.02
#> 3 22.8 4 108 93 3.85 2.320 18.61
#> 4 21.4 6 258 110 3.08 3.215 19.44
#> 5 18.7 8 360 175 3.15 3.440 17.02
#> ... ... ... ... ... ... ... ...
#> 28 30.4 4 95.1 113 3.77 1.513 16.9
#> 29 15.8 8 351.0 264 4.22 3.170 14.5
#> 30 19.7 6 145.0 175 3.62 2.770 15.5
#> 31 15.0 8 301.0 335 3.54 3.570 14.6
#> 32 21.4 4 121.0 109 4.11 2.780 18.6
#> vs am gear carb listcol hp_to_wt
#> <numeric> <numeric> <numeric> <numeric> <list> <numeric>
#> 1 0 1 4 4 21,21,22.8,... 41.98473
#> 2 0 1 4 4 21,21,22.8,... 38.26087
#> 3 1 1 4 1 21,21,22.8,... 40.08621
#> 4 1 0 3 1 21,21,22.8,... 34.21462
#> 5 0 0 3 2 21,21,22.8,... 50.87209
#> ... ... ... ... ... ... ...
#> 28 1 1 5 2 21,21,22.8,... 74.68605
#> 29 0 1 5 4 21,21,22.8,... 83.28076
#> 30 0 1 5 6 21,21,22.8,... 63.17690
#> 31 0 1 5 8 21,21,22.8,... 93.83754
#> 32 1 1 4 2 21,21,22.8,... 39.20863
## data.frame input
mutate(mtcars, hp_to_wt = hp/wt) %>% head
#> mpg cyl disp hp drat wt qsec vs am gear carb hp_to_wt
#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 41.98473
#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 38.26087
#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 40.08621
#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 34.21462
#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 50.87209
#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 30.34682
?
If so, I'd be happy to generalise that to work with any of the dplyr verbs and submit a PR to plyranges. CC: @lawremi & @gmbecker, who likely have significant insights into achieving this.
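A rough sketch of how that generalisation might look, reusing the same tibble round trip as mutate_DF. The helper name via_tibble is hypothetical (it is not part of dplyr, S4Vectors, or plyranges), and only two verbs are registered here as an illustration. It has the same limits as the conversion it relies on: as_tibble() drops row names, and S4-typed columns may not survive the trip.

```r
library(S4Vectors)
library(dplyr)

## Hypothetical helper (an assumption, not an existing API): wrap a dplyr verb
## so it accepts a DataFrame by converting to a tibble and back again.
via_tibble <- function(verb) {
  force(verb)
  function(.data, ...) {
    tbl <- suppressWarnings(dplyr::as_tibble(.data))
    as(verb(tbl, ...), "DataFrame")
  }
}

## Same pattern as for mutate: the dplyr function becomes the default method,
## and the wrapped version handles DataFrame input.
setGeneric("filter", function(.data, ...) dplyr::filter(.data, ...))
setMethod("filter", signature("DataFrame"), via_tibble(dplyr::filter))

setGeneric("arrange", function(.data, ...) dplyr::arrange(.data, ...))
setMethod("arrange", signature("DataFrame"), via_tibble(dplyr::arrange))

## DataFrame input goes through the wrapper; data.frame input falls back to dplyr
cars_DF <- as(mtcars, "DataFrame")
six_cyl <- filter(cars_DF, cyl == 6)
by_mpg <- arrange(cars_DF, mpg)
plain <- filter(mtcars, cyl == 6)
```

Each verb still needs its own setGeneric()/setMethod() pair, so this only factors out the conversion step; whether a round trip through tibble is acceptable for every verb is an open question.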
select_col, mutate_col, and rename_col to work with MAE colData
to eSet dispatch
closes #3