mikemc / speedyseq

Speedy versions of phyloseq functions
https://mikemc.github.io/speedyseq/

Create dplyr verb functions for sample data and tax tables #69

Closed mikemc closed 3 years ago

mikemc commented 3 years ago

Each can be a simple wrapper around the corresponding dplyr verb. For example, the method for mutate_sample_data on phyloseq objects can be a function containing:

sample_data(x) <- sample_data(x) %>%
  ps_tibble %>% 
  dplyr::mutate(...) %>%
  sample_data
x

For relevant verbs, we could consider an optional .groups argument; when it is supplied, the operation would be wrapped in dplyr::with_groups().
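A minimal sketch of what the .groups wrapping could look like, shown on a plain tibble rather than a phyloseq object so that it runs with dplyr and tibble alone. The helper name mutate_grouped and the toy table are hypothetical; the table only stands in for the sample data after conversion to a tibble.

```r
library(dplyr)

# Toy stand-in for sample data that has already been converted to a tibble.
sam <- tibble::tibble(
  .sample = c("s1", "s2", "s3", "s4"),
  site    = c("gut", "gut", "skin", "skin"),
  reads   = c(10, 30, 5, 15)
)

# Hypothetical helper: run mutate() inside with_groups().
# `.groups` is forwarded with {{ }} because with_groups() takes a tidy-select
# expression; the default NULL means "no grouping".
mutate_grouped <- function(.data, ..., .groups = NULL) {
  dplyr::with_groups(.data, {{ .groups }}, dplyr::mutate, ...)
}

# Fraction of reads within each site; the result comes back ungrouped.
by_site <- mutate_grouped(sam, frac = reads / sum(reads), .groups = site)

# Without .groups the fraction is computed over all samples.
overall <- mutate_grouped(sam, frac = reads / sum(reads))
```

Since with_groups() restores the input's original grouping after applying the verb, the caller never receives an unexpectedly grouped table back, which seems like the right default for a table that is about to be converted back into sample data.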

To reduce code redundancy, it might be useful to have a function that takes a dplyr verb and a phyloseq accessor/creator (sample_data or tax_table) and produces the relevant operation.
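One way such a function factory could look, as a rough sketch only. To keep it runnable without phyloseq, the example uses a toy list of tibbles in place of a phyloseq object; make_table_verb, get_sam, set_sam, get_tax and set_tax are all hypothetical names. With real phyloseq objects the getter and setter would instead wrap sample_data or tax_table together with the tibble conversion shown earlier.

```r
library(dplyr)

# Factory: combine a dplyr verb with a getter/setter pair for one table
# and return a function that applies the verb to that table inside `x`.
make_table_verb <- function(verb, getter, setter) {
  function(x, ...) {
    new_tbl <- verb(getter(x), ...)
    setter(x, new_tbl)
  }
}

# Toy stand-in for a phyloseq object: a list holding two tibbles.
toy <- list(
  sam = tibble::tibble(.sample = c("s1", "s2", "s3"), depth = c(100, 250, 40)),
  tax = tibble::tibble(.otu = c("t1", "t2"), genus = c("Bacteroides", "Prevotella"))
)

# Hypothetical accessors and setters for the toy object.
get_sam <- function(x) x$sam
set_sam <- function(x, value) { x$sam <- value; x }
get_tax <- function(x) x$tax
set_tax <- function(x, value) { x$tax <- value; x }

# Each verb/table combination is now a one-liner.
mutate_sam <- make_table_verb(dplyr::mutate, get_sam, set_sam)
filter_tax <- make_table_verb(dplyr::filter, get_tax, set_tax)

toy_mutated  <- mutate_sam(toy, deep = depth > 50)
toy_filtered <- filter_tax(toy, genus == "Prevotella")
```

Note that the toy setter only swaps the table. A real implementation for filter would also have to make sure the other components of the object are pruned to the remaining samples or taxa.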

mikemc commented 3 years ago

Given this implementation scheme, one can use .otu and .sample as column names in filter to select specific taxa and samples. It might be nice to also support .otu and .sample as column names in mutate, to set new taxa and sample names. For this to work, we need to add an extra step: e.g. for taxa, create the new taxonomy table, set the taxa names in the phyloseq object to match the new taxonomy table, and then replace the taxonomy table in the phyloseq object.
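The extra renaming step could be sketched roughly as follows, again on a toy list (an abundance matrix plus a taxonomy tibble) instead of a real phyloseq object; mutate_tax_toy is a hypothetical name. With phyloseq the renaming line would presumably become an assignment via taxa_names(x) <- ..., but that mapping is an assumption and is not tested here.

```r
library(dplyr)

# Toy stand-in for a phyloseq object: abundance matrix with taxa as rows,
# plus a taxonomy tibble whose .otu column holds the taxa names.
toy <- list(
  otu = matrix(
    c(5, 0, 3, 8), nrow = 2,
    dimnames = list(c("t1", "t2"), c("s1", "s2"))
  ),
  tax = tibble::tibble(.otu = c("t1", "t2"), genus = c("Bacteroides", "Prevotella"))
)

mutate_tax_toy <- function(x, ...) {
  old_names <- x$tax$.otu
  # Step 1: create the new taxonomy table.
  new_tax <- dplyr::mutate(x$tax, ...)
  # mutate() keeps row order, so row i still describes the same taxon.
  stopifnot(
    nrow(new_tax) == length(old_names),
    anyDuplicated(new_tax$.otu) == 0
  )
  # Step 2: rename the taxa in the rest of the object to match.
  rownames(x$otu) <- new_tax$.otu[match(rownames(x$otu), old_names)]
  # Step 3: replace the taxonomy table.
  x$tax <- new_tax
  x
}

renamed <- mutate_tax_toy(toy, .otu = paste0(genus, "_", .otu))
```

The order matters: if the table were replaced before the names were updated, the taxonomy table and the rest of the object would no longer share any taxa names.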

mikemc commented 3 years ago

I did not implement the .groups feature; may do so in the future once I run into a good use case.