joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Scoping issues in subset_taxa (and probably other similar functions) #1695

Open mhoban opened 1 year ago

mhoban commented 1 year ago

The function subset_taxa (and probably other similar functions that use the ellipsis, `...`, to pass subset expressions) can only handle expressions in which every referenced object is either a column of the table being subset or an object in the global scope. As a result, subsetting a phyloseq object inside a function, using criteria defined in that function, doesn't work. See this simple example:

library(phyloseq)
data(GlobalPatterns)

# doesn't work, because `phyla` isn't in global scope
do_subset <- function(ps) {
  phyla <- c('Crenarchaeota','Euryarchaeota','Planctomycetes')
  subset_taxa(ps,Phylum %in% phyla)
}
do_subset(GlobalPatterns)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'table' in selecting a method for function '%in%': object 'phyla' not found

# works because `phyla` is now in global scope, but the subset_ call
# uses the global version rather than the version in the function scope
phyla <- c('Nitrospirae','Gemmatimonadetes','Fusobacteria')
thing <- do_subset(GlobalPatterns)

f <- as.data.frame(tax_table(thing))

# this contains the phyla from the global `phyla` object
unique(f$Phylum)
#> [1] "Fusobacteria"     "Gemmatimonadetes" "Nitrospirae"
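The lookup behaviour can be reproduced without phyloseq. The following is a minimal sketch, assuming that subset_taxa forwards its `...` to base `subset()`; the names `subset_wrapper`, `filter_inside_function` and `filter_eagerly` are hypothetical and exist only for illustration.

```r
# Hypothetical stand-in for a wrapper that forwards `...` to base::subset().
# This is an assumption about how subset_taxa() behaves, not its actual code.
subset_wrapper <- function(df, ...) {
  subset(df, ...)
}

filter_inside_function <- function(df) {
  keep <- c("a", "b")                 # local to this function only
  subset_wrapper(df, grp %in% keep)   # `keep` is searched for from the wrapper's frame
}

df <- data.frame(grp = c("a", "b", "c"))

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    filter_inside_function(df)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Computing the logical vector in the function's own frame sidesteps the lookup
filter_eagerly <- function(df) {
  keep <- c("a", "b")
  df[df$grp %in% keep, , drop = FALSE]
}
print(filter_eagerly(df)$grp)
```

The expression is evaluated with the data frame's columns first and the wrapper's frame second, so a variable that lives only in the caller's frame is never on the search path.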

I encountered this because I was trying to use the purrr map functions to take differing subsets of a ps object and do stuff with that, like this:

phyla_subsets <- list(
  a = c('Crenarchaeota','Euryarchaeota','Planctomycetes'),
  b = c('Nitrospirae','Gemmatimonadetes','Fusobacteria')
)
phyla_subsets %>%
  map(~{
    ps <- GlobalPatterns %>%
      subset_taxa(Phylum %in% .x)
    # ... now do something with ps ...
  })
#> Error in phyla_subsets %>% map(~{: could not find function "%>%"

I can set .x to something global using the <<- operator, but that feels very kludgy and may affect other things down the line.
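One way to avoid both the scoping problem and `<<-` might be to build the logical vector yourself and pass it to prune_taxa, which accepts a logical vector of length ntaxa. This is a sketch, not a fix to subset_taxa itself; `subset_by_phylum` is a hypothetical helper name. Note that the error recorded in the snippet above comes from the pipe not being loaded (purrr was not attached), so the sketch loads purrr explicitly.

```r
library(phyloseq)
library(purrr)

data(GlobalPatterns)

phyla_subsets <- list(
  a = c("Crenarchaeota", "Euryarchaeota", "Planctomycetes"),
  b = c("Nitrospirae", "Gemmatimonadetes", "Fusobacteria")
)

# Hypothetical helper: the condition is evaluated here, in this function's
# own frame, so `phyla` is found regardless of where the helper is called.
subset_by_phylum <- function(ps, phyla) {
  phylum <- as(tax_table(ps), "matrix")[, "Phylum"]
  # %in% returns FALSE for NA, so taxa without a Phylum are dropped
  prune_taxa(phylum %in% phyla, ps)
}

subsets <- map(phyla_subsets, ~ subset_by_phylum(GlobalPatterns, .x))

# Check which phyla ended up in each subset
map(subsets, ~ unique(as(tax_table(.x), "matrix")[, "Phylum"]))
```

Because nothing is passed as an unevaluated expression, this works the same way inside functions, lambdas, and loops.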

roey-angel commented 5 months ago

This is a persistent issue that is unfortunately still unresolved. Both subset_samples() and subset_taxa() (and maybe other functions as well) have a scoping problem: they do not see variables defined inside the calling function. The workaround is to either use the <<- operator or assign the variables in the global environment before calling the function, e.g.:

library(phyloseq)
data(GlobalPatterns)
# works only because `sample_type` also exists in the global environment
sample_type <- "Ocean"
subset_function <- function(physeq_obj = GlobalPatterns, sample_type) {
  subset_samples(physeq_obj, SampleType == sample_type)
}
subset_function(GlobalPatterns, sample_type = "Ocean")
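For sample subsetting, a variant that does not depend on a global variable could use prune_samples with a logical vector. This is a minimal sketch under the assumption that matching on a single sample variable is all that is needed; `subset_by_sample_type` is a hypothetical name.

```r
library(phyloseq)
data(GlobalPatterns)

# Hypothetical helper: the comparison is evaluated inside this function,
# so the `sample_type` argument is always the value that gets used.
subset_by_sample_type <- function(physeq_obj, sample_type) {
  # %in% (rather than ==) never yields NA and also accepts several types
  keep <- get_variable(physeq_obj, "SampleType") %in% sample_type
  prune_samples(keep, physeq_obj)
}

ocean <- subset_by_sample_type(GlobalPatterns, "Ocean")
unique(as.character(get_variable(ocean, "SampleType")))
```

Unlike the global-variable workaround, the result here depends only on the arguments passed to the function.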