joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) that make it easier to import, store, and analyze phylogenetic sequencing data, and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Preprocessing Phyloseq object #1093

Open: bioinfonext opened this issue 5 years ago

bioinfonext commented 5 years ago

How can I preprocess and filter a phyloseq object according to the following three criteria?

1) taxa with zero counts;
2) taxa with ambiguous taxonomic assignment at the Kingdom or Phylum level;
3) taxa that were seen less than 2 times in 2 or more samples.

Thanks

mikemc commented 5 years ago

Take a look at the help pages for the functions prune_taxa, filter_taxa, and subset_taxa (e.g., ?filter_taxa). You'll also need to understand some basic R mechanics and built-in functions to use them successfully. I can suggest more specific code if you clarify 1, 2, and 3.

1- Do you mean taxa whose total count summed across all samples is 0 (i.e., taxa_sums(ps) is zero for that taxon)? If so, you can do either

ps1 <- filter_taxa(ps, function(x) sum(x) > 0, prune = TRUE)

or

ps1 <- prune_taxa(taxa_sums(ps) > 0, ps)
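To see why these two calls select the same taxa, here is a minimal base-R sketch on a hypothetical toy count matrix (not real data), assuming taxa are rows as when taxa_are_rows(ps) is TRUE. rowSums() stands in for taxa_sums(), and apply() runs the same per-taxon predicate that filter_taxa() would receive.

```r
# Hypothetical toy OTU table (not real data): taxa are rows and samples are
# columns, the orientation phyloseq uses when taxa_are_rows(ps) is TRUE.
otu <- matrix(
  c(5, 0, 3,
    0, 0, 0,
    1, 2, 0,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:3))
)

# Total count of each taxon across all samples (what taxa_sums(ps) reports).
totals <- rowSums(otu)

# The per-taxon predicate passed to filter_taxa(), applied row by row.
keep <- apply(otu, 1, function(x) sum(x) > 0)

# Both routes give the same named logical vector.
stopifnot(identical(keep, totals > 0))

otu1 <- otu[keep, , drop = FALSE]
print(rownames(otu1))
# [1] "taxon1" "taxon3"
```

The prune_taxa() form is the more direct one when the criterion is simply a total; filter_taxa() becomes handy once the predicate looks at the per-sample counts rather than their sum.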

2- How are these marked in your tax_table? Are they marked as missing data (i.e., NA), or flagged as ambiguous in some other way? If they are NA, you can do

ps2 <- subset_taxa(ps1, !is.na(Kingdom) & !is.na(Phylum))
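If ambiguous assignments are written as placeholder strings rather than NA, the test has to exclude those strings as well. A minimal base-R sketch on a hypothetical toy taxonomy table; the placeholder labels "" and "uncharacterized" are assumptions, so substitute whatever your classifier actually writes.

```r
# Hypothetical toy taxonomy table (not real data); rows are taxa.
tax <- matrix(
  c("Bacteria", "Firmicutes",
    "Bacteria", NA,
    NA,         NA,
    "Bacteria", "",
    "Bacteria", "uncharacterized"),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:5), c("Kingdom", "Phylum"))
)

# Assumed placeholder labels for ambiguous assignments.
ambiguous <- c("", "uncharacterized")

# TRUE for a rank value that is neither NA nor an ambiguous placeholder.
assigned <- function(rank) !is.na(rank) & !rank %in% ambiguous

keep <- assigned(tax[, "Kingdom"]) & assigned(tax[, "Phylum"])
tax2 <- tax[keep, , drop = FALSE]
print(rownames(tax2))
# [1] "taxon1"
```

In phyloseq the same condition can likely go straight into subset_taxa(), since it evaluates the expression against the tax_table columns, e.g. subset_taxa(ps1, !is.na(Phylum) & !Phylum %in% c("", "uncharacterized")); treat the label list as an assumption to check against your data.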

3- I'm not sure exactly what you are getting at here; can you clarify with an example? Which taxa do you want to keep vs. throw away in this case?
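One possible reading of criterion 3 (an assumption, since the intent is unclear) is "keep a taxon only if it has a count of at least 2 in at least 2 samples". A minimal base-R sketch of that predicate on a hypothetical toy count matrix; the thresholds min_count and min_samples are illustrative names, not phyloseq parameters.

```r
# Hypothetical toy OTU table (not real data): taxa are rows, samples columns.
otu <- matrix(
  c(3, 2, 0, 0,   # >= 2 reads in 2 samples: kept
    1, 1, 1, 1,   # never reaches 2 reads: dropped
    5, 0, 0, 1,   # >= 2 reads in only 1 sample: dropped
    2, 4, 6, 0),  # >= 2 reads in 3 samples: kept
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:4))
)

min_count <- 2    # assumed minimum count per sample
min_samples <- 2  # assumed minimum number of samples meeting min_count

# Per-taxon predicate of the kind filter_taxa() expects: it takes one taxon's
# counts across samples and returns a single TRUE/FALSE.
keep_taxon <- function(x) sum(x >= min_count) >= min_samples

keep <- apply(otu, 1, keep_taxon)
otu3 <- otu[keep, , drop = FALSE]
print(rownames(otu3))
# [1] "taxon1" "taxon4"
```

Under that reading, the phyloseq version would be ps3 <- filter_taxa(ps2, function(x) sum(x >= 2) >= 2, prune = TRUE); if the intended rule differs, only the predicate needs to change.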