joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Differences in number of taxa after subset_samples and prune_taxa #1651

Open Alonso-Garcia opened 1 year ago

Alonso-Garcia commented 1 year ago

Hi all,

I have a phyloseq object (ps3) containing only bacterial taxa found at least 3 times in at least 2 samples:

> ps3
phyloseq-class object at experiment level
otu_table() OTU table: [ 6362 taxa and 216 samples ].
sample_data() Sample data: [ 216 samples by 16 sample variables ]
tax_table() Taxonomy table: [ 6362 taxa by 6 taxonomic ranks ].
refseq() DNAStringSet: [ 6362 reference sequences ]

Among my 216 samples, some are soil and some are lichen. I want to know which bacterial taxa are found in each sample type. To do that, I split my phyloseq object by sample type:

ps3_soil <- subset_samples(ps3, Type_sample == "Soil")
ps3_soil <- prune_taxa(taxa_sums(ps3_soil) > 0, ps3_soil)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 5303 taxa and 102 samples ]
## sample_data() Sample Data:       [ 102 samples by 16 sample variables ]
## tax_table()   Taxonomy Table:    [ 5303 taxa by 6 taxonomic ranks ]
## refseq()      DNAStringSet:      [ 5303 reference sequences ]
ps3_lic <- subset_samples(ps3, Type_sample == "Lichen")
ps3_lic <- prune_taxa(taxa_sums(ps3_lic) > 0, ps3_lic)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1799 taxa and 114 samples ]
## sample_data() Sample Data:       [ 114 samples by 16 sample variables ]
## tax_table()   Taxonomy Table:    [ 1799 taxa by 6 taxonomic ranks ]
## refseq()      DNAStringSet:      [ 1799 reference sequences ]

The number of taxa in soil (ps3_soil = 5303 taxa) plus the number of taxa in lichen (ps3_lic = 1799 taxa) gives 7102, which does not match the number of taxa in the original phyloseq object (ps3 = 6362 taxa).

Could you please tell me why? Is this normal? I thought it might be explained by shared taxa that are present in both soil and lichen and are therefore counted once in each subset.
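One way to test the shared-taxa explanation is to compare the taxon names of the two subsets. The sketch below uses base R set operations on hypothetical taxon names (the "ASV" labels are made up for illustration) and only checks the identity |soil| + |lichen| - |shared| = |all taxa|. If that identity holds for the real objects, and every taxon in ps3 occurs in at least one sample, the overlap would be 5303 + 1799 - 6362 = 740 taxa.

```r
# Toy stand-ins for taxa_names(ps3_soil) and taxa_names(ps3_lic).
# The ASV names below are hypothetical, for illustration only.
soil_taxa   <- c("ASV1", "ASV2", "ASV3", "ASV4")
lichen_taxa <- c("ASV3", "ASV4", "ASV5")

# Taxa present in both sample types, and distinct taxa overall
shared   <- intersect(soil_taxa, lichen_taxa)
all_taxa <- union(soil_taxa, lichen_taxa)

# Shared taxa are counted once per subset, so the plain sum overcounts them
n_sum      <- length(soil_taxa) + length(lichen_taxa)
n_distinct <- n_sum - length(shared)
stopifnot(n_distinct == length(all_taxa))

# With the real objects (requires the phyloseq package), the same check
# would start from:
# shared <- intersect(taxa_names(ps3_soil), taxa_names(ps3_lic))
# length(shared)
```

Running the commented lines on the real subsets would show whether the number of shared taxa accounts for the whole difference.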

Thank you for the software! Any comments would be welcome!