joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Merge and split 2 phyloseq objects #825

Closed NinaKons closed 7 years ago

NinaKons commented 7 years ago

Hi Joey,

I have just started looking at the phyloseq code and was wondering how I can merge and split 2 phyloseq objects based on the sample IDs in the metadata table?

1. Merge
2. Split or subset this common phyloseq object

Merge

Load 3 input files generated via DADA2 pipeline

seqtab.nochim <- readRDS("/home/ubuntu/work/cDNA/trimmed/seqtab.rds") # Chimera-removed RSV table
tax <- readRDS("/home/ubuntu/work/cDNA/trimmed/tax.rds") # Taxonomy table
mt <- read.csv(file="metadata.csv", header=TRUE) # Metadata

Phyloseq object

ps_DNA <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows = FALSE), tax_table(tax), sample_data(mt))

Remove unclassified taxa and reduce sparsity

physeq_DNA <- subset_taxa(ps_DNA, Kingdom != "k__unclassified")
physeq_DNA <- prune_taxa(taxa_sums(physeq_DNA) > 0, physeq_DNA)

Merge

The same code can be applied to the cDNA dataset, after which I try to merge the 2 phyloseq objects:

ps_merged <- merge_phyloseq(physeq_DNA, physeq_cDNA)

However, I have not tested whether merge_phyloseq can merge 2 phyloseq objects, and I could not find this case in the tutorial (https://joey711.github.io/phyloseq/merge.html#merge_phyloseq).

Subset

Subset DNA and cDNA with the same samples

sub_ps_same_samples = subset_samples(ps_merged, Short_name %in% intersect(mt_DNA$Short_name, mt_cDNA$Short_name))

Split the subsetted phyloseq object

sub_ps_DNA = subset_samples(sub_ps_same_samples, Short_name %in% mt_DNA$Short_name)
sub_ps_cDNA = subset_samples(sub_ps_same_samples, Short_name %in% mt_cDNA$Short_name)

Many thanks in advance!

joey711 commented 7 years ago

@benjjneb do you have a suggestion here?

joey711 commented 7 years ago

Yes, this is the intended use of merge_phyloseq. The sample names from the two phyloseq objects should not overlap. The taxa_names for the same RSV must be the same. Your sample_data should have a variable that indicates which assay type generated each sample (for your sanity).
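
For illustration, here is a minimal sketch of the requirements above, using toy data rather than the files from the question. The `assay` column, the toy sample names, and the `seqA`/`seqB`/`seqC` taxa names are assumptions invented for this example; only `Short_name` and `ps_merged` come from the thread. It assumes the sample names differ between the two objects and that a shared RSV carries the same taxa name in both.

```r
library(phyloseq)

# Toy count tables in DADA2 orientation (samples in rows, taxa in columns).
# "seqB" is present in both assays and must carry the same name in each.
dna_counts <- matrix(c(10, 0, 5, 3), nrow = 2,
                     dimnames = list(c("S1_DNA", "S2_DNA"), c("seqA", "seqB")))
cdna_counts <- matrix(c(2, 7, 0, 4), nrow = 2,
                      dimnames = list(c("S1_cDNA", "S2_cDNA"), c("seqB", "seqC")))

# Metadata row names must match the sample names of the count tables.
# "assay" is a hypothetical variable recording which assay produced each sample.
dna_meta <- data.frame(Short_name = c("S1", "S2"), assay = "DNA",
                       row.names = rownames(dna_counts),
                       stringsAsFactors = FALSE)
cdna_meta <- data.frame(Short_name = c("S1", "S2"), assay = "cDNA",
                        row.names = rownames(cdna_counts),
                        stringsAsFactors = FALSE)

ps_dna <- phyloseq(otu_table(dna_counts, taxa_are_rows = FALSE),
                   sample_data(dna_meta))
ps_cdna <- phyloseq(otu_table(cdna_counts, taxa_are_rows = FALSE),
                    sample_data(cdna_meta))

# Guard: sample names must not overlap between the two objects.
stopifnot(length(intersect(sample_names(ps_dna), sample_names(ps_cdna))) == 0)

ps_merged <- merge_phyloseq(ps_dna, ps_cdna)

# Splitting again is then a matter of filtering on the assay variable.
ps_dna_only <- subset_samples(ps_merged, assay == "DNA")
ps_cdna_only <- subset_samples(ps_merged, assay == "cDNA")
```

With an assay variable in place, the split does not depend on the sample names themselves, and `Short_name` remains available for pairing DNA and cDNA samples from the same source.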

Not including a tree (yet) is fine, as there is no good way to merge two trees that I'm aware of. If someone shows me one I'd love to support it, though. Once you have the final set of RSV sequences, you can then compute a tree and add it to the combined phyloseq object.
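
As a rough sketch of that last step, the example below builds a tree from a toy set of sequences and attaches it to a toy combined object. Everything here is an assumption made for illustration: the sequences, the sample names, and the choice of a quick neighbor-joining tree from the ape package on equal-length (already aligned) sequences. A real analysis would align the final RSV sequences and use whatever tree-building method suits the data.

```r
library(phyloseq)
library(ape)

# Hypothetical final RSV sequences, equal length so they can be treated as aligned.
seqs <- c(seqA = "ACGTACGTAC", seqB = "ACGTACGTAA",
          seqC = "ACGAACGTTA", seqD = "TCGAACGTTA")

# Toy stand-in for the combined object (no tree yet).
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("S1_DNA", "S1_cDNA"), names(seqs)))
meta <- data.frame(Short_name = c("S1", "S1"), assay = c("DNA", "cDNA"),
                   row.names = rownames(counts), stringsAsFactors = FALSE)
ps_merged <- phyloseq(otu_table(counts, taxa_are_rows = FALSE),
                      sample_data(meta))

# Convert to a DNAbin alignment matrix (one row per sequence, lower case).
aln <- as.DNAbin(do.call(rbind, strsplit(tolower(seqs), "")))

# Pairwise distances, then a neighbor-joining tree (needs at least 3 taxa).
tree <- nj(dist.dna(aln, model = "raw"))

# Tip labels must match taxa_names() for the tree to line up with the table.
stopifnot(setequal(tree$tip.label, taxa_names(ps_merged)))

ps_with_tree <- merge_phyloseq(ps_merged, tree)
```

Building the tree only after the merge means it is computed once, over the full set of taxa in the combined object, which avoids having to merge two trees.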