Closed henganl2 closed 4 years ago
What are physeq15 and physeq75? If these are two phyloseq objects with different sets of samples, then you should look into the merge_phyloseq() function (without the _pair at the end).
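For the 96-sample case, one way to apply merge_phyloseq() to a whole list of per-sample objects is to fold it over the list with Reduce(). The following is only a sketch on toy data: make_single_sample() is a hypothetical helper (not part of phyloseq), and the counts, sample names, and taxonomy are invented for illustration. It assumes that a given OTU name means the same thing in every object.

```r
library(phyloseq)

# Hypothetical helper (not part of phyloseq): build a one-sample phyloseq
# object from named counts plus a Family/Genus taxonomy matrix.
make_single_sample <- function(sample_id, counts, taxonomy) {
  otu <- otu_table(
    matrix(counts, ncol = 1, dimnames = list(names(counts), sample_id)),
    taxa_are_rows = TRUE
  )
  phyloseq(otu, tax_table(taxonomy[names(counts), , drop = FALSE]))
}

# Toy taxonomy shared by all samples, so "OTU2" means the same thing everywhere.
taxonomy <- matrix(
  c("Pleosporaceae", "Nectriaceae", "Aspergillaceae", "Aspergillaceae",
    "Alternaria", "Fusarium", "Penicillium", "Aspergillus"),
  ncol = 2,
  dimnames = list(paste0("OTU", 1:4), c("Family", "Genus"))
)

physeq_list <- list(
  make_single_sample("s1", c(OTU1 = 5, OTU2 = 3), taxonomy),
  make_single_sample("s2", c(OTU2 = 7, OTU3 = 1), taxonomy),
  make_single_sample("s3", c(OTU1 = 2, OTU4 = 9), taxonomy)
)

# Fold merge_phyloseq() over the list; the same call applies to a list of 96.
merged <- Reduce(merge_phyloseq, physeq_list)
merged
```

The merged object keeps one column per sample and the union of the taxa names, with counts set to 0 where a taxon was absent from a sample.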
Hi @mikemc,
Thanks! physeq15 and physeq75 are two phyloseq objects with two different sets of samples. merge_phyloseq() works!
Thanks again!
Hi @mikemc,
I have a follow-up question about this issue. I've been using merge_phyloseq to merge my phyloseq objects for a while, but I found a problem recently. When I merge multiple phyloseq objects with different tax tables and frequency tables, it seems to merge the OTUs based on the OTU number, not the taxonomy information.
To make my question clearer, here is an example:
physeq01
phyloseq-class experiment-level object
otu_table() OTU Table: [ 1212 taxa and 1 samples ]
tax_table() Taxonomy Table: [ 1212 taxa by 7 taxonomic ranks ]
refseq() DNAStringSet: [ 1212 reference sequences ]
physeq02
phyloseq-class experiment-level object
otu_table() OTU Table: [ 6030 taxa and 1 samples ]
tax_table() Taxonomy Table: [ 6030 taxa by 7 taxonomic ranks ]
refseq() DNAStringSet: [ 6030 reference sequences ]
test <- merge_phyloseq(physeq01, physeq02)
test
phyloseq-class experiment-level object
otu_table() OTU Table: [ 6498 taxa and 2 samples ]
tax_table() Taxonomy Table: [ 6498 taxa by 7 taxonomic ranks ]
refseq() DNAStringSet: [ 6498 reference sequences ]
So I checked the otu_table and tax_table of those phyloseq objects by picking out a single OTU:
In physeq01
count taxonomy
OTU3 113 Alternaria
In physeq02
count taxonomy
OTU3 109 Fusarium
In test
count taxonomy
OTU3 113 109 Alternaria
So it is actually merging two different OTUs into one. I think the problem is similar to #574.
Any suggestions for solving this problem?
Thanks!!
Phyloseq uses the otu/taxa names (as given by taxa_names(physeq)) as the fundamental identifier of an otu/taxon. If you want to merge phyloseq objects, then it is very important to either make the taxon names consistent, or make them completely distinct. That is, if you have taxa named "OTU3" in physeq01 and physeq02, these must mean the same OTU if you are going to merge them. If OTU3 means different things in each, then you should change the names before merging. If you plan to do taxonomy-based rather than OTU-based analysis, then you could just make the OTU names in each phyloseq object unique by, for example, adding physeq01_ to the beginning of the OTU names from physeq01,
taxa_names(physeq01) <- paste0("physeq01_", taxa_names(physeq01))
and similarly for physeq02, before merging. That way the OTUs from the two phyloseq objects will be kept separate, but you can still merge them by taxonomy using tax_glom(), e.g. to the genus level.
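Put together, the renaming, merging, and agglomeration steps might look like the sketch below. The data are toy values loosely modeled on the OTU3 example above, make_single_sample() is a hypothetical helper (not part of phyloseq), and the Family column is added only so that the toy taxonomy table has more than one rank.

```r
library(phyloseq)

# Hypothetical helper (not part of phyloseq): one-sample object from named
# counts and a taxonomy matrix whose row names are the OTU names.
make_single_sample <- function(sample_id, counts, taxonomy) {
  otu <- otu_table(
    matrix(counts, ncol = 1, dimnames = list(names(counts), sample_id)),
    taxa_are_rows = TRUE
  )
  phyloseq(otu, tax_table(taxonomy[names(counts), , drop = FALSE]))
}

# Toy data: "OTU3" is Alternaria in physeq01 but Fusarium in physeq02.
physeq01 <- make_single_sample(
  "s01", c(OTU3 = 113, OTU7 = 20),
  matrix(c("Pleosporaceae", "Nectriaceae", "Alternaria", "Fusarium"), ncol = 2,
         dimnames = list(c("OTU3", "OTU7"), c("Family", "Genus")))
)
physeq02 <- make_single_sample(
  "s02", c(OTU3 = 109, OTU5 = 4),
  matrix(c("Nectriaceae", "Aspergillaceae", "Fusarium", "Penicillium"), ncol = 2,
         dimnames = list(c("OTU3", "OTU5"), c("Family", "Genus")))
)

# Names present in both objects; these would be treated as the same taxon.
intersect(taxa_names(physeq01), taxa_names(physeq02))

# Make the names distinct per object, then merge.
taxa_names(physeq01) <- paste0("physeq01_", taxa_names(physeq01))
taxa_names(physeq02) <- paste0("physeq02_", taxa_names(physeq02))
merged <- merge_phyloseq(physeq01, physeq02)

# Combine OTUs that share a genus-level taxonomy.
by_genus <- tax_glom(merged, taxrank = "Genus")
by_genus
```

Checking intersect() on the taxa names before merging is a cheap way to spot names that would otherwise be silently treated as the same taxon.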
If this still doesn't make sense, it might help to think back on how you created your OTUs and taxonomy assignment, and read up on the different types of OTUs (e.g. closed vs. open reference), and the challenges with using OTUs instead of ASVs when merging amplicon datasets (for the latter, see http://www.nature.com/doifinder/10.1038/ismej.2017.119)
@mikemc thanks, great answer.
@henganl2 if you actually want to compare, and use Mike's suggestion of agglomerating to the species or genus (or higher) level, you would use tax_glom() first on each object, and then merge_phyloseq() on the two tax_glommed objects. This approach requires that your taxonomy assignments used the same method and reference database, and the caveats alluded to by @mikemc are also important. It can work for some biological questions, but I would not recommend it as a general practice.
The best approach applies if your sequences come from the same target locus (e.g. V4): you then also have the option to set the "OTU ID" based on the ASVs themselves, or on a short identifier that is consistent across ASV sequences in the two datasets. This requires that they were trimmed to the same positions of the locus; if this is not the case, you can just re-run denoising (e.g. dada2) after fixing the trimming to be consistent between the two datasets. It's a little extra work, but being able to track the same biological sequence across all your data is a pretty large gain in interpretation, especially if the taxonomy database is not providing sufficient coverage or resolution for your research problem.
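A rough sketch of the glom-then-merge route is below. Because merging matches on taxa names, this sketch adds a renaming step that is an assumption on my part rather than something stated above: after tax_glom(), each taxon is renamed to its taxonomy path so that the same genus gets the same name in both objects. glom_and_rename() and make_ps() are hypothetical helpers, and all data are toy values.

```r
library(phyloseq)

# Hypothetical helper (not part of phyloseq): wrap count and taxonomy matrices.
make_ps <- function(counts, taxonomy) {
  phyloseq(otu_table(counts, taxa_are_rows = TRUE), tax_table(taxonomy))
}

# Hypothetical helper: agglomerate to `rank`, then rename each taxon to its
# taxonomy path down to that rank (e.g. "Nectriaceae;Fusarium").
# Note that tax_glom() drops taxa with missing values at `rank` by default.
glom_and_rename <- function(ps, rank = "Genus") {
  glommed <- tax_glom(ps, taxrank = rank)
  tt <- as(tax_table(glommed), "matrix")[taxa_names(glommed), , drop = FALSE]
  keep <- seq_len(match(rank, colnames(tt)))
  taxa_names(glommed) <- unname(apply(tt[, keep, drop = FALSE], 1, paste, collapse = ";"))
  glommed
}

ranks <- c("Family", "Genus")

# Toy dataset A: three OTUs, two of them Fusarium.
ps_a <- make_ps(
  matrix(c(113, 20, 5, 90, 10, 2), ncol = 2,
         dimnames = list(paste0("OTU", 1:3), c("a1", "a2"))),
  matrix(c("Pleosporaceae", "Nectriaceae", "Nectriaceae",
           "Alternaria", "Fusarium", "Fusarium"), ncol = 2,
         dimnames = list(paste0("OTU", 1:3), ranks))
)

# Toy dataset B: the OTU names overlap with A but refer to different taxa.
ps_b <- make_ps(
  matrix(c(109, 4, 50, 6), ncol = 2,
         dimnames = list(paste0("OTU", 1:2), c("b1", "b2"))),
  matrix(c("Nectriaceae", "Aspergillaceae", "Fusarium", "Penicillium"), ncol = 2,
         dimnames = list(paste0("OTU", 1:2), ranks))
)

# Agglomerate and rename each object, then merge on the shared genus names.
merged <- merge_phyloseq(glom_and_rename(ps_a), glom_and_rename(ps_b))
merged
```

This only lines up if both taxonomy tables use the same rank names and the same spelling of each taxon, which is one reason the same assignment method and reference database are needed.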
Hope that helps!
I will close for now, but feel free to re-open/comment as-needed.
Hi,
My data are ITS+LSU regions from an Oxford Nanopore sequencer and contain 96 samples. I ran BLAST and generated the feature table (TAX) and frequency table (OTU) for each sample.
What should I do if I want to combine the 96 samples into a single phyloseq object and run the analysis?
I've tried running the following code on two samples:
mergephyseq <- merge_phyloseq_pair(physeq15, physeq75)
but this merges the two samples into one. I've also checked merge_sample and merge_tax, but neither of them seems to be the way to go. I would appreciate any suggestions! Thanks in advance.