joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

merge_phyloseq on two phyloseq objects with different OTU table IDs don't add up #574

Closed kviljoen closed 8 years ago

kviljoen commented 8 years ago

I'm trying to merge two phyloseq objects (from 16S data produced using the same closed-reference pipeline, just different runs) using merge_phyloseq. The problem is that the resulting merged object does not contain all the unique OTU IDs from both tables; it seems to take all the IDs from just one table.

e.g. (J3 and Jv2 are the objects to be merged; Jv3 is the merged object):

length(intersect(rownames(otu_table(Jv2)), rownames(otu_table(J3)))) # 4170
length(setdiff(rownames(otu_table(Jv2)), rownames(otu_table(J3)))) # 3362
length(setdiff(rownames(otu_table(J3)), rownames(otu_table(Jv2)))) # 794

4170 + 3362 + 794 = 8326 # so expecting 8326 OTUs in the merged table

Jv3 <- merge_phyloseq(J3, Jv2)
ntaxa(J3) # 5669
ntaxa(Jv2) # 7532
ntaxa(Jv3) # 7532 (why not 8326?)

Thanks for your help! (and for the great package!) Katie.

joey711 commented 8 years ago

When merge_phyloseq encounters two or more objects, it takes the intersection of the indices they have in common. In your case, the output should only retain taxa indices that are present in both tables (or fail with an informative error).

Your example looks suspicious because I would expect ntaxa(Jv3) to be no larger than the smaller ntaxa of the two arguments.

If you want to create an object with the union of the two tables, you may want to consider making a change in your upstream OTU workflow (because what you're doing sounds suspicious), or to consider a table concatenating function like rbind. You should be careful.

If you can convince me that this is a needed feature ("OTU table union"), I might make this a feature request and add it as a new function. I can think of a couple of ways to do this that are pretty straightforward.

cheers

joey

kviljoen commented 8 years ago

Hi Joey,

Thanks for your response. The two .biom files, which were imported separately, are from two separate HiSeq runs. The upstream OTU picking was done by one of our collaborators for each run, so I would have to ask them to put all the raw data from both runs through the pipeline. That is why I thought I could just merge the two as phyloseq objects, since the OTU picking is closed reference. Can you elaborate on why what I'm doing sounds suspicious? Is it because you'd rather send all the raw data through the pipeline together, or is it OK to do this as long as both runs went through exactly the same pipeline with closed-reference picking?

Thanks! Katie.

joey711 commented 8 years ago

Okay, now that you've elaborated, it is less suspicious :) Closed reference with the exact same ref-DB is a special case for which this would work. If you try to merge them in the usual way, it will take the intersection of the two OTU ID sets. So for this problem, you'll first need to concatenate the tables. There are many ways to do this. Maybe some helpful user has already done this recently and will post some example code?

If you're including a tree, you'll want to import it only after the table is sorted out, as trees are easily "pruned" but not easily grafted together.

Cheers

joey
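A minimal sketch of the table concatenation suggested above, assuming both OTU tables are plain taxa-by-sample count matrices produced against the same closed-reference database. The helper `union_counts` and the toy data are hypothetical and not part of phyloseq; only base R is used.

```r
# Hypothetical helper (not a phyloseq function): combine two taxa-by-sample
# count matrices over the union of their row (OTU) and column (sample) names.
union_counts <- function(a, b) {
  taxa    <- union(rownames(a), rownames(b))
  samples <- union(colnames(a), colnames(b))
  out <- matrix(0, nrow = length(taxa), ncol = length(samples),
                dimnames = list(taxa, samples))
  # Add each table into its own block; cells present in both tables are summed.
  out[rownames(a), colnames(a)] <- out[rownames(a), colnames(a)] + a
  out[rownames(b), colnames(b)] <- out[rownames(b), colnames(b)] + b
  out
}

# Toy data: two runs with different samples and partly overlapping OTU IDs.
run1 <- matrix(c(5, 0, 2, 7), nrow = 2,
               dimnames = list(c("otu_A", "otu_B"), c("s1", "s2")))
run2 <- matrix(c(1, 3, 4, 0), nrow = 2,
               dimnames = list(c("otu_B", "otu_C"), c("s3", "s4")))

merged <- union_counts(run1, run2)
print(merged)
```

The combined matrix could then be wrapped with otu_table(merged, taxa_are_rows = TRUE) when building a new phyloseq object. This only makes sense when identical OTU IDs really refer to the same reference sequence in both runs.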

joey711 commented 8 years ago

Did you solve this problem? I will close unless or until an MRE is provided that demonstrates a bug. Best of luck!

kviljoen commented 8 years ago

Hi Paul,

I did, thank you. I ended up manually merging the OTU tables and tax tables into one of the existing phyloseq objects with bind_rows() and rbind(), and then used merge_phyloseq() to merge the sample data. I don't really use the tree, so I didn't have to deal with that.

Regards,

Katie.
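A rough sketch of the taxonomy half of such a manual merge, assuming closed-reference OTU IDs so that a shared ID should carry the same taxonomy in both runs. The matrices and object names are made up for illustration; this is not Katie's actual code.

```r
# Toy taxonomy tables (OTU IDs as row names, ranks as columns).
tax1 <- matrix(c("Bacteria", "Bacteria", "Escherichia", "Listeria"), nrow = 2,
               dimnames = list(c("otu_A", "otu_B"), c("Kingdom", "Genus")))
tax2 <- matrix(c("Bacteria", "Bacteria", "Listeria", "Bacteroides"), nrow = 2,
               dimnames = list(c("otu_B", "otu_C"), c("Kingdom", "Genus")))

# Sanity check: IDs present in both runs must agree on taxonomy.
shared <- intersect(rownames(tax1), rownames(tax2))
stopifnot(identical(tax1[shared, , drop = FALSE],
                    tax2[shared, , drop = FALSE]))

# Stack the tables and keep the first row for each OTU ID.
tax_all <- rbind(tax1, tax2)
tax_all <- tax_all[!duplicated(rownames(tax_all)), , drop = FALSE]
print(tax_all)
```

In phyloseq the result could be passed to tax_table() (which expects a character matrix) and combined with the concatenated OTU table via phyloseq().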

On Fri, Feb 19, 2016 at 7:24 PM, Paul J. McMurdie notifications@github.com wrote:

Closed #574 https://github.com/joey711/phyloseq/issues/574.


fconstancias commented 6 years ago

Hi,

I am very interested in that possibility. I have been using phyloseq to analyse OTU tables generated by metabarcoding and metaphlan2, and I am now using your awesome package to analyse results from kraken and centrifuge. I generated domain-specific databases that I am planning to use in parallel, depending on the community I would like to describe: e.g., one biom table of 10 samples for bacteria and another for archaea (for the same 10 samples). I would then like to concatenate these tables in phyloseq to be able to manipulate a single object. Thanks!

wasimbt commented 5 years ago

Hi ,

I have just the opposite problem to Katie. I have four phyloseq-class objects created using 4 different primer pairs (p1, p2, p3, p4) from the same samples and against the same reference. I was able to merge them successfully:

M1234 <- merge_phyloseq(p1, p2, p3 , p4)

And the combined object has the total of all taxa from each phyloseq object:

ntaxa(p1) # 5437
ntaxa(p2) # 7002
ntaxa(p3) # 6889
ntaxa(p4) # 8448

ntaxa(M1234) # 27816

Why does the merged object not simply have the OTU IDs of the object with the most OTUs? It looks like the OTU IDs for the same taxa (mostly) differ between the objects. Merging at the OTU ID level is necessary to avoid problems in downstream analyses; for example, my merged object now gives weird results in beta diversity analysis. How can I assign common OTU IDs in each object so that the merged object ends up with one OTU per taxon instead of 4 OTU IDs for the same taxon?

Thanks a lot! Wasim

mikemc commented 5 years ago

Wasim, what method did you use to create your OTUs? "OTU ID" is an ill-defined concept. OTUs defined from different primer pairs may not be compatible, and it may not be possible to combine them without making extra assumptions beyond those already made by whatever bioinformatics method you used to make your OTU tables.

wasimbt commented 5 years ago

Hi,

Thank you for your answer! I solved the problem by first merging all the phyloseq objects and then collapsing them at the species level using the tax_glom function. Now all the similar taxa with different OTU IDs are merged based on taxonomic identity and, as I expected, the total number of OTUs is no longer the sum across all the phyloseq objects.

Best regards, Wasim
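A base-R illustration of the idea behind collapsing by taxonomy, assuming a merged taxa-by-sample count matrix and a species label per OTU ID (both invented here). It uses rowsum() to sum the rows that share a label; it is a simplified stand-in for tax_glom(), not a reimplementation of it.

```r
# Toy merged table: four OTU IDs from four primer pairs, two samples.
counts <- matrix(c(10, 0,
                    4, 6,
                    0, 3,
                    8, 1), nrow = 4, byrow = TRUE,
                 dimnames = list(c("p1_otu1", "p2_otu9", "p3_otu4", "p4_otu2"),
                                 c("s1", "s2")))

# Species assignment per OTU ID (hypothetical labels).
species <- c(p1_otu1 = "E. coli", p2_otu9 = "E. coli",
             p3_otu4 = "L. monocytogenes", p4_otu2 = "E. coli")

# Sum the counts of all OTUs that share a species label.
by_species <- rowsum(counts, group = species[rownames(counts)])
print(by_species)
```

With phyloseq objects, the corresponding call would be along the lines of tax_glom(M1234, taxrank = "Species"), where the rank name must match one of rank_names(M1234). Note that tax_glom() compares the lineage up to the chosen rank and, by default, drops taxa that are unassigned at that rank, so its output can differ from this simple label-based sum.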

slq356 commented 4 years ago

@wasimbt, could you explain in a bit more detail how you merged the objects? I've tried merging with merge_phyloseq followed by tax_glom, with no difference in the output.

wasimbt commented 4 years ago

Hi, in my case I have four objects created with different primer pairs, and thus different ASVs in each case. When I merged these using merge_phyloseq, they were merged, but the ASV totals simply added up. I had to use tax_glom to collapse all ASVs based on their species-level assignment in order to compare the objects.

How were your datasets created? And what do you mean by "no difference in output"? Could you clarify?

Best

slq356 commented 4 years ago

Hi. So I have two or more datasets I want to merge. I've used the same lab protocol and sequence pipeline, so it's the same primers and formats and all, but OTU1 in dataset 1 might be E. coli, while OTU1 in dataset 2 is Listeria monocytogenes. So when I merge the two datasets, the tax tables end up all mixed up, because OTUs that are not the same get merged together. When I tax_glom the phyloseq object, the result stays the same, with the wrong merging of the two datasets.
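One way to see this kind of ID collision, sketched in base R with made-up taxonomy tables: compare the taxonomy of IDs shared between the datasets and, if they disagree, make the IDs dataset-specific before merging. The "run1_"/"run2_" prefixes are arbitrary choices for illustration.

```r
# Toy taxonomy tables where the same arbitrary ID means different organisms.
tax1 <- matrix(c("Escherichia", "Bacteroides"), ncol = 1,
               dimnames = list(c("OTU1", "OTU2"), "Genus"))
tax2 <- matrix(c("Listeria", "Bacteroides"), ncol = 1,
               dimnames = list(c("OTU1", "OTU2"), "Genus"))

# Shared IDs whose taxonomy differs in at least one rank (NAs are ignored).
shared <- intersect(rownames(tax1), rownames(tax2))
differs <- tax1[shared, , drop = FALSE] != tax2[shared, , drop = FALSE]
conflict <- shared[rowSums(differs, na.rm = TRUE) > 0]
print(conflict)

# Workaround: prefix the IDs so that no rows are matched by accident.
rownames(tax1) <- paste0("run1_", rownames(tax1))
rownames(tax2) <- paste0("run2_", rownames(tax2))
```

For phyloseq objects the same renaming could be done with taxa_names(ps) <- paste0("run1_", taxa_names(ps)) before merge_phyloseq(), after which taxa can be collapsed by taxonomy. Whether prefixing is appropriate depends on how the OTUs were defined.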

PotatoTongling commented 4 years ago

Hi slq356

Have you solved your problem? I have exactly the same problem as you and still have no clue how to solve it.

mikemc commented 4 years ago

@PotatoTongling are you using OTUs or amplicon sequence variants? If you are using OTUs, are they closed-reference or open-reference OTUs? How you should merge your datasets depends on this information.

matomoniwano commented 4 years ago

@slq356 I have the same problem and I am using ASVs for this. Is there any way to merge ASV counts from two tables by aligning taxonomy of each dataset?

mikemc commented 4 years ago

@matomoniwano If you are using ASVs, then you can make your taxa_names the ASV sequences, do a merge_phyloseq, and then use tax_glom() to e.g. genus level. Make sure that you assigned taxonomy to both sets of ASVs against the same taxonomy database, though. Would that do what you want?
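A small base-R sketch of why sequence-based names help, assuming each dataset has a lookup from its arbitrary ASV ID to the ASV sequence. All sequences, IDs and counts below are invented.

```r
# ID -> sequence lookups; note "ASV1" is a different sequence in each dataset.
seqs1 <- c(ASV1 = "ACGTAC", ASV2 = "TTGACA")
seqs2 <- c(ASV1 = "TTGACA", ASV2 = "GGCATC")

counts1 <- matrix(c(3, 9), ncol = 1, dimnames = list(c("ASV1", "ASV2"), "s1"))
counts2 <- matrix(c(5, 2), ncol = 1, dimnames = list(c("ASV1", "ASV2"), "s2"))

# Rename rows to the sequences, so identical names mean identical ASVs.
rownames(counts1) <- unname(seqs1[rownames(counts1)])
rownames(counts2) <- unname(seqs2[rownames(counts2)])

# Combine over the union of sequences; absent ASVs get a count of 0.
all_seqs <- union(rownames(counts1), rownames(counts2))
merged <- matrix(0, nrow = length(all_seqs), ncol = 2,
                 dimnames = list(all_seqs, c("s1", "s2")))
merged[rownames(counts1), "s1"] <- counts1[, "s1"]
merged[rownames(counts2), "s2"] <- counts2[, "s2"]
print(merged)
```

With phyloseq objects that store the sequences in the refseq slot (an assumption; not every object has one), the renaming step could be taxa_names(ps) <- as.character(refseq(ps)), followed by merge_phyloseq(ps1, ps2) and tax_glom(merged, taxrank = "Genus").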

matomoniwano commented 4 years ago

@mikemc Thank you, that is exactly what I want.