Open werdnus opened 6 years ago
A very partial answer - tip_glom labels the glommed taxa with the label of the input which has the largest count. (I have run into trouble mixing up gloms of taxa with raw counts and gloms with normalized counts, making chaos of my taxa-names.) ~I don't know if you can use this as a work-around, for example add a large number to the OTU count of samples with trait=='T' , perform tip_glom, then subtract it again?~
After a little further reading: tip_glom() uses merge_taxa(), which will name merged taxa by the largest counts if an OTU table is supplied, or by the first taxon in the list to be merged if no counts are available. As your example used the tree only, your tips are named by the 1st. tip_glom() has the option of passing arguments to the clustering function which by default is agnes(), which has a parameter trace.lev. If you do tip_glom(as.phylo(GP.tree), h = 0.1, trace.lev = 1) does it tell anything interesting? (I should try for myself)
You could try using https://github.com/grunwaldlab/metacoder
Okay, I've been banging my head against this wall for too long, and it can't just be me having this problem. I'm not even sure this is the appropriate place to ask — if it's not, can someone point me to the correct forum?
I'd like to be able to use PhyloSeq's
tip_glom()
or something similar to group closely related organisms in metabarcoding studies (16S amplicons), and have annotations with particular taxa (or OTUs or RSVs... or Bioinformatically Inferred Taxon Equivalents (BITEs)) propagate to the new tip.As an example, I've got a sample of a microbial community from before some treatment, and one from after. I'd like to annotate those taxa that persist through the treatment. Given the sampling issues etc. with NGS metabarcoding studies, I think I'd like to look at the data at a slightly higher level of relatedness: I could just use SILVA and than apply
tax_glom()
at the genus level, but it turns out more than half of my organisms get<NA>
at the genus level. Makes more sense to tackle it phylogenetically.Let's make an example:
So, you can see that there's some phylogenetic signal going on here: all of the
TRUE
taxa are basically in a single clade (I did that on purpose). But, again, I'd like to simplify things by grouping related taxa — sort of like making OTUs (particularly because I'm working with DADA2 RSVs, and want to group them at about the species level).PhyloSeq has a great tool for this,
tip_glom()
, but I can't use the treeiotreedata
object to make thephyloseq
object; I have to convert back to anape::phylo
object to get back into PhyloSeq. And that means I lose my annotations, sadly. But, let's see what happens:So, we end up compressing things, and it looks about like I'd like. But, the choice of which tip-labels end up as the final label for groups that are agglomerated seems a bit arbitrary — it might be the most basal taxon?
Mapping the agglomerated clades onto the initial tree, we can see that which taxa are glommed together is sometimes clear, and sometimes not. But, you can see that most of the pattern is lost!
So: what I want to do is really simple, and I can't believe that there's not a reasonably easy way to do it. How can I propagate those annotations to the final tip labels?