grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation
Other
135 stars 28 forks source link

Metacoder continuous variable function for longitudinal dataset #286

Open MADscientist314 opened 4 years ago

MADscientist314 commented 4 years ago

Hi Zach, I have a shotgun metagenomic sequencing dataset consisting of 17 patients that have daily vaginal microbiome samples for a period of 21 days post surgical obstetric fistula repair, and I want to visualize the change in relative abundance across time using metacoder. The compare groups functionality wont work because the amount of time points are too many to adequately display a correlation plot, and I don't think the grouping the samples by week indicates an accurate depiction of the changes, which led me to this idea.

I want to use the rate of change in the relative abundance of the dataset as a node color for a heat tree. For example, if plot the days post operation on the x axis and the relative abundance of the taxa on the y, I can get a linear trendline (y=mx+b) for each species (the abundances are the clade markers hits coming out of metaphlan2 imported as a phyloseq object).

I would like to use the slope of the trendline (x) as the color for the node, which should depict the change in relative abundance overall for all the samples and all the time points. Is there anything in the metacoder or taxa packages that will allow me to accomplish this?

Here is the code I have written so far, and here is the starting phyloseq object that I am using for the analysis. Any guidance that you might have for me would be extremely helpful.

Thanks, Michael

MADscientist314 commented 4 years ago

Update! I converted the phyloseq to a taxmap object, then plotted each tax rank in order to get the slope of a line, for which I used as a heat map color in the heat tree to show the rate of change. This worked beautifully (after some troubleshooting), but the figure is quite busy, so I decided to filter the taxa that only have the greatest rate of change (via the absolute value of the slope) for a downstream figure. Here is where I am stuck. I have a list called filt that contains the taxa of interest, but when I go to filter the taxa in the taxmap object, the results are not the taxa of interest. For instance: here is the original taxmap object

> obj
<Taxmap>
  346 taxa: cs. Acidaminococcaceae, gg. Acinetobacter ... br. Verrucomicrobiales
  346 edges: bg->cs, dd->gg, gg->nb, ab->ac, ac->ak ... bg->ct, ab->aj, br->dh, aj->ax, ax->br

here is the filtering list with the tax_ranks as factors

> filt
                tax_rank      slope abb
1                Bacilli  0.6605342  am
2     Bifidobacteriaceae  0.2054027  bv
3      Bifidobacteriales  0.2054027  az
4     Enterobacteriaceae -0.7340206  db
5      Enterobacteriales -0.7340206  bm
6            Escherichia -0.6560225  fz
7       Escherichia_coli -0.6181949  mo
8             Firmicutes  0.4778686  ae
9    Gammaproteobacteria -0.8955581  au
10           Gardnerella  0.2890978  dq
11 Gardnerella_vaginalis  0.2890978  hb
12      Lactobacillaceae  0.6117220  ch
13       Lactobacillales  0.6520766  bd
14         Lactobacillus  0.6117220  ei
15   Lactobacillus_iners  0.4247242  jj
16            Prevotella  0.3280083  eb
17 Prevotella_intermedia  0.2083886  in
18        Prevotellaceae  0.3389241  ca
19        Proteobacteria -0.8565714  ag

here is the command to filter the taxa obj_filt<-filter_taxa(obj, taxon_names=filt$tax_rank,reassign_taxa = F,supertaxa = T, subtaxa = F, invert = F) and here is the output obj_filt

49 taxa: ac. Actinobacteria, ak. Actinobacteria ... dn. Propionimicrobium, ag. Proteobacteria 49 edges: ab->ac, ac->ak, ak->ay, am->bc, bc->cc, ae->am ... eb->ir, bb->ca, ay->bu, bu->dn, ab->ag and here is the names of the filtered object that dont match the filt list ``` taxon_names(obj_filt) ac ak "Actinobacteria" "Actinobacteria" ay bc "Actinomycetales" "Bacillales" cc am "Bacillales_noname" "Bacilli" ab bb "Bacteria" "Bacteroidales" ad al "Bacteroidetes" "Bacteroidia" dv hs "Barnesiella" "Barnesiella_intestinihominis" ar an "Betaproteobacteria" "Clostridia" be cl "Clostridiales" "Clostridiales_Family_XI_Incertae_Sedis" fy mn "Enterobacter" "Enterobacter_cloacae" db bm "Enterobacteriaceae" "Enterobacteriales" bf ao "Erysipelotrichales" "Erysipelotrichia" eo ae "Finegoldia" "Firmicutes" af cu "Fusobacteria" "Fusobacteriaceae" bh aq "Fusobacteriales" "Fusobacteriia" fq md "Fusobacterium" "Fusobacterium_nucleatum" me au "Fusobacterium_periodonticum" "Gammaproteobacteria" ga ms "Klebsiella" "Klebsiella_unclassified" cn ch "Lachnospiraceae" "Lactobacillaceae" bd ei "Lactobacillales" "Lactobacillus" jh bz "Lactobacillus_crispatus" "Porphyromonadaceae" dy ia "Porphyromonas" "Porphyromonas_somerae" eb il "Prevotella" "Prevotella_copri" ir ca "Prevotella_stercorea" "Prevotellaceae" bu dn "Propionibacteriaceae" "Propionimicrobium" ag "Proteobacteria" ``` Anyways sorry for the book, I'm just really at a loss here as to why the filtering step isnt working properly
zachary-foster commented 4 years ago

Hello, sorry I did not respond to your first post. I am busy preparing to defend my thesis in the next few weeks and I overlooked this issue. Does this do what you want?

filter_taxa(obj, taxon_names %in% filt$tax_rank, supertaxa = TRUE)

Great idea for a plot by the way. A while ago I thought plotting variables in models fit to the data for each taxon would be cool, but I have never done it.