grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation

Adding Relative Abundance as node size and color #327

Closed DarynaP closed 2 years ago

DarynaP commented 2 years ago

Hey,

I am really new to this program.

In my table I already have the relative abundances:

<Taxmap>
  53 taxa: ab. Bacteria, ac. Proteobacteria, ad.  Proteobacteria ... ca.  Bradyrhizobium, cb.  Microbacterium
  53 edges: NA->ab, ab->ac, ab->ad, ab->ae, ab->af, ab->ag, ab->ah ... bk->bw, bl->bx, bm->by, bn->bz, bp->ca, bw->cb
  1 data sets:
    tax_data:
      # A tibble: 18 x 4
        taxon_id Bacteria                                            Linage                                                                        Abundance
        <chr>    <chr>                      <chr>                                                                  <dbl>
      1 bq       Variovorax sp. txid2126319 Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Co~      3.61
      2 br       Stagnimonas aquatica       Bacteria; Proteobacteria; Gammaproteobacteria; Nevskiales; Sinoba~      3.69
      3 bs       Sphingomonas sp.           Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; ~      1.72
      # ... with 15 more rows
  0 functions:

I would like to use those values for node_size and node_color. How do I do that? Is it possible?

Thank you

zachary-foster commented 2 years ago

Sorry for the slow reply; I was on vacation. I am not sure I understand how your data is structured from that output. What is the name of the column with the abundance data you want to plot? (You can list the column names with colnames(x$data$tax_data).)

DarynaP commented 2 years ago

Thank you for your reply,

So, my input is not a FASTA file but an Excel table with the data already processed. I have one column with the lineage and another with the relative abundance of the respective lineage. Then I ran tree2 = parse_tax_data(Heatree, class_cols = "Linage", class_sep = ";") on this table. My question is whether I can use the relative abundance values that I already have in my table. The column name is 'Abundance'.

zachary-foster commented 2 years ago

Yes, you can use those values, but first you have to convert them to per-taxon values. The heat_tree function plots data associated with taxa, but your abundance data is associated with observations (isolates, OTUs, or ASVs; I am not sure what the rows of your data represent). This can be done in a few ways, but one way is to sum the abundances of all rows that correspond to a taxon using calc_taxon_abund. Here is an example using the data included with the package:

library(metacoder)
#> This is metacoder verison 0.3.5 (stable)

x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

x$data$taxon_counts <- calc_taxon_abund(x, data = "tax_data")
#> No `cols` specified, so using all numeric columns:
#>    700035949, 700097855, 700100489 ... 700102367, 700101358
#> Summing per-taxon counts from 50 columns for 174 taxa
heat_tree(x, node_label = taxon_names, node_size = x$data$taxon_counts$`700035949`, node_color = x$data$taxon_counts$`700035949`)

Created on 2021-12-03 by the reprex package (v2.0.1)

I had to use "x$data$taxon_counts$`700035949`" because the sample names in this data are numbers (not ideal, really). For a non-numeric sample name that is unique in the object, I could have left off the "x$data$taxon_counts$" prefix.
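
To make the per-taxon summing concrete, here is a minimal base R sketch of the idea (not how calc_taxon_abund is actually implemented). The table `obs`, its values, and the use of trimws() are assumptions made up for illustration; only the column names "Linage" and "Abundance" come from the table above. Each row contributes its abundance to every taxon on its lineage path, so a parent taxon ends up with the sum of everything beneath it.

```r
# Toy per-observation table (hypothetical values, for illustration only)
obs <- data.frame(
  Linage = c("Bacteria;Proteobacteria;Betaproteobacteria",
             "Bacteria;Proteobacteria;Gammaproteobacteria",
             "Bacteria;Actinobacteria"),
  Abundance = c(3.0, 2.0, 1.5),
  stringsAsFactors = FALSE
)

# For each row, list every taxon on its lineage path;
# the row's abundance counts toward all of them
paths <- lapply(strsplit(obs$Linage, ";", fixed = TRUE), function(ranks) {
  ranks <- trimws(ranks)  # drop stray spaces around the separator
  vapply(seq_along(ranks),
         function(i) paste(ranks[seq_len(i)], collapse = ";"),
         character(1))
})

# Repeat each row's abundance once per taxon on its path, then sum by taxon
per_taxon <- tapply(rep(obs$Abundance, lengths(paths)), unlist(paths), sum)
print(per_taxon)
```

In this sketch taxa are identified by their full path, so two taxa with the same name under different parents stay separate.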

Your data might work with code like this:

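library(metacoder)
# x is the taxmap returned by parse_tax_data() (tree2 in your code)
# Sum the numeric column(s) of tax_data per taxon and name the result 'tax_abund'
x$data$taxon_abund <- calc_taxon_abund(x, data = "tax_data", out_names = 'tax_abund')
heat_tree(x, node_label = taxon_names, node_size = tax_abund, node_color = tax_abund)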

Make sense?

DarynaP commented 2 years ago

Thank you so much. It makes perfect sense and worked very well with my data.