ghost commented 4 years ago

Hello,

Thanks for this great package. It is extremely useful for my dataset.

I am running into a few issues that I’m hoping you can help with. The main issue is that I am unable to generate a heat_tree with relative abundance data. With the taxmap object I ran these commands:

obj$data$tax_abund <- calc_taxon_abund(obj, data = "tax_data")
obj$data$otu_props <- calc_obs_props(obj, "tax_abund", other_cols = TRUE)

Looking at obj$data$otu_props I have relative abundance for each OTU within each sample. However, when I try to generate a heat_tree I am unable to. What exactly should I be putting in for node_size and node_color? I’ve tried otu_props, obj$data$otu_props, and several other things but none work. This is the most common error I receive:

Error in check_element_length(c("node_size", "edge_size", "node_label_size", : Length of argument'node_size' must be a factor of the length of 'taxon_id'

set.seed(100)
pALL <- obj %>%
  taxa::filter_taxa(grepl(pattern = "^[a-zA-Z]+$", taxon_names)) %>%
  taxa::filter_taxa(taxon_ranks == "c", supertaxa = TRUE) %>%
  heat_tree(node_label = taxon_names,
            node_size = ???,
            node_color = ???,
            node_color_axis_label = "OTU count",
            layout = "davidson-harel", initial_layout = "reingold-tilford")

Any help would be greatly appreciated. Thank you!

zachary-foster commented 4 years ago

Thanks!

You should use calc_obs_props before calc_taxon_abund, but I think the problem you are having is caused by filter_taxa needing reassign_obs = FALSE. Otherwise, taxa are filtered out, but the rows of tax_abund that belonged to them are reassigned to the remaining supertaxa (e.g. the row of a removed genus is assigned to a remaining class), so there are more rows than taxa.
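The row-count mismatch described above can be sketched in base R. This is only a toy model of the idea, not metacoder's implementation: the table, the `supertaxon` lookup, and the `kept` vector are all made up for illustration, and the reassignment step assumes the parent of every removed taxon is itself kept.

```r
# Toy illustration in base R (no metacoder calls); every name here is hypothetical.
# One abundance row per taxon, similar in shape to a per-taxon table.
tax_abund <- data.frame(
  taxon_id = c("aab", "aac", "aad", "aae"),
  Nose     = c(1.0, 0.6, 0.4, 0.4),
  stringsAsFactors = FALSE
)

# Parent of each taxon (NA marks the root).
supertaxon <- c(aab = NA, aac = "aab", aad = "aab", aae = "aad")

# Suppose filtering keeps three taxa and removes "aae".
kept <- c("aab", "aac", "aad")

# Option A: rows of removed taxa are dropped
# (the behaviour analogous to reassign_obs = FALSE).
dropped <- tax_abund[tax_abund$taxon_id %in% kept, ]

# Option B: rows of removed taxa are relabelled with their parent,
# so the table keeps all of its rows.
reassigned <- tax_abund
removed <- !(reassigned$taxon_id %in% kept)
reassigned$taxon_id[removed] <- supertaxon[reassigned$taxon_id[removed]]

nrow(dropped)     # same as length(kept)
nrow(reassigned)  # one more row than there are taxa
```

With option B, a column taken from the table no longer has one value per taxon, which is the situation the length check on node_size complains about.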

Also, the abundance of what? Here is an example of the relative abundance of a group of samples, but you can also do individual samples.

library(metacoder)
#> Loading required package: taxa
#> This is metacoder verison 0.3.3 (stable)

x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

x$data$otu_props <- calc_obs_props(x, "tax_data", cols = hmp_samples$sample_id, groups = hmp_samples$body_site)
#> Calculating proportions from counts for 50 columns in 5 groups for 1000 observations.
x$data$tax_abund <- calc_taxon_abund(x, data = "otu_props")
#> No `cols` specified, so using all numeric columns:
#>    Nose, Saliva, Skin, Stool, Throat
#> Summing per-taxon counts from 5 columns for 174 taxa

to_plot <- x %>%
  taxa::filter_taxa(grepl(pattern = "^[a-zA-Z]+$", taxon_names), reassign_obs = FALSE) %>%
  taxa::filter_taxa(taxon_ranks == "c", supertaxa = TRUE, reassign_obs = FALSE) 

heat_tree(to_plot, 
          node_label = taxon_names,
          node_size = to_plot$data$tax_abund[['Nose']],
          node_color = to_plot$data$tax_abund[['Nose']],
          node_color_axis_label = "OTU count",
          layout = "davidson-harel", initial_layout = "reingold-tilford")

Created on 2020-03-02 by the reprex package (v0.3.0)

ghost commented 4 years ago

Thanks for the quick response. First, I'm looking to generate a heat tree with the relative abundance of each OTU/taxonomic level for my whole dataset. Second, within my dataset I have three locations and I would then like to have a relative abundance tree for each location. When I use the code you provided I receive this error:

Error in check_element_length(c("node_size", "edge_size", "node_label_size", : Length of argument'node_size' must be a factor of the length of 'taxon_id'

The issue may be that when I look at the data for tax_abund it doesn't make sense. Multiple values are 1 and values within a single sample sum to greater than 1. For example:

taxon_id  Sample1  Sample2  Sample3
aab       1        1        1
aae       0.363    0.498    0.433
aaf       0.332    0.178    0.282
aag       0.164    0.0535   0.0465
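For context, a per-taxon table is expected to behave this way, because each taxon's value includes every OTU nested under it. The base R sketch below uses made-up OTUs, lineages, and proportions (none of them come from this dataset) to show a root taxon reaching 1 and a whole column summing to more than 1.

```r
# Toy illustration in base R; OTU names, lineages, and proportions are invented.
# Per-OTU proportions within one sample (they sum to 1).
otu_props <- c(otu1 = 0.5, otu2 = 0.3, otu3 = 0.2)

# Taxa each OTU belongs to, from the root down to its most specific rank.
lineage <- list(
  otu1 = c("Bacteria", "Firmicutes"),
  otu2 = c("Bacteria", "Firmicutes"),
  otu3 = c("Bacteria", "Proteobacteria")
)

taxa <- unique(unlist(lineage))

# Per-taxon abundance: sum the OTUs whose lineage contains that taxon.
tax_abund <- sapply(taxa, function(tx) {
  in_taxon <- sapply(lineage, function(l) tx %in% l)
  sum(otu_props[in_taxon])
})

tax_abund       # the root contains every OTU, so its value is the sample total
sum(tax_abund)  # larger than 1, because every rank is counted once
```

Within a single rank (here Firmicutes plus Proteobacteria) the values still add up to 1; it is only the sum across ranks that exceeds it.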

Here's the code I used:

y = parse_tax_data(otu_data, class_cols = "Taxonomy", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

y$data$otu_props <- calc_obs_props(y, "tax_data", cols = sample_data$SampleID, groups = sample_data$Location)

> Calculating proportions from counts for 50 columns in 5 groups for 1000 observations.

y$data$tax_abund <- calc_taxon_abund(y, data = "otu_props")

> No `cols` specified, so using all numeric columns:

> Nose, Saliva, Skin, Stool, Throat

> Summing per-taxon counts from 5 columns for 174 taxa

to_plot <- y %>%
  taxa::filter_taxa(grepl(pattern = "^[a-zA-Z]+$", taxon_names), reassign_obs = FALSE) %>%
  taxa::filter_taxa(taxon_ranks == "c", supertaxa = TRUE, reassign_obs = FALSE)

heat_tree(to_plot,
          node_label = taxon_names,
          node_size = to_plot$data$otu_props[['Loc_A']],
          node_color = to_plot$data$otu_props[['Loc_A']],
          node_color_axis_label = "Relative Abundance",
          layout = "davidson-harel", initial_layout = "reingold-tilford")

Also, why should I use calc_obs_props before calc_taxon_abund? Just curious because I have been able to subset my data by location and receive these outputs:

obj$data$tax_abund
  taxon_id  Loc_A    Loc_B    Loc_C
1 aab       2826493  1598547  652391
2 aac       9070     709      263
3 aad       8554     5        0

obj$data$otu_props
A tibble: 2,589 x 4
  taxon_id  Loc_A     Loc_B        Loc_C
1 aab       0.170     0.166        0.177
2 aac       0.000545  0.0000738    0.0000713
3 aad       0.000514  0.000000521  0

When I do this I am able to generate heat_trees by each location that show OTU count for the nodes and branch sizes but I am unable to generate trees based on relative abundance.

grunwaldlab / metacoder

relative abundance and heat_tree #281

