grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation

Relative abundance in heat tree #357

Open SimonMorvan opened 1 year ago

SimonMorvan commented 1 year ago

Hello Metacoder devs,

It must be pretty straightforward, but I'm having a hard time plotting the relative abundance of the taxa in the whole dataset, instead of the OTU count, as node_size in the heat tree. I guess I have to change n_obs to something else, but I couldn't find the right parameter to replace it with.

Have a good day,

Simon

library(phyloseq)
library(metacoder)

data(GlobalPatterns)
# Subsetting the dataset to keep only 2 sample types
GP_sub <- subset_samples(GlobalPatterns, (SampleType=='Ocean' | SampleType=='Soil'))
GP_sub <- prune_taxa(taxa_sums(GP_sub)>0, GP_sub)

# Agglomerating the dataset to Class 
GP_sub_class_glom <- tax_glom(GP_sub, taxrank = "Class", NArm = FALSE)

meta_obj <- parse_phyloseq(GP_sub_class_glom) 

meta_obj$data$otu_relab <- calc_obs_props(meta_obj, "otu_table")  

meta_obj$data$tax_relab <- calc_taxon_abund(meta_obj, "otu_relab") 

meta_obj$data$diff_table <- compare_groups(meta_obj, data = "tax_relab",
                                           cols = meta_obj$data$sample_data$sample_id,
                                           groups = meta_obj$data$sample_data$SampleType)

heat_tree(meta_obj,
          node_size = n_obs , 
          node_label = taxon_names,
          node_color = log2_median_ratio, # A column from `obj$data$diff_table`
          node_color_range = diverging_palette(), # The built-in palette for diverging data                 
          node_color_axis_label = "Log2 ratio median proportions",
          repel_labels = TRUE,
          layout = "davidson-harel", # The primary layout algorithm
          initial_layout = "reingold-tilford") # The layout algorithm that initializes node locations

zachary-foster commented 1 year ago

Hello,

You can put the name of any column in any table in your input object in place of n_obs. This is why log2_median_ratio works. To use total taxon abundance, you need a column in a per-taxon table that contains it. To make one, you can use calc_taxon_abund with a grouping variable that assigns all samples to a single group, so that you get back only one column with a total. That column name can then be used like so:

library(metacoder)
#> This is metacoder verison 0.3.5 (stable)

# Get example data
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Getting a total for all columns
x$data$tax_abund_total <- calc_taxon_abund(x, "tax_data", cols = hmp_samples$sample_id,
                                           groups = rep("total_count", nrow(hmp_samples)))
#> Summing per-taxon counts from 50 columns in 1 groups for 174 taxa

# Plot total count
heat_tree(x, node_label = taxon_names, node_size = total_count, node_color = total_count)

Created on 2023-10-25 with reprex v2.0.2

SimonMorvan commented 1 year ago

Hello Zachary,

Thanks for your quick answer! I've tried to adapt the code to my parsed phyloseq object, but I still can't get it to work. When I run the calc_taxon_abund() function, the result is a tibble with 183 rows, but the compare_groups() function then returns a diff_table with 147 rows. This results in an error message saying that 36 of 183 taxa have NAs for the "node_color" option when I try to plot the heat tree.

Simon

library(phyloseq)
library(metacoder)

data(GlobalPatterns)
# Subsetting the dataset to keep only 2 sample types
GP_sub <- subset_samples(GlobalPatterns, (SampleType=='Ocean' | SampleType=='Soil'))
GP_sub <- prune_taxa(taxa_sums(GP_sub)>0, GP_sub)

# Agglomerating the dataset to Class 
GP_sub_class_glom <- tax_glom(GP_sub, taxrank = "Class", NArm = FALSE)

meta_obj <- parse_phyloseq(GP_sub_class_glom) 

meta_obj$data$tax_ab <- calc_taxon_abund(meta_obj, "otu_table", cols = meta_obj$data$sample_data$sample_id,
                         groups = rep("total_count", nrow(meta_obj$data$sample_data)))

meta_obj$data$diff_table <- compare_groups(meta_obj, data = "otu_table",
                                           cols = meta_obj$data$sample_data$sample_id,
                                           groups = meta_obj$data$sample_data$SampleType)

heat_tree(meta_obj,
          node_size = total_count , 
          node_label = taxon_names,
          node_color = log2_median_ratio, # A column from `obj$data$diff_table`
          node_color_range = diverging_palette(), # The built-in palette for diverging data                 
          node_color_axis_label = "Log2 ratio median proportions",
          repel_labels = TRUE,
          layout = "davidson-harel", # The primary layout algorithm
          initial_layout = "reingold-tilford") # The layout algorithm that initializes node locations

zachary-foster commented 1 year ago

Hard to say for sure without being able to run the code with your data myself, but it looks like you are using the OTU table in compare_groups instead of the taxon abundance table. This would compare OTU abundance among the groups, not taxon abundance, which causes attempts to plot data for each taxon to fail. Try this instead:

meta_obj$data$diff_table <- compare_groups(meta_obj, data = "tax_ab",
                                           cols = meta_obj$data$sample_data$sample_id,
                                           groups = meta_obj$data$sample_data$SampleType)

SimonMorvan commented 1 year ago

Hi Zachary, sorry for the late answer. The piece of code you sent did not work:

meta_obj$data$tax_ab <- calc_taxon_abund(meta_obj, "otu_table", 
                                         cols = meta_obj$data$sample_data$sample_id,
                                         groups = rep("total_count", nrow(meta_obj$data$sample_data)))
#Summing per-taxon counts from 6 columns in 1 groups for 183 taxa

meta_obj$data$diff_table <- compare_groups(meta_obj, data = "tax_ab",
                                           cols = meta_obj$data$sample_data$sample_id,
                                           groups = meta_obj$data$sample_data$SampleType)
#Error : The following 6 column(s) are not in "tax_ab":
#CL3, CC1, SV1, NP2, NP3, NP5

You should be able to run the code I provided as GlobalPatterns is a dataset from the phyloseq package.
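
A possible reading of the error above: tax_ab was built with a single "total_count" group, so it no longer contains the per-sample columns (CL3, CC1, ...) that compare_groups() is asked to use. The sketch below is one way this might be resolved, and has not been confirmed by the package authors. It keeps two separate per-taxon tables: one with a column per sample for compare_groups(), and one with the single total column for node_size. The table names tax_total and the column rel_abund are made up for this illustration, and dividing by the total read count is only an assumption about what "relative abundance in the whole dataset" should mean here.

```r
library(phyloseq)
library(metacoder)

data(GlobalPatterns)
# Same subsetting and agglomeration as in the posts above
GP_sub <- subset_samples(GlobalPatterns, (SampleType == 'Ocean' | SampleType == 'Soil'))
GP_sub <- prune_taxa(taxa_sums(GP_sub) > 0, GP_sub)
GP_sub_class_glom <- tax_glom(GP_sub, taxrank = "Class", NArm = FALSE)

meta_obj <- parse_phyloseq(GP_sub_class_glom)
sample_ids <- meta_obj$data$sample_data$sample_id

# Table 1: per-taxon proportions with one column per sample.
# This is the table compare_groups() needs, since it looks up the sample columns.
meta_obj$data$otu_relab <- calc_obs_props(meta_obj, "otu_table", cols = sample_ids)
meta_obj$data$tax_relab <- calc_taxon_abund(meta_obj, "otu_relab", cols = sample_ids)

# Table 2: per-taxon total over all samples (a single "total_count" column).
# "tax_total" is a hypothetical table name chosen for this sketch.
meta_obj$data$tax_total <- calc_taxon_abund(meta_obj, "otu_table", cols = sample_ids,
                                            groups = rep("total_count", length(sample_ids)))

# Assumption: relative abundance = taxon total / total reads in the dataset.
total_reads <- sum(as.matrix(meta_obj$data$otu_table[, sample_ids]))
meta_obj$data$tax_total$rel_abund <- meta_obj$data$tax_total$total_count / total_reads

# Compare the two sample types using the per-sample taxon table, not tax_total
meta_obj$data$diff_table <- compare_groups(meta_obj, data = "tax_relab",
                                           cols = sample_ids,
                                           groups = meta_obj$data$sample_data$SampleType)

heat_tree(meta_obj,
          node_size = rel_abund,           # hypothetical column from tax_total
          node_label = taxon_names,
          node_color = log2_median_ratio,  # column from diff_table
          node_color_range = diverging_palette(),
          node_color_axis_label = "Log2 ratio median proportions",
          layout = "davidson-harel",
          initial_layout = "reingold-tilford")
```

Because both tax_total and diff_table are per-taxon tables, they should have one row per taxon, which is what heat_tree needs to avoid NAs in node_size and node_color.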