Closed dougwyu closed 6 years ago
Hello @dougwyu, sure!
However, the interpretation is dependent on the code you used to make the figure. Did you follow the example on the GitHub readme?
Did you set any differences to 0 that were not significant? If so, this could explain why there are sometimes genera that are not different although their orders are since the variation of read depths could be different between the two ranks.
I'm not sure how to interpret this. Is it because all the cropland samples consistently have more or fewer of specific OTUs compared to all the forest samples, whereas the forest samples within a forest type differ in which (say, Araneae) OTUs they have, but the different forest types differ in the overall number of Araneae OTUs?
If you followed the example, then the colors represent differences in read depth, not differences between the number of OTUs. You could do that if you want though.
I cant say much more without more specifics about how you did the analysis.
By the way, the development version will force the matrix to be square, so it will look nicer.
Hi Zachary,
sorry for the delay. My student ran this code, using the code in your tutorial. I wanted to look at the code again and see if i could figure out myself, but i haven't been able to.
we convert our OTU tables to presence/absence (1/0) data before all community analyses, so if the code below is for read count, it will automatically be for OTU count.
doug
library(metacoder) library(tibble)
mbc_otus <- read.table("data/for_metacoder/2014mbc_otus_655_metacoder.txt", header = T, sep = "\t") mbc_samples <- read.table("data/for_metacoder/2014mbc_samples_655_metacoder.txt", header = T, sep = "\t")
str(mbc_otus) str(mbc_samples)
mbc_otus$OTU_id <- as.character(mbc_otus$OTU_id) mbc_otus$lineage <- as.character(mbc_otus$lineage) mbc_samples$Site <- as.character(mbc_samples$Site) mbc_samples$Habitat <- as.character(mbc_samples$Habitat)
mbc_otus <- as_tibble(mbc_otus) mbc_samples <- as_tibble(mbc_samples) print(mbc_otus) print(mbc_samples)
obj <- parse_tax_data(mbc_otus, class_cols = "lineage", class_sep = ";", class_key = c(tax_rank = "info", tax_name = "taxon_name"), class_regex = "^(.+)__(.+)$")
print(obj) # or click on the object in the Environment pane
obj$data$tax_data <- calc_obs_props(obj, "tax_data")
print(obj)
obj$data$tax_abund <- calc_taxon_abund(obj, "tax_data", cols = mbc_samples$Site) print(obj)
obj$data$tax_occ <- calc_n_samples(obj, "tax_abund", groups = mbc_samples$Habitat) print(obj)
heat_tree(obj, node_label = obj$taxon_names(), node_size = obj$n_obs(), node_color = obj$data$tax_occ$NF, node_size_axis_label = "OTU count", node_color_axis_label = "Samples with reads")
obj$data$diff_table <- compare_groups(obj, dataset = "tax_abund", cols = mbc_samples$Site, groups = mbc_samples$Habitat) print(obj$data$diff_table)
heat_tree(obj, node_label = taxon_names, node_size = n_obs, node_color = log2_median_ratio, node_color_interval = c(-2, 2), node_color_range = c("cyan", "gray", "tan"), node_size_axis_label = "OTU count", node_color_axis_label = "Log 2 ratio of median proportions")
obj$data$diff_table$wilcox_p_value <- p.adjust(obj$data$diff_table$wilcox_p_value, method = "fdr") hist(obj$data$diff_table$wilcox_p_value)
obj$data$diff_table <- compare_groups(obj, dataset = "tax_abund", cols = mbc_samples$Site, groups = mbc_samples$Habitat) print(obj$data$diff_table) heat_tree_matrix(obj, dataset = "diff_table", node_size = n_obs, node_label = taxon_names, node_color = log2_median_ratio, node_color_range = diverging_palette(), node_color_trans = "linear", node_color_interval = c(-3, 3), edge_color_interval = c(-3, 3), node_size_axis_label = "Number of OTUs", node_color_axis_label = "Log2 ratio median proportions")
so if the code below is for read count, it will automatically be for OTU count.
Yea, thats right. That means the color is the log 2 ratio of median proportions of OTUs rather than reads.
I think it has something to do with you converting counts to presence/absence and calculating proportion of OTUs for each taxon. Since everything is 0 or 1, there are only a few values possible for the proportion of OTUs for each taxon, so the median does not always capture the difference. For example:
> x1 = c(.1, .1, .1, .1, 0, 0, 0)
> x2 = c(.5, .5, .5, .1, 0, 0, 0)
> median(x1) == median(x2)
[1] TRUE
> mean(x1)
[1] 0.05714286
> mean(x2)
[1] 0.2285714
Perhaps you can use mean instead of median? You can see how to change what compare_groups
returns in the docs ?compare_groups
. Look at the doc for the func
option. Let me know if you have questions.
Make sense?
closing due to inactivity. Let me know if you want to discuss this more
Hi,
could you please help me interpret the attached image? The sample codes are different habitats: CL = cropland (i.e. farms), and the other codes are different forest types. we have metabarcoded insect samples in these habitats and assigned taxonomies using RDP Classifier (0.8 cutoff).
Not surprisingly, cropland is compositionally very different from the forests, and in the differential heat trees comparing CL to the other habitats, the compositional differences are traced to down to genus level (essentially individual OTUs but they don't have species dets).
in contrast, compositional differences between the forest types are mostly only traced down to order level. E.g. NF and BB differ in the number of Araneae and Hemiptera OTUs and in the number of Mycetophila OTUs.
I'm not sure how to interpret this. Is it because all the cropland samples consistently have more or fewer of specific OTUs compared to all the forest samples, whereas the forest samples within a forest type differ in which (say, Araneae) OTUs they have, but the different forest types differ in the overall number of Araneae OTUs?