Open LChWu opened 5 years ago
Hello @LChWu,
Can you send me a data set and code to reproduce this? I am not sure what the issue is without being able to look at the data. You can email me if you would rather at zacharyfoster1989@gmail.com. Thanks
Hello @zachary-foster , I send you my data and code per mail. Cheers
Hi @LChWu, I found the problem.
You gave compare_groups
the OTU abundance rather than the taxon abundance, so you had multiple OTUs matching a single taxon. So you had info on the OTU differences rather than taxon differences and heat trees can only plot taxon data.
I changed this:
## Calculate relative abundance
obj$data$rel_abd <- calc_obs_props(obj, "otu_counts", other_cols = T)
print(obj)
############### heat tree #####################
## difference of taxon abundances per site
# compare relative abundance per site
obj$data$diff_site <- compare_groups(obj, data = "rel_abd",
cols = mdata$SampleID,
groups = mdata$SiteName)
to this:
## Calculate relative abundance
obj$data$rel_abd <- calc_obs_props(obj, "otu_counts", other_cols = T)
print(obj)
## Calculate per-taxon abundance
obj$data$tax_rel_abd <- calc_taxon_abund(obj, "rel_abd")
print(obj)
############### heat tree #####################
## difference of taxon abundances per site
# compare relative abundance per site
obj$data$diff_site <- compare_groups(obj, data = "tax_rel_abd",
cols = mdata$SampleID,
groups = mdata$SiteName)
and it seemed to work. Let me know if you have questions about his or anything else.
Hello, I have the same error with the compare_groups function as @LChWu. I am trying to create a heat tree matrix using the compare_groups function. However, when I plot the heat tree it returns an error indicating that I have different values for the same taxon.
Here is some of my code:
obj <- parse_tax_data(otu_data,
class_cols = "taxonomy",
class_sep = ";",
class_regex = "^([a-z]{0,1})_{0,2}(.*)$",
class_key = c("tax_rank" = "taxon_rank", "name" = "taxon_name"))
obj$data$class_data <- NULL
names(obj$data) <- "otu_counts"
obj <- filter_taxa(obj, taxon_names != "")
obj <- filter_taxa(obj, taxon_names == "Fungi", subtaxa = TRUE)
## calculate the abundance per taxon from the OTU counts.
obj$data$tax_abund <- calc_taxon_abund(obj, "otu_counts",
cols = sample_data$sample_ID)
##Summing per-taxon counts from 108 columns for 744 taxa
print(obj)
##<Taxmap>
## 744 taxa: aac. Fungi ... cdn. Mortierella_tsukubaensis
## 744 edges: NA->aac, aac->aaf ... bjn->cdn
## 2 data sets:
## otu_counts:
## # A tibble: 607 x 111
## taxon_id OTU_ID AA1 AA2 AA3 AA4
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 aac OTU2 173 0 545 58
## 2 aaf OTU3 997 0 2577 808
## 3 art OTU4 0 0 0 0
## # ... with 604 more rows, and 105 more
## # variables: AA5 <dbl>, AA6 <dbl>,
## # AA7 <dbl>, AA8 <dbl>, AA9 <dbl>,
## # AA10 <dbl>, AA11 <dbl>, AA12 <dbl>,
## # AB1 <dbl>, AB2 <dbl>, ...
## tax_abund:
## # A tibble: 744 x 109
## taxon_id AA1 AA2 AA3 AA4 AA5
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 aac 50919 501 100902 60211 54678
## 2 aaf 6430 0 26293 25793 23953
## 3 aag 39191 482 59539 27125 24979
## # ... with 741 more rows, and 103 more
## # variables: AA6 <dbl>, AA7 <dbl>,
## # AA8 <dbl>, AA9 <dbl>, AA10 <dbl>,
## # AA11 <dbl>, AA12 <dbl>, AB1 <dbl>,
## # AB2 <dbl>, AB3 <dbl>, ...
## 0 functions:
## For each taxon, at every rank, compare groups of counts.
obj$data$diff_table <- compare_groups(obj, data = "tax_abund",
cols = sample_data$sample_ID,
groups = sample_data$site)
print(obj$data$diff_table)
## # A tibble: 2,232 x 7
## taxon_id treatment_1 treatment_2 log2_median_rat~
## <chr> <chr> <chr> <dbl>
## 1 aac Clan WindyHill 0.612
## 2 aaf Clan WindyHill -0.149
## 3 aag Clan WindyHill 0.873
## 4 aah Clan WindyHill -1.83
## 5 aai Clan WindyHill 0
## 6 aaj Clan WindyHill 0
## 7 aak Clan WindyHill 0
## 8 aal Clan WindyHill 0
## 9 aam Clan WindyHill 2.33
## 10 aan Clan WindyHill 0.730
## # ... with 2,222 more rows, and 3 more variables:
## # median_diff <dbl>, mean_diff <dbl>, wilcox_p_value <dbl>
## To correct for multiple comparisions (false discovery rate, FDR).
obj <- mutate_obs(obj, "diff_table",
wilcox_p_value = p.adjust(wilcox_p_value, method = "fdr"))
## Set any differences that are not significant to 0.
obj$data$diff_table$log2_median_ratio[obj$data$diff_table$wilcox_p_value > 0.05] <- 0
## Differential heat trees - Plot: Fungi-order
obj %>%
filter_taxa(taxon_ranks == "o", supertaxa = TRUE) %>%
mutate_obs("cleaned_names", gsub(taxon_names, pattern = "\\[|\\]", replacement = "")) %>%
filter_taxa(grepl(cleaned_names, pattern = "^[a-zA-Z]+$")) %>%
heat_tree_matrix(data = "diff_table",
node_label = cleaned_names,
node_size = n_obs,
node_color = log2_median_ratio,
node_color_trans = "linear",
node_color_interval = c(-3, 3),
edge_color_interval = c(-3, 3),
node_color_range = diverging_palette(),
node_size_axis_label = "OTUs number",
node_color_axis_label = "Median counts Log2 ratio",
layout = "da", initial_layout = "re",
key_size = 0.67,
seed = 2,
output_file = "Fig_comparation_sites.jpg")
## Adding a new "character" vector of length 144.
## Error: All pairs being compared should have one value per taxon (130). The following do not:
## Clan vs. WindyHill (744), Clan vs. Dukuduku (744), WindyHill vs. Dukuduku (744)
I think the problem could be that I have a different number of rows in obj$data$diff_table than there are taxa in obj$data$tax_abund. However, I ran the same script with the same data a year ago and everything ran fine, not sure what is the problem right now...
Any help would be great! Thank you very much! Maria
Hello Maria,
Can you email me that data used in that code so I can tests it? My email is zacharyfoster1989@gmail.com
. Thanks
Thanks for the data! I think I found the problem. You need to add reassign_obs = FALSE
to the filter_taxa
calls here:
obj %>%
filter_taxa(taxon_ranks == "o", supertaxa = TRUE) %>%
mutate_obs("cleaned_names", gsub(taxon_names, pattern = "\\[|\\]", replacement = "")) %>%
filter_taxa(grepl(cleaned_names, pattern = "^[a-zA-Z]+$")) %>%
heat_tree_matrix( ...
It should be
obj %>%
filter_taxa(taxon_ranks == "o", supertaxa = TRUE, reassign_obs = FALSE) %>%
mutate_obs("cleaned_names", gsub(taxon_names, pattern = "\\[|\\]", replacement = "")) %>%
filter_taxa(grepl(cleaned_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
heat_tree_matrix(data = "diff_table",
Without reassign_obs = FALSE
, no rows will be removed from any of the tables in obj
, including diff_table
, when taxa are removed. If you run:
obj %>%
filter_taxa(taxon_ranks == "o", supertaxa = TRUE) %>% #plot the taxa to the rank "order"
mutate_obs("cleaned_names", gsub(taxon_names, pattern = "\\[|\\]", replacement = "")) %>%
filter_taxa(grepl(cleaned_names, pattern = "^[a-zA-Z]+$"))
you can see that there 2232 rows in diff_table
but only 130 taxa. There should be 3 * 130 = 390 rows, since there are three comparisons. That is because instead of removing rows in diff_table
when taxa were filtered out, filter_taxa
reassigned the rows to supertaxa that were not filtered out. For some things, like abundance matrices, this is desirable, but not in this case. Adding reassign_obs = FALSE
means any rows assigned to removed taxa are also removed from all the data in obj
.
Does that fix the problem?
Hello @zachary-foster,
I added reassign_obs = FALSE
but it was not working. Then, I realised it also is necessary to include drop_obs = TRUE
to remove the rows from any table.
It should be:
obj %>%
filter_taxa(taxon_ranks == "o", supertaxa = TRUE, reassign_obs = FALSE, drop_obs =TRUE) %>%
Thank you so much for your help!
Hello I have the same error but none of the solutions mentioned above seems to work. any chance some can help me? I can share the data and elaborate via mp. my goal is to compare microbiome composition between 2 groups (stool vs tissues) and ideally between each sample site (5 tissue types + stool)
I am starting with phyloseq object
phyloseq-class experiment-level object
otu_table() OTU Table: [ 326 taxa and 71 samples ]
sample_data() Sample Data: [ 71 samples by 25 sample variables ]
tax_table() Taxonomy Table: [ 326 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 326 tips and 325 internal nodes ]
that I convert :
psmat<- parse_phyloseq(ps)
then
psmat$data$rel_abd <- calc_obs_props(psmat, "otu_table", other_cols = T)
psmat$data$tax_rel_abd <- calc_taxon_abund(psmat, "rel_abd")
psmat$data$diff_site <- compare_groups(psmat, data = "tax_rel_abd",
cols = meta2$SampleID,
groups = meta2$bin)
psmat <- mutate_obs(psmat, "diff_site",
wilcox_p_value = p.adjust(wilcox_p_value, method = "fdr"))
psmat$data$diff_table$log2_median_ratio[psmat$data$diff_table$wilcox_p_value > 0.05] <- 0
but it failed here:
psmat %>%
metacoder::filter_taxa(taxon_ranks == "Family", supertaxa = TRUE) %>%
mutate_obs("cleaned_names", gsub(taxon_names, pattern = "\\[|\\]", replacement = "")) %>%
metacoder::filter_taxa(grepl(cleaned_names, pattern = "^[a-zA-Z]+$")) %>%
heat_tree_matrix(dataset = "diff_site",
node_label = cleaned_names,
node_size = n_obs, # number of OTUs
node_color = log2_median_ratio, # difference between groups
node_color_trans = "linear",
node_color_interval = c(-3, 3), # symmetric interval
edge_color_interval = c(-3, 3), # symmetric interval
node_color_range = diverging_palette(), # diverging colors
node_size_axis_label = "OTU count",
node_color_axis_label = "Log 2 ratio of median counts",
layout = "da", initial_layout = "re",
key_size = 0.67,
seed = 2)
ERROR: Error: All pairs being compared should have one value per taxon (187). The following do not: Tissues vs. Stool (213)
I tried reassign_obs = FALSE, drop_obs =TRUE
Thank you so much for your help!
Try adding reassign_obs = FALSE
to your metacoder::filter_taxa
calls. Since you have one value per taxon, you don't want values for the removed genus being reassigned to their families, which is the default behavior.
thank you! I tihnk this is working. let me do some checks.
it is working but I would need some more advices... I want to compare groups (ideally more than 2 but starting with 2). I prepared my data:
psmat$data$diff_table <- compare_groups(psmat, data = "tax_abund",
cols = meta2$SampleID,
groups = meta2$bin)
psmat$data$diff_site <- compare_groups(psmat, data = "tax_rel_abd",
cols = meta2$SampleID,
groups = meta2$bin)
diff_table looks like this:
but nothing is showing up in the heat tree when running this (I tried at the family level or genus)
plot2=psmat %>%
mutate_obs("cleaned_names", gsub(taxon_names, pattern = "\\[|\\]", replacement = "")) %>%
metacoder::filter_taxa(taxon_ranks == "Family", supertaxa = TRUE,grepl(cleaned_names, pattern = "^[a-zA-Z]+$"),reassign_obs = FALSE) %>%
# OR metacoder::filter_taxa(grepl(cleaned_names, pattern = "^[a-zA-Z]+$"),reassign_obs = FALSE) %>%
heat_tree_matrix(dataset = "diff_site", # also tried with "diff_table"
node_label = cleaned_names,
node_size = n_obs, # number of OTUs
node_color = log2_median_ratio, # difference between groups
node_color_trans = "linear",
node_color_interval = c(-3, 3), # symmetric interval
edge_color_interval = c(-3, 3), # symmetric interval
node_color_range = diverging_palette(), # diverging colors
node_size_axis_label = "OTU count",
node_color_axis_label = "Log 2 ratio of median counts",
layout = "da", initial_layout = "re",
key_size = 0.67,
seed = 2)
Happy to share data via mp if it can help thank you!
the head of the diff table looks like this
it works. thank you!
No problem! Did you solve your problem?
Yes. I was trying to apply a heat_matrix to a comparison between 2 groups only. Works nicely now. Thank you!
Hello, I have an issue with the
compare_groups
function. I want to create a heat tree matrix for which I usecompare_groups
on relative abundance counts created withcalc_obs_props
. But it will return multiple different values for the same taxon for the comparison between two treatments. Here is some of my code:You can see in line 15 and 16 that they have the same taxon ID but comparing the same two sites. There are four sites in total which are compared. When I try to calculate the
heat_tree_matrix
it will give me the errorError: All pairs being compared should have one value per taxon. The following do not:
even though I usedreassign_obs = c(diff_site = FALSE)
infilter_taxa
.I looked a lot around and tried to find similar issues from other people, but I was not succesful so far. I don't think the problem is that it recognizes more than four different sites, because I can e.g. calculate the relative abundance for the four sites seperatly and it will only give me these four.
I hope you can help me. Cheers!