grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data

How to filter non-significant odd named taxa, and only keep the significant odd named taxa? #324

Open catherineel opened 2 years ago

catherineel commented 2 years ago

Hi there!

I've been using

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE)

to remove oddly named taxa, but some of the oddly named taxa are significant and I would like them to be displayed on the tree.

Is there a way to only display the significant odd named taxa?

zachary-foster commented 2 years ago

What do you mean by significant? Can you give me an example? You can make a list of taxa you want to be displayed no matter what and do this:

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$") | taxon_names %in% my_taxon_name_list, reassign_obs = FALSE)
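
To make the logic of that condition concrete, here is a minimal base R sketch of how the TRUE/FALSE vector is built. The taxon names and the contents of my_taxon_name_list are made up for illustration; in the real pipeline, taxon_names is supplied by metacoder inside filter_taxa.

```r
# Hypothetical taxon names standing in for the object's real ones.
taxon_names <- c("Bacteroides", "UCG-005", "Prevotella_9", "Lachnospira")

# Hypothetical list of oddly named taxa to keep no matter what.
my_taxon_name_list <- c("UCG-005")

# TRUE for names made up only of letters.
is_plain <- grepl(taxon_names, pattern = "^[a-zA-Z]+$")

# Keep a taxon if its name is plain OR it is on the keep list.
keep <- is_plain | taxon_names %in% my_taxon_name_list

print(taxon_names[keep])
```

Here "Prevotella_9" is dropped because it is neither plain nor on the list, while "UCG-005" survives because it is listed.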

catherineel commented 2 years ago

Statistical significance after correcting for multiple comparisons. This is what I did:

First, I created a new column called wilcox_p_value_p.adjusted to correct for multiple comparisons:

obj$data$diff_table$wilcox_p_value_p.adjusted <- p.adjust(obj$data$diff_table$wilcox_p_value,
                                                          method = "fdr")

Next, I created a new column in diff_table that starts out as an identical copy of log2_median_ratio:

obj$data$diff_table$log2_median_ratio_wilcox.adjust <- obj$data$diff_table$log2_median_ratio

Then I changed this new column to remove the non-significant values, setting it to 0 wherever the adjusted Wilcoxon p-value is above 0.05:

obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0
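
As a small illustration of what these steps do, the following sketch runs the same FDR correction and masking on a toy table. The data frame and all of its values are made up; it only stands in for obj$data$diff_table.

```r
# Toy stand-in for obj$data$diff_table; every value here is invented.
diff_table <- data.frame(
  taxon_id = c("a", "b", "c", "d"),
  log2_median_ratio = c(2.5, -1.2, 0.8, -3.1),
  wilcox_p_value = c(0.001, 0.04, 0.5, 0.01)
)

# Correct the raw p-values for multiple comparisons (Benjamini-Hochberg FDR).
diff_table$wilcox_p_value_p.adjusted <- p.adjust(diff_table$wilcox_p_value,
                                                 method = "fdr")

# Copy the ratio column, then zero out rows that are no longer significant.
diff_table$log2_median_ratio_wilcox.adjust <- diff_table$log2_median_ratio
not_sig <- diff_table$wilcox_p_value_p.adjusted > 0.05
diff_table$log2_median_ratio_wilcox.adjust[not_sig] <- 0

print(diff_table)
```

In this toy table, taxon "b" has a raw p-value of 0.04 but an adjusted value above 0.05, so its ratio is set to 0. Note that the row itself stays in the table; only the value used for coloring changes.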

Then I created the tree to only display significant taxa after correcting for multiple comparisons at the genus level

obj %>% 
  metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
  metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
  heat_tree_matrix(data = "diff_table",
                   node_size = n_obs,
                   node_label = taxon_names,
                   node_color = log2_median_ratio_wilcox.adjust, 
                   node_color_range = diverging_palette(), 
                   node_color_trans = "linear", 
                   node_color_interval = c(-8, 8), 
                   edge_color_interval = c(-8, 8), 
                   node_size_axis_label = "Number of OTUs",
                   node_color_axis_label = "Log2 ratio median proportions",
                   layout = "davidson-harel", 
                   initial_layout = "reingold-tilford", 
                   output_file = "diff tree.pdf")

Let me know if I am doing anything wrong

zachary-foster commented 2 years ago

Ok, I understand now. Thanks for the code! I see that you set the non-significant taxa to 0, but I don't see where you are filtering them out. Either way, if you want to remove any taxa with odd names that are not significant, you can do something like:

metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05  & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)

catherineel commented 2 years ago

Thanks for that, but unfortunately I get this error when I replace

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%

with

metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05 & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)

Error: TRUE/FALSE vector (length = 1452) must be the same length as the number of taxa (242)

Oh, did I do something wrong? I thought I did filter them out with this line:

obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0

I assumed it would filter out the non-significant ones, since it changes their values and I then chose that column for node_color. Somehow it looked like they were filtered out in my tree when I did this:

set.seed(1)
obj %>%
  metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
  metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
  heat_tree_matrix(data = "diff_table",
                   node_size = n_obs,
                   node_label = taxon_names,
                   node_color = log2_median_ratio_wilcox.adjust,
                   node_color_range = diverging_palette(),
                   node_color_trans = "linear",
                   node_color_interval = c(-8, 8),
                   edge_color_interval = c(-8, 8),
                   node_size_axis_label = "Number of OTUs",
                   node_color_axis_label = "Log2 ratio median proportions",
                   layout = "davidson-harel",
                   initial_layout = "reingold-tilford",
                   output_file = "diff tree.pdf")
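
A possible reading of the length mismatch in the error above, offered as an assumption because the thread never confirms it: 1452 is exactly 6 × 242, which is the shape one would expect if diff_table holds one row per taxon for each of six pairwise comparisons, while filter_taxa expects one TRUE/FALSE per taxon. Under that assumption, the per-row p-values could be collapsed to one value per taxon before filtering. The sketch below uses base R on made-up data; taxon_ids, the toy names, and the "significant in at least one comparison" rule are illustrative choices, not something taken from the thread.

```r
# Toy stand-in: 3 taxa x 2 comparisons = 6 rows; every value is invented.
diff_table <- data.frame(
  taxon_id = rep(c("t1", "t2", "t3"), times = 2),
  wilcox_p_value_p.adjusted = c(0.01, 0.20, 0.90, 0.30, 0.04, 0.70)
)

# Hypothetical per-taxon vectors, in the order the taxa appear in the object.
taxon_ids <- c("t1", "t2", "t3")
taxon_names <- c("Bacteroides", "UCG-005", "Prevotella_9")

# Collapse to one value per taxon: significant in at least one comparison.
sig_by_taxon <- tapply(diff_table$wilcox_p_value_p.adjusted <= 0.05,
                       diff_table$taxon_id, any)
is_sig <- as.vector(sig_by_taxon[taxon_ids])

# Keep taxa with plain names, plus oddly named taxa that are significant.
is_plain <- grepl(taxon_names, pattern = "^[a-zA-Z]+$")
keep <- is_plain | is_sig

print(data.frame(taxon_names, is_plain, is_sig, keep))
```

The resulting keep vector has one entry per taxon, which is the length the error message asks for. In the real object it would also need to follow the same taxon order that filter_taxa uses, which this sketch does not check.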

zachary-foster commented 2 years ago

Can you send me an example data set with associated code that reproduces the issue? It's hard for me to debug without reproducing the error.

catherineel commented 2 years ago

Sorry, dumb question, but how do I send example data?

My original data file is huge as it's a qza file from QIIME2 analysis and I'm not sure what I need to do to it.

zachary-foster commented 2 years ago

No problem, it's a common question.

If you can reproduce the error with a subset of the data, you can attach the files to this issue to upload them. You can save the needed R objects to a file with saveRDS at the point before the example code starts. You can also email the original data if you don't want it public and it's small enough to email.
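
As a rough sketch of saving an object for sharing: base R's saveRDS writes a single object to a file and readRDS loads it back. The object and file name below are hypothetical placeholders for whatever is needed to reproduce the error.

```r
# Hypothetical small object standing in for the data that triggers the error.
example_obj <- list(
  diff_table = data.frame(taxon_id = c("a", "b"),
                          wilcox_p_value = c(0.01, 0.2))
)

# Write the object to a file; a temporary directory is used here for illustration.
path <- file.path(tempdir(), "example_obj.rds")
saveRDS(example_obj, path)

# Whoever receives the file can load it back with readRDS.
restored <- readRDS(path)
print(identical(example_obj, restored))
```

The saved .rds file is what would be attached or emailed, together with the code that runs on it.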

catherineel commented 2 years ago

Thanks, I just emailed it to you! I'm not sure if I did it correctly