grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation

How to filter non-significant odd named taxa, and only keep the significant odd named taxa? #324

Open catherineel opened 2 years ago

catherineel commented 2 years ago

Hi there!

I've been using metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
to remove odd taxa, but some of the odd named taxa are significant and I would like them to be displayed on the tree.

Is there a way to only display the significant odd named taxa?

zachary-foster commented 2 years ago

What do you mean by significant? Can you give me an example? You can make a list of taxa you want to be displayed no matter what and do this:

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$") | taxon_names %in% my_taxon_name_list, reassign_obs = FALSE)
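
To see what a condition of this shape keeps, here is a minimal base-R sketch on made-up taxon names. The vector `taxon_names_example` and the contents of `my_taxon_name_list` are hypothetical; inside `filter_taxa` the same expression would be evaluated against the object's real taxon names.

```r
# Hypothetical taxon names; not taken from the data in this thread.
taxon_names_example <- c("Bacteroides", "UCG-002", "Prevotella",
                         "CAG-56", "uncultured_bacterium")

# Hypothetical list of oddly named taxa to keep regardless of their names.
my_taxon_name_list <- c("UCG-002")

# TRUE for names made only of letters, or for names on the keep list.
keep <- grepl(taxon_names_example, pattern = "^[a-zA-Z]+$") |
  taxon_names_example %in% my_taxon_name_list

print(taxon_names_example[keep])
```
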
catherineel commented 2 years ago

Statistical significance after correcting for multiple comparisons. This is what I did:

Create a new column called wilcox_p_value_p.adjusted to correct for multiple comparisons:

obj$data$diff_table$wilcox_p_value_p.adjusted <- p.adjust(obj$data$diff_table$wilcox_p_value,
                                                          method = "fdr")

Create a new column in diff_table containing the log2 median ratio, starting as an identical copy of the existing column:

obj$data$diff_table$log2_median_ratio_wilcox.adjust <- obj$data$diff_table$log2_median_ratio

Then set the values in this new column to 0 wherever the adjusted Wilcoxon p-value is not significant:

obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0
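
The same two steps can be tried on a toy example in base R. This is only a sketch: the p-values and ratios below are invented, and the variable names are placeholders rather than columns of the real diff_table.

```r
# Hypothetical raw p-values and log2 ratios for four taxa.
p_raw <- c(0.001, 0.02, 0.04, 0.5)
log2_ratio <- c(3.1, -2.4, 1.7, 0.2)

# Benjamini-Hochberg correction ("fdr" is an alias for "BH" in p.adjust).
p_adj <- p.adjust(p_raw, method = "fdr")

# Copy the ratios, then zero out those not significant after correction.
log2_ratio_adj <- log2_ratio
log2_ratio_adj[p_adj > 0.05] <- 0

print(data.frame(p_raw, p_adj, log2_ratio, log2_ratio_adj))
```

In this toy data the third value passes the 0.05 threshold before correction but not after it. Zeroing only changes the stored value; all four rows are still present afterwards.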

Then I created the tree to only display significant taxa after correcting for multiple comparisons at the genus level

set.seed(1)
obj %>% 
  metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
  metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%  
  heat_tree_matrix(
                   data = "diff_table",
                   node_size = n_obs,
                   node_label = taxon_names,
                   node_color = log2_median_ratio_wilcox.adjust, 
                   node_color_range = diverging_palette(), 
                   node_color_trans = "linear", 
                   node_color_interval = c(-8, 8), 
                   edge_color_interval = c(-8, 8), 
                   node_size_axis_label = "Number of OTUs",
                   node_color_axis_label = "Log2 ratio median proportions",
                   layout = "davidson-harel", 
                   initial_layout = "reingold-tilford", 
                   output_file = "diff tree.pdf")

Let me know if I am doing anything wrong

zachary-foster commented 2 years ago

Ok, I understand now. Thanks for the code! I see that you set the non-significant taxa to 0, but I don't see where you are filtering them out. Either way, if you want to remove any taxa with odd names that are not significant, you can do something like:

metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05  & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)
catherineel commented 2 years ago

Thanks for that, but unfortunately I get this error when I replace

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%

with

metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05 & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)

Error: TRUE/FALSE vector (length = 1452) must be the same length as the number of taxa (242)

Oh, did I do something wrong? I thought I had filtered them out with this line:

obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0

My thinking was that setting the non-significant values to 0 and then using that column for node_color would filter them out. It did look like they were filtered out in my tree when I ran this:

set.seed(1)
obj %>%
  metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
  metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
  heat_tree_matrix(
                   data = "diff_table",
                   node_size = n_obs,
                   node_label = taxon_names,
                   node_color = log2_median_ratio_wilcox.adjust,
                   node_color_range = diverging_palette(),
                   node_color_trans = "linear",
                   node_color_interval = c(-8, 8),
                   edge_color_interval = c(-8, 8),
                   node_size_axis_label = "Number of OTUs",
                   node_color_axis_label = "Log2 ratio median proportions",
                   layout = "davidson-harel",
                   initial_layout = "reingold-tilford",
                   output_file = "diff tree.pdf")
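
One possible reading of the length mismatch, offered as a guess and not a confirmed diagnosis: 1452 is exactly 6 x 242, which would fit a diff_table holding one row per taxon for each of six pairwise comparisons, so a condition built from its columns has one value per row instead of one per taxon. The base-R sketch below uses a made-up table (the `taxon_id` layout, the `p_adjusted` column, and all values are assumptions) to show how per-row p-values could be collapsed to one TRUE/FALSE per taxon before being combined with a name check.

```r
# Hypothetical stand-in for a diff_table: 3 taxa x 2 comparisons = 6 rows.
diff_table <- data.frame(
  taxon_id = rep(c("a", "b", "c"), times = 2),
  comparison = rep(c("A_vs_B", "A_vs_C"), each = 3),
  p_adjusted = c(0.01, 0.40, 0.30, 0.20, 0.03, 0.90)
)

# Hypothetical taxon names, one per taxon, named by taxon id.
taxon_names_example <- c(a = "UCG-002", b = "Prevotella", c = "CAG-56")

# A per-row condition has 6 values, but there are only 3 taxa.
print(length(diff_table$p_adjusted <= 0.05))

# Collapse to one value per taxon: significant in at least one comparison.
sig_by_taxon <- vapply(
  split(diff_table$p_adjusted <= 0.05, diff_table$taxon_id),
  any,
  logical(1)
)
# Reorder to match the order of the taxon names.
sig_by_taxon <- sig_by_taxon[names(taxon_names_example)]

# Keep taxa with plain names, or oddly named taxa that are significant.
plain_name <- grepl(taxon_names_example, pattern = "^[a-zA-Z]+$")
keep <- plain_name | sig_by_taxon

print(keep)
```

Whether this is what happened here would need the real object to confirm, and a per-taxon vector built this way would also have to be in the same order as the taxa in the object being filtered.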

zachary-foster commented 2 years ago

Can you send me an example data set with associated code that reproduces the issue? It's hard for me to debug without reproducing the error.

catherineel commented 2 years ago

Sorry dumb question, but how do I send an example data?

My original data file is huge as it's a qza file from QIIME2 analysis and I'm not sure what I need to do to it.

zachary-foster commented 2 years ago

No problem, it's a common question.

If you can reproduce the error with a subset of the data, you can attach the file to this issue to upload it. You can save the needed R objects to a file with saveRDS at the point before the example code starts (readRDS is what loads them back in). You can also email the original data to zacharyfoster1989@gmail.com if you don't want it public and it's small enough to email.
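
As a minimal sketch of saving and reloading an object in base R, where `saveRDS` writes the file and `readRDS` reads it back. The object and file name below are placeholders, not the real data:

```r
# Placeholder object standing in for whatever the example code needs.
example_obj <- list(counts = c(10, 0, 3), sample = "S1")

# Write the object to a file (a temporary path here; any path works).
rds_path <- file.path(tempdir(), "example_obj.rds")
saveRDS(example_obj, rds_path)

# Whoever receives the file can restore the object with readRDS.
restored <- readRDS(rds_path)
print(identical(example_obj, restored))
```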

catherineel commented 2 years ago

Thanks, I just emailed it to you! I'm not sure if I did it correctly