sghignone commented 3 years ago

Hi again, first of all, this package is amazing! I'm dealing with heat_tree_matrix on data obtained with calc_diff_abund_deseq2; what should I do on the diff_table in order to
keep in consideration only those taxa with Log2FoldChange > 2.5 (upregulated) and < -2.5 (downregulated)?

Many thanks in advance! Cheers Stefano

zachary-foster commented 3 years ago

Hi Stefano,

Thanks!

Since heat tree matrix supports multiple comparisons you have to consider how to handle cases where a taxon has higher than a 2.5 fold change in one comparison but not another. The example below includes taxa if they are > 2.5 in any comparison.

Try this:

library(metacoder)

# Parse data for plotting
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "tax_data", cols = hmp_samples$sample_id)

# Calculate difference between groups
x$data$diff_table <- calc_diff_abund_deseq2(x, data = "tax_table",
                                            cols = hmp_samples$sample_id,
                                            groups = hmp_samples$body_site)

# Remove taxa with only small differences 
per_taxon_fold_changes <- obs(x, data = 'diff_table', value = 'log2FoldChange')
per_taxon_max_change <- unlist(lapply(per_taxon_fold_changes, function(tax_changes) max(abs(tax_changes))))
x <- filter_taxa(x, per_taxon_max_change > 2.5, supertaxa = TRUE, reassign_obs = c(diff_table = FALSE))

I saw you posted this to the google groups and in that question you asked about filtering out NAs as well. You can use the same technique, but again you have to consider if a NA in one comparison should cause the taxon to be removed from all comparisons.

sghignone commented 3 years ago

Thanks for the explanation Zachary! In the case of only two comparisons, I can identify my differentially abundant taxa as follows, i.e. padj > 0.05 and Log2FoldChange > 2.5 (upregulated) and < -2.5 (downregulated):

signALL<-obj.physeq.TvS$data$tax_T_v_S %>% filter((padj > 0.05 & log2FoldChange < -2.5) | (padj > 0.05 & log2FoldChange > 2.5))

I was wondering how to implement this filter in a filter_taxa function, so to pass to heat_tree only those taxa really significant (and just not plotting them out of the total)

zachary-foster commented 3 years ago

To take into account the p-values, you could modify the code like so:

library(metacoder)

# Parse data for plotting
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "tax_data", cols = hmp_samples$sample_id)

# Calculate difference between groups
x$data$diff_table <- calc_diff_abund_deseq2(x, data = "tax_table",
                                            cols = hmp_samples$sample_id,
                                            groups = hmp_samples$body_site)

# Remove taxa with only small differences 
per_taxon_fold_changes <- obs(x, data = 'diff_table', value = 'log2FoldChange')
per_taxon_p_values <- obs(x, data = 'diff_table', value = 'pvalue')
per_taxon_max_change <- unlist(lapply(seq_along(per_taxon_fold_changes), function(i) {
  if (all(per_taxon_p_values[[i]] > 0.05)) {
    return(0)
  } else {
    return(max(abs(per_taxon_fold_changes[[i]][per_taxon_p_values[[i]] <= 0.05])))
  }
}))
x <- filter_taxa(x, per_taxon_max_change > 2.5, supertaxa = TRUE, reassign_obs = c(diff_table = FALSE))

# Plot results (might take a few minutes)
heat_tree_matrix(x,
                 data = "diff_table",
                 node_size = n_obs,
                 node_label = taxon_names,
                 node_color = ifelse(is.na(padj) | padj > 0.05, 0, log2FoldChange),
                 node_color_range = diverging_palette(),
                 node_color_trans = "linear",
                 node_color_interval = c(-3, 3),
                 edge_color_interval = c(-3, 3),
                 node_size_axis_label = "Number of OTUs",
                 node_color_axis_label = "Log2 fold change")

This would preserve only taxa that have a change (positive or negative) of greater than 2.5 associated with a p value less than .05 in at least one comparison. Does this do what you want?

BrettLoman commented 3 years ago

Hello!

I found this discussion very helpful and I am trying to mimic these steps to create a similar heat tree. I think the problem that I am running into is that my taxa have zero values for at least one one of my samples (but I could be wrong - very amateur at R), resulting in the following message when I run the "calc_diff_abund_deseq2" command:

converting counts to integer mode estimating size factors Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, : every gene contains at least one zero, cannot compute log geometric means

I read on another discussion board that this can be avoided by adding pseudo-counts, or adding +1, to each value in the otu table. Is there a quick way that you can think to do this? I imagine this will work out fine, as this will not change which taxa are significantly different.

Thanks in advance! Brett

zachary-foster commented 3 years ago

I have not had that problem personally that I remember, but I imagine you have 3 options: 1) filter the input data in some way to remove low-abundance samples. Do you have a negative control with all 0s for example? 2) Add some small amount like 1 to all counts like you said. This is pretty easy to do. It depends a bit on your data structure, but might be as simple as my_table[my_cols] <- my_table[my_cols] + 1, where my_cols contains the subset of columns with counts (could be a column in your sample metadata table, if you have one). 3) Use a different method for differential taxon abundance.

BrettLoman commented 3 years ago

Thanks for the quick response!

I had initially followed the Metacoder workshop tutorial (only using my own data) so the names of my files and the structure are similar. I already removed all of the taxa that had no values (e.g. 0 for all samples) and then tried this:

obj$data$otu_counts <- obj$data$otu_counts + 1 Error in FUN(left, right) : non-numeric argument to binary operator

Sorry - really no clue what I'm doing here... but I really appreciate the help!

zachary-foster commented 3 years ago

No problem!

The way that you would do it assuming you have data structured like the workshop would be something like:

obj$data$otu_counts[sample_data$SampleID] <- obj$data$otu_counts[sample_data$SampleID] + 1

Basically, if you add 1 to a table (or matrix), R guesses that you want to add 1 to all the values. However, that might not make sense if some of the columns are not numeric. So what you need to do is just replace the data for the numeric columns with the incremented version, and leave the non-numeric columns alone. Here sample_data$SampleID is a column in the sample_data table that corresponds to the column names (sample names) in the abundance matrix.

Referring to the columns for index would also work, but is less readable (e.g. obj$data$otu_counts[1:12345] <- obj$data$otu_counts[1:12345] + 1)

BrettLoman commented 3 years ago

Amazing - it worked! In the home stretch now :)

Unfortunately when I try to generate the heat tree, now this happens:

> ##Removes taxa with small/not significant differences##
> per_taxon_fold_changes <- obs(obj, data = 'diff_table', value = 'log2FoldChange')
> per_taxon_p_values <- obs(obj, data = 'diff_table', value = 'pvalue')
> per_taxon_max_change <- unlist(lapply(seq_along(per_taxon_fold_changes), function(i) {
+   if (all(per_taxon_p_values[[i]] > 0.05)) {
+     return(0)
+   } else {
+     return(max(abs(per_taxon_fold_changes[[i]][per_taxon_p_values[[i]] <= 0.05])))
+   }
+ }))
> obj <- filter_taxa(obj, per_taxon_max_change > 1.2, supertaxa = TRUE, reassign_obs = c(diff_table = FALSE))
> ##Builds heat tree##
> heat_tree_matrix(obj,
+                  data = "diff_table",
+                  node_size = n_obs,
+                  node_label = taxon_names,
+                  node_color = ifelse(is.na(padj) | padj > 0.05, 0, log2FoldChange),
+                  node_color_range = diverging_palette(),
+                  node_color_trans = "linear",
+                  node_color_interval = c(-3, 3),
+                  edge_color_interval = c(-3, 3),)
Error: All pairs being compared should have one value per taxon (142). The following do not:
   Vehicle vs. Chemo (815)

Did I use the wrong part of the data as input...? Maybe I am entering multiple values on accident...

Thanks again!

zachary-foster commented 3 years ago

That error happens when the data being plotted is not one value per taxon for at least one comparison. Its saying that there are 142 taxa to be plotted, but "diff_table" has 815 rows for one comparison. At some point between calculating taxon differences and plotting, you filtered out taxa without filtering out filtering the differences in "diff_table" in the same way. I cant tell how that might have happened. I suggest check how the number of taxa and number of rows per comparison in "diff_table" change as the code progresses to find which step is causing the problem.

BrettLoman commented 3 years ago

Okay, I took out any filtering steps that I added (like the steps from the tutorial that filter out non-Bacteria and OTUs that had all zero counts) and edited my commands to resemble the above script (posted on Jan 7th)

as closely as possible. This did remove the need for adding the pseduo-counts and the error regarding too many rows for comparison, but now I have a new problem. Everything runs without error until the "heat_tree_matrix()", which returns the error "Error in apply(layout_matrix, MARGIN = 2, function(x) all(is.na(x))) : dim(X) must have a positive length"

I'm copying my code here (and happy to share the files if need-be) - any ideas on what's wrong now? I ultimately just want to show the taxonomic differences (in both positive and negative directions above a determined fold-difference) between my control and treatment group on (perhaps) a single heat tree. I'm just trying to do some hypothesis generation, so I'm also okay with using the un-corrected pvalues (for now). Am I going about this all wrong?

Thanks a TON!!!

loads readr to use "read_tsv"

library(readr)

imports OTU table

otu_data <- read_tsv("/Volumes/Secure KeyZ/Chemo_BA/16S/Chemo_For_HeatTrees/ChemoDose_Bioms/DC_Biom.txt")

imports Taxonomy

tax_data <- read_tsv("/Volumes/Secure KeyZ/Chemo_BA/16S/Chemo_For_HeatTrees/ChemoDose_Taxonomy/taxonomy_edited.txt")

loads diplyr to use "left_join"

library(dplyr)

merges Taxonomy and OTU table

otu_data <- left_join(otu_data, tax_data, by = c("OTU_ID" = "OTU_ID"))

loads necessary packages

library(taxa) library(metacoder)

imports Metadata

sample_data <- read_tsv("/Volumes/Secure KeyZ/Chemo_BA/16S/DC_sample_data.txt")

Parse data for plotting

x = parse_tax_data(otu_data, class_cols = "Taxonomy", class_sep = ";", class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"), classregex = "^([a-z]{0,1}){0,2}(.*)$")

Get per-taxon counts

x$data$tax_table <- calc_taxon_abund(x, data = "tax_data", cols = sample_data$Sample_ID)

Calculate difference between groups

x$data$diff_table <- calc_diff_abund_deseq2(x, data = "tax_table", cols = sample_data$Sample_ID, groups = sample_data$Treatment)

Remove taxa with only small differences

per_taxon_fold_changes <- obs(x, data = 'diff_table', value = 'log2FoldChange') per_taxon_p_values <- obs(x, data = 'diff_table', value = 'pvalue') per_taxon_max_change <- unlist(lapply(seq_along(per_taxon_fold_changes), function(i) { if (all(per_taxon_p_values[[i]] > 0.05)) { return(0) } else { return(max(abs(per_taxon_fold_changes[[i]][per_taxon_p_values[[i]] <= 0.05]))) } })) x <- filter_taxa(x, per_taxon_max_change > 2.5, supertaxa = TRUE, reassign_obs = c(diff_table = FALSE))

Plot results (might take a few minutes)

heat_tree_matrix(x, data = "diff_table", node_size = n_obs, node_label = taxon_names, node_color = ifelse(is.na(padj) | padj > 0.05, 0, log2FoldChange), node_color_range = diverging_palette(), node_color_trans = "linear", node_color_interval = c(-3, 3), edge_color_interval = c(-3, 3), node_size_axis_label = "Number of OTUs", node_color_axis_label = "Log2 fold change")

zachary-foster commented 3 years ago

Its hard to tell without the data to test. Can you send your code in an .R file and enough data to reproduce the problem so I can see what is going on? Thanks!

sghignone commented 3 years ago

Hi Zachary, sorry for extremely slow reply,

I saw you posted this to the google groups and in that question you asked about filtering out NAs as well. You can use the same technique, but again you have to consider if a NA in one comparison should cause the taxon to be removed from all comparisons.

This was resolved when I realized to work with a rarefied otu_table; reverting back to raw count solved everything. Cheers.

sghignone commented 3 years ago

per_taxon_max_change <- unlist(lapply(seq_along(per_taxon_fold_changes), function(i) {
  if (all(per_taxon_p_values[[i]] > 0.05)) {
    return(0)
  } else {
    return(max(abs(per_taxon_fold_changes[[i]][per_taxon_p_values[[i]] <= 0.05])))
  }
}))
x <- filter_taxa(x, per_taxon_max_change > 2.5, supertaxa = TRUE, reassign_obs = c(diff_table = FALSE))

This would preserve only taxa that have a change (positive or negative) of greater than 2.5 associated with a p value less than .05 in at least one comparison. Does this do what you want?

In theory, YES. In practice, I'm getting an error when running that step:

Error in if (all(per_taxon_p_values[[i]] > 0.05)) { : 
  missing value where TRUE/FALSE needed
Called from: FUN(X[[i]], ...)

what's going on?

S.-

...EDIT... PS. OOK, I have NAs in per_taxon_p_values ;how can i deal with it?

zachary-foster commented 3 years ago

Many R functions have the na.rm option to ignore NAs, including all and max, so this would probably work:

per_taxon_max_change <- unlist(lapply(seq_along(per_taxon_fold_changes), function(i) {
  if (all(per_taxon_p_values[[i]] > 0.05, na.rm = TRUE)) {
    return(0)
  } else {
    return(max(abs(per_taxon_fold_changes[[i]][per_taxon_p_values[[i]] <= 0.05]), na.rm = TRUE))
  }
}))
x <- filter_taxa(x, per_taxon_max_change > 2.5, supertaxa = TRUE, reassign_obs = c(diff_table = FALSE))

You could also do something like:

per_taxon_max_change <- unlist(lapply(seq_along(per_taxon_fold_changes), function(i) {
  pvals <- per_taxon_p_values[[i]]
  pvals <- pvals[! is.na(pvals)]
  if (length(pvals) == 0 || all(pvals > 0.05)) {
    return(0)
  } else {
    return(max(abs(per_taxon_fold_changes[[i]][pvals <= 0.05])))
  }
}))
x <- filter_taxa(x, per_taxon_max_change > 2.5, supertaxa = TRUE, reassign_obs = c(diff_table = FALSE))

sghignone commented 3 years ago

Thanks Zachary for the follow-op. Both functions run without warnings or errors, but I get them downstream. 1) first function version

> per_taxon_max_change <- unlist(lapply(seq_along(per_taxon_fold_changes), function(i) {
+   if (all(per_taxon_p_values[[i]] > 0.05, na.rm = TRUE)) {
+     return(0)
+   } else {
+     return(max(abs(per_taxon_fold_changes[[i]][per_taxon_p_values[[i]] <= 0.05]), na.rm = TRUE))
+   }
+ }))
> x <- filter_taxa(x, per_taxon_max_change > 2.5, 
+                  supertaxa = TRUE, 
+                  reassign_obs = c(diff_table = FALSE))
**Warning message:
There is no "taxon_id" column in the data set "3", so there are no taxon IDs.** 
> heat_tree_matrix(x,
+                  data = "diff_table",
+                  node_size = n_obs,
+                  node_label = taxon_names,
+                  node_color = ifelse(is.na(padj) | padj > 0.05, 0, log2FoldChange),
+                  node_color_range = diverging_palette(),
+                  node_color_trans = "linear",
+                  node_color_interval = c(-3, 3),
+                  edge_color_interval = c(-3, 3),
+                  node_size_axis_label = "Number of OTUs",
+                  node_color_axis_label = "Log2 fold change")
**Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  : 
  VECTOR_ELT() can only be applied to a 'list', not a 'NULL'**

2) second function version

> per_taxon_max_change <- unlist(lapply(seq_along(per_taxon_fold_changes), function(i) {
+   pvals <- per_taxon_p_values[[i]]
+   pvals <- pvals[! is.na(pvals)]
+   if (length(pvals) == 0 || all(pvals > 0.05)) {
+     return(0)
+   } else {
+     return(max(abs(per_taxon_fold_changes[[i]][pvals <= 0.05])))
+   }
+ }))
> x <- filter_taxa(x, per_taxon_max_change > 2.5, supertaxa = TRUE, reassign_obs = c(diff_table = FALSE))
Error: TRUE/FALSE vector (length = 1352) must be the same length as the number of taxa (760)

thanks for your attention s.-

zachary-foster commented 3 years ago

I just got the VECTOR_ELT() can only be applied to a 'list', not a 'NULL'** error too yesterday. I reinstalled metacoder from and it went away. Might be a issue with a dependency. For the 2nd error, did you by any chance try to filter the object x that was already filtered by the first chunk of code?

syaliny commented 6 months ago

Hi @zachary-foster , i am using metacoder to visualise groups that are differentially abundant and significant between nursery and all other forest compartments, I have 21 forest compartments in total. I would like to use the nursery as the baseline for comparison against all the other forest compartments such that the nursery is one group and the other group is all the forest compartments (21 in total). My end goal is to end up with a plot showing nursery against all other forest compartments comparisons...Would the code here work for these comparisons? I have been able to group them into these two categories by splitting the phyloseq object and generating a figure but i'd like to check if this would be the best way to do it or would you recommend doing individual forest compartments vs nursery. My code below for nursery vs all other forest compartments that i've grouped under two separate categories

Extract abundance matrix from the phyloseq object

OTU_nursery = as(otu_table(ps.nursery), "matrix")

Coerce to data.frame

OTU_nurserydf = as.data.frame(t(OTU_nursery))

OTU_nurserydf <- OTU_nurserydf %>% rownames_to_column(var="ASVs")

Pull out taxonomy info from phyloseq for metacoder plot

Nursery_taxa_root_only_decontam_noNTC_rar=data.frame(phyloseq::tax_table(ps.nursery))

write csv and edit names accordingly. For metacoder Had to format taxonomy table, each taxonomic level has to have the letter of the level in front of it, for example Kingdom column should have k at the front : kBacteria. Also need to add a column named Taxonomy with collated into from kingdom to genus separated by ;

write.csv(Nursery_taxa_root_only_decontam_noNTC_rar,file="/raid/projects/scratch/TRMP/RA_2/PM8_1/SG_Nursery_plant_out_trials/R_analysis/Phyloseq_16S_condensed/16S_nursery_Taxa_root_only_decontam_noNTC_rar_forMETACODER.csv")

Read the edited and formatted taxonomy file

Taxa_nursery=read.csv("/raid/projects/scratch/TRMP/RA_2/PM8_1/SG_Nursery_plant_out_trials/R_analysis/Phyloseq_16S_condensed/16S_nursery_Taxa_root_only_decontam_noNTC_rar_forMETACODER_edited.csv")

Repeat above for for all forest comp except nursery

All forest compartment without nursery

ps.all.except.nursery <- subset_samples(ps_aftercontam_removal_noNTC_root_rar_p, Forest.comp.sort != "0_1") ps.all.except.nursery<- prune_taxa(taxa_sums(ps.all.except.nursery) > 0, ps.all.except.nursery) # prune OTUs that are not present in at least one sample

Create OTU table

Extract abundance matrix from the phyloseq object

OTU_all_except_nursery = as(otu_table(ps.all.except.nursery), "matrix")

Coerce to data.frame

OTU_all_except_nurserydf = as.data.frame(t(OTU_all_except_nursery))

OTU_all_except_nurserydf <- OTU_all_except_nurserydf %>% rownames_to_column(var="ASVs")

write.csv(OTU_all_except_nurserydf,file="/raid/projects/scratch/TRMP/RA_2/PM8_1/SG_Nursery_plant_out_trials/R_analysis/Phyloseq_16S_condensed/16S_allexcept_nursery_ASVdf_root_only_decontam_noNTC_rar_forMETACODER.csv")

Repeat for taxonomy table

Pull out taxonomy info from phyloseq for metacoder plot

All_except_nursery_taxa_root_only_decontam_noNTC_rar=data.frame(phyloseq::tax_table(ps.all.except.nursery))

write csv and edit names accordingly. For metacoder Had to format taxonomy table, each taxonomic level has to have the letter of the level in front of it, for example Kingdom column should have k at the front : kBacteria. Also need to add a column named Taxonomy with collated into from kingdom to genus separated by ;

write.csv(All_except_nursery_taxa_root_only_decontam_noNTC_rar,file="/raid/projects/scratch/TRMP/RA_2/PM8_1/SG_Nursery_plant_out_trials/R_analysis/Phyloseq_16S_condensed/16S_allexcept_nursery_Taxa_root_only_decontam_noNTC_rar_forMETACODER.csv")

Read the edited and formatted taxonomy file

Taxa_allexcept_nursery=read.csv("/raid/projects/scratch/TRMP/RA_2/PM8_1/SG_Nursery_plant_out_trials/R_analysis/Phyloseq_16S_condensed/16S_allexcept_nursery_Taxa_root_only_decontam_noNTC_rar_forMETACODER_edited.csv")

Add categories for comparison

Taxa_allexcept_nursery$Category="All Forest compartments" Taxa_nursery$Category="Nursery"

Bind the two df

Taxa_nursery_allFC=rbind(Taxa_nursery,Taxa_allexcept_nursery)

Merge taxonomy and OTU table

Taxa_nursery_allFC$ASVs <- as.character(Taxa_nursery_allFC$ASVs) # Must be same type for join to work OTU2_root_only_decontam_noNTC_rar$ASVs <- as.character(OTU2_root_only_decontam_noNTC_rar$ASVs) # Must be same type for join to work OTU_root_df <- left_join(OTU2_root_only_decontam_noNTC_rar, Taxa_nursery_allFC, by = c("ASVs")) # identifies cols with shared IDs print(OTU_root_df)

tail(colnames(OTU_root_df), n = 10)

Read sample data

We want to only keep root in sample data, remove samples that are not in the physeq originally NIR16S_T_9_LCHV7, SIR16S_MC2_C1_LCHV7.

Root_sample_data=subset.data.frame(Root_sample_data,Type=="Root") Root_sample_data=subset.data.frame(Root_sample_data,!SampleID%in% c("NIR16S_T_9_LCHV7","SIR16S_MC2_C1_LCHV7"))

add category for sample data -nursery and all FC

Root_sample_data_ed=subset.data.frame(Root_sample_data,Forest.compartment.ID=="Tokoroa nursery") #nursery metadata Root_sample_data_ed2=subset.data.frame(Root_sample_data,Forest.compartment.ID!="Tokoroa nursery") #all other forest comp metadata

Add category to respective df and recombine

Root_sample_data_ed$Category="Nursery" Root_sample_data_ed2$Category="All Forest compartments"

Bind the two df

Root_sample_data_ed3=rbind(Root_sample_data_ed,Root_sample_data_ed2)

Parse data for plotting

x = parse_tax_data(OTU_root_df, class_cols = "Taxonomy", class_sep = ";", class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"), classregex = "^([a-z]{0,1}){0,2}(.*)$")

Get per taxon counts

x$data$tax_table <- calc_taxon_abund(x, data = "tax_data", cols = Root_sample_data_ed3$SampleID)

Calculate difference between groups

x$data$diff_table <- calc_diff_abund_deseq2(x, data = "tax_table", cols = Root_sample_data_ed3$SampleID, groups = Root_sample_data_ed3$Category)

Remove taxa with only small differences

per_taxon_fold_changes <- obs(x, data = 'diff_table', value = 'log2FoldChange') per_taxon_p_values <- obs(x, data = 'diff_table', value = 'pvalue') per_taxon_max_change <- unlist(lapply(seq_along(per_taxon_fold_changes), function(i) { if (all(per_taxon_p_values[[i]] > 0.05, na.rm = TRUE)) { return(0) } else { return(max(abs(per_taxon_fold_changes[[i]][per_taxon_p_values[[i]] <= 0.05]), na.rm = TRUE)) } })) x <- filter_taxa(x, per_taxon_max_change > 2.5, supertaxa = TRUE, reassign_obs = c(diff_table = FALSE))

Plot results

heat_tree_matrix(x, data = "diff_table", node_size = n_obs, node_label = taxon_names, node_color = ifelse(is.na(padj) | padj > 0.05, 0, log2FoldChange), node_color_range = diverging_palette(), node_color_trans = "linear", node_color_interval = c(-3, 3), edge_color_interval = c(-3, 3), node_size_axis_label = "Number of OTUs", node_color_axis_label = "Log2 fold change")

I also noticed that the phyloseq before the split contains 7573 ASV but after splitting and recombining them its showing up as 7910 in the plot attached, not sure why ? and is there a way to adjust the wording in the plots so its more visible? Can you help please? Thanks !

16S_Metacoder_logFC_nursery_allFC.pdf

zachary-foster commented 6 months ago

Hello, I will try to take a look at this next week.

grunwaldlab / metacoder

filtering on Log2FoldChange (question) #297

loads readr to use "read_tsv"

imports OTU table

imports Taxonomy

loads diplyr to use "left_join"

merges Taxonomy and OTU table

loads necessary packages

imports Metadata

Parse data for plotting

Get per-taxon counts

Calculate difference between groups

Remove taxa with only small differences

Plot results (might take a few minutes)

Extract abundance matrix from the phyloseq object

Coerce to data.frame

Pull out taxonomy info from phyloseq for metacoder plot

Read the edited and formatted taxonomy file

Repeat above for for all forest comp except nursery

All forest compartment without nursery

Create OTU table

Extract abundance matrix from the phyloseq object

Coerce to data.frame

Repeat for taxonomy table

Pull out taxonomy info from phyloseq for metacoder plot

Read the edited and formatted taxonomy file

Add categories for comparison

Bind the two df

Merge taxonomy and OTU table

Read sample data

We want to only keep root in sample data, remove samples that are not in the physeq originally NIR16S_T_9_LCHV7, SIR16S_MC2_C1_LCHV7.

add category for sample data -nursery and all FC

Add category to respective df and recombine

Bind the two df

Parse data for plotting

Get per taxon counts

Calculate difference between groups

Remove taxa with only small differences

Plot results