grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation
Other
134 stars 28 forks source link

The taxon list table for the differential heat tree #317

Open biosciences opened 2 years ago

biosciences commented 2 years ago

Regarding the differential heat tree, how I can print the taxon list table for the differential heat tree after Wilcoxon rank Sum test? The taxon list should include the more abundant taxons for both comparative groups.

zachary-foster commented 2 years ago

That information is in the table returned by compare_groups or calc_diff_abund_deseq2. You can subset it to just the significant results and add other information to it as needed. For example:

library(metacoder)
library(dplyr)

# Parse data for plotting
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Convert counts to proportions
x$data$otu_table <- calc_obs_props(x, data = "tax_data", cols = hmp_samples$sample_id)
#> Calculating proportions from counts for 50 columns for 1000 observations.

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "otu_table", cols = hmp_samples$sample_id)
#> Summing per-taxon counts from 50 columns for 174 taxa

# Calculate difference between groups
x$data$diff_table <- compare_groups(x, data = "tax_table",
                                    cols = hmp_samples$sample_id,
                                    groups = hmp_samples$body_site)

# Subset the table for p-value
signif_taxon_diff <- filter(x$data$diff_table, p.adjust(wilcox_p_value) < 0.05)

# Add taxon name
signif_taxon_diff$taxon_name <- taxon_names(x)[signif_taxon_diff$taxon_id]
signif_taxon_diff
#> # A tibble: 45 x 8
#>    taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff
#>    <chr>    <chr>       <chr>                   <dbl>       <dbl>     <dbl>
#>  1 ad       Nose        Saliva                  -5.42      -0.267    -0.262
#>  2 ae       Nose        Saliva                   5.12       0.597     0.581
#>  3 al       Nose        Saliva                  -5.68      -0.248    -0.243
#>  4 am       Nose        Saliva                   5.12       0.597     0.581
#>  5 ap       Nose        Saliva                  -5.82      -0.173    -0.196
#>  6 ay       Nose        Saliva                  -5.68      -0.248    -0.243
#>  7 az       Nose        Saliva                   5.13       0.597     0.583
#>  8 bc       Nose        Saliva                  -5.82      -0.173    -0.196
#>  9 bf       Nose        Saliva                  -3.26      -0.154    -0.210
#> 10 bx       Nose        Saliva                  -6.86      -0.170    -0.186
#> # … with 35 more rows, and 2 more variables: wilcox_p_value <dbl>,
#> #   taxon_name <chr>

Created on 2021-08-23 by the reprex package (v2.0.0)

biosciences commented 2 years ago

Thank you very much for your help! Your script is very useful.

zachary-foster commented 2 years ago

No problem, glad it helped!

DanielSoutoV commented 2 years ago

Hola Zachary,

I am trying to do something similar here - I am actually just trying to see which taxa overlap in both treatments so I figured I'd only use the median_diff =0 instead of wilcox_p_value column (? or is this complete nonsense?!). I tried running this code in my own data, but I had some errors.

I thought of trying out the one here, but I still get an error that the wilcox_p_value column (?) is not found:

signif_taxon_diff <- filter(x$data$diff_table, p.adjust(wilcox_p_value) < 0.05) Error in p.adjust(wilcox_p_value) : object 'wilcox_p_value' not found

I tried changing the code thinking its a matter of specifying which column, but I still get an error (although a different one).

signif_taxon_diff <- filter(x$data$diff_table, p.adjust(x$data$diff_table$wilcox_p_value) < 0.05) Error in filter(x$data$diff_table, p.adjust(x$data$diff_table$wilcox_p_value) < : missing values in 'filter'

When I try this with my own data, using the code below, I do get some 'signif_taxon_diff' except its now a time series table populated with NA!

mediandiff<- filter(obj$data$diff_table, obj$data$diff_table$median_diff == 0) mediandiff Time Series: Start = 1 End = 283 Frequency = 1 [,1] [,2] [,3] [,4] [,5] [,6] [,7] 1 NA NA NA NA NA NA NA 2 NA NA NA NA NA NA NA 3 NA NA NA NA NA NA NA 4 NA NA NA NA NA NA NA 5 NA NA NA NA NA NA NA 6 NA NA NA NA NA NA NA 7 NA NA NA NA NA NA NA 8 NA NA NA NA NA NA NA 9 NA NA NA NA NA NA NA 10 NA NA NA NA NA NA NA ....... 136 NA NA NA NA NA NA NA 137 NA NA NA NA NA NA NA 138 NA NA NA NA NA NA NA 139 NA NA NA NA NA NA NA 140 NA NA NA NA NA NA NA 141 NA NA NA NA NA NA NA 142 36521 268 268 NaN 3.5 -1.6754 58.81118 [ reached getOption("max.print") -- omitted 141 rows ]

Sorry if this is all a bit convoluted for what I attempt to do - maybe there is even an easier way to just show which taxa are present in both treatments, but any help will be very much appreciated!

zachary-foster commented 2 years ago

Hola @DanielSoutoV,

There is probably an easier way to do that, if I understand your goal correctly. Is something like this what you want?

library(metacoder)
#> This is metacoder verison 0.3.5 (stable)
hmp_samples <- dplyr::ungroup(hmp_samples) # Needed until next package update

# Parse data from abundance matrix
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Calculate the taxon abundance for groups of columns (e.g. treatments)
x$data$taxon_abund <- calc_taxon_abund(x, "tax_data", cols = hmp_samples$sample_id, groups = hmp_samples$body_site)
#> Summing per-taxon counts from 50 columns in 5 groups for 174 taxa

# Check which taxa overlap
taxon_names(x)[x$data$taxon_abund$Nose > 0 & x$data$taxon_abund$Saliva > 0]
#>                      ab                      ac                      ad 
#>                  "Root"        "Proteobacteria"         "Bacteroidetes" 
#>                      ae                      af                      ag 
#>        "Actinobacteria"            "Firmicutes"          "Fusobacteria" 
#>                      aj                      ak                      al 
#>   "Gammaproteobacteria"         "Flavobacteria"           "Bacteroidia" 
#>                      am                      an                      ao 
#>        "Actinobacteria"    "Betaproteobacteria"               "Bacilli" 
#>                      ap                      aq                      as 
#>            "Clostridia"          "Fusobacteria" "Epsilonproteobacteria" 
#>                      aw                      ax                      ay 
#>        "Pasteurellales"      "Flavobacteriales"         "Bacteroidales" 
#>                      az                      ba                      bc 
#>       "Actinomycetales"          "Neisseriales"         "Clostridiales" 
#>                      bd                      be                      bf 
#>       "Burkholderiales"       "Fusobacteriales"       "Lactobacillales" 
#>                      bg                      bi                      bk 
#>            "Gemellales"     "Enterobacteriales"      "Coriobacteriales" 
#>                      bl                      bq                      br 
#>     "Campylobacterales"       "Pasteurellaceae"     "Flavobacteriaceae" 
#>                      bs                      bt                      bu 
#>    "Porphyromonadaceae"  "Propionibacteriaceae"         "Neisseriaceae" 
#>                      bw                      bx                      bz 
#>        "Bacteroidaceae"       "Veillonellaceae"      "Burkholderiaceae" 
#>                      ca                      cb                      cd 
#>    "Corynebacteriaceae"      "Actinomycetaceae"      "Fusobacteriaceae" 
#>                      ce                      cf                      cg 
#>      "Streptococcaceae"       "Lachnospiraceae"        "Prevotellaceae" 
#>                      ch                      ck                      cm 
#>           "Gemellaceae"    "Enterobacteriaceae"     "Coriobacteriaceae" 
#>                      cn                      cp                      cr 
#>       "Ruminococcaceae"     "Carnobacteriaceae" "Peptostreptococcaceae" 
#>                      cx                      dm                      dn 
#>        "Micrococcaceae"           "Haemophilus"        "Capnocytophaga" 
#>                      do                      dp                      dq 
#>         "Porphyromonas"     "Propionibacterium"             "Neisseria" 
#>                      ds                      dt                      dv 
#>           "Bacteroides"           "Selenomonas"             "Lautropia" 
#>                      dw                      dx                      dz 
#>       "Corynebacterium"           "Actinomyces"           "Veillonella" 
#>                      ea                      eb                      ed 
#>          "Leptotrichia"         "Streptococcus"            "Prevotella" 
#>                      ee                      ef                      eg 
#>               "Gemella"          "Oribacterium"        "Shuttleworthia" 
#>                      em                      es                      et 
#>         "Fusobacterium"          "Ruminococcus"        "Granulicatella" 
#>                      ey                      ez                      fb 
#>             "Dialister"    "Peptostreptococcus"       "Parabacteroides" 
#>                      fg                      fi                      fk 
#>             "Roseburia"           "Mitsuokella"             "Catonella" 
#>                      fm                      gi                      gl 
#>                "Rothia"            "Tannerella"           "Providencia"

Created on 2022-02-02 by the reprex package (v2.0.1)

DanielSoutoV commented 2 years ago

Excellent Zachary thanks a lot. Yes, definitely much easier this way. Best,