statdivlab / corncob

Count Regression for Correlated Observations with the Beta-binomial
102 stars 22 forks source link

Feature request sent via email: prediction intervals #94

Open bryandmartin opened 3 years ago

bryandmartin commented 3 years ago

Be able to calculate intervals for all ASVs and without having to use plot

ianartmor commented 3 years ago

Hi Bryan,

If I'm understanding the request correctly, I have code for a pretty simple function that does this in case anyone needs a clunky interim solution. It uses purrr (and forcats for plotting) but I assume most folks have those packages.

It does use bbdml:::plot, but to me that function might as well be called calculatePredictionIntervals (at least with the data_only=TRUE option). It also relies on the user using the full_output=TRUE option in differentialTest.

Code for the function:

differentialTestPredictions <- function(diftest_res, B = 1000, taxa) {

  # Below line is probably not ideal! Would be safer if differentialTest outputs were named lists
  names(diftest_res$full_output) <- taxa_names(diftest_res$data)

  # default is all taxa
  if (missing(taxa)) {
    taxa <- taxa_names(diftest_res$data)
  } else {
    taxa <- taxa
  }

  purrr::map(
    diftest_res$full_output[taxa],
    ~ corncob:::plot.bbdml(.,
                           B = B, data_only = TRUE)
  )
}

How I use it in my workflow:

library(corncob)
library(phyloseq)
library(purrr)
library(dplyr)
library(forcats)
data("soil_phylo")

# differential test
soil_diftest_res <- differentialTest(
  data = soil_phylo %>% tax_glom("Genus"), formula = ~DayAmdmt, phi.formula = ~1,
  formula_null = ~1, phi.formula_null = ~1, test = "LRT", full_output = TRUE
)

# differential test prediction function for significant taxa (low bootstrap # for demonstration)
prediction_list <- differentialTestPredictions(
  diftest_res = soil_diftest_res,
  B = 5,
  taxa = soil_diftest_res$significant_taxa
)

# get sample data
samdat <- sample_data(soil_phylo) %>%
  data.frame() %>%
  rownames_to_column("samples")

# combine with prediction intervals
prediction_list_with_samdat <- prediction_list %>% map(~ left_join(., samdat, by = "samples"))

# reorder samples and plot
plotlist <- prediction_list_with_samdat %>%
  map2(.x = ., .y=names(.),
    ~ mutate(.data = ., samples = fct_reorder(samples, DayAmdmt)) %>%
      ggplot(data = .) +
      geom_pointrange(aes(x = samples, ymin = ymin, ymax = ymax, y = RA, color = DayAmdmt)) +
      theme_bw() +
      theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())+ggtitle(.y)
  )

plotlist[1] 

Best, Ian