New function: plot_density ?

antagomir commented 3 years ago

Can we add a function for quick visualization of adundance profiles as a density plot?

library(microbiome)
data(dietswap)
pseq.rel <- transform(dietswap, 'compositional')
library(ggplot2)
plot_density(pseq.rel, variable='Dialister') + scale_x_log10()

And after the basic functionality, it would be possible to add coloring schemes like in this example:

data(atlas1006)
pseq <- subset_samples(atlas1006, DNA_extraction_method == 'r')
pseq <- transform(pseq, 'compositional')
hotplot(pseq, 'Dialister', tipping.point=1.1, log10=TRUE)

FelixErnst commented 3 years ago

I think this is a variation on the plotExpression/plotAbundance functions and can be easily added there. It just require a different plotter to be added (Are those ridge plots?).

The following example comes close I guess.

se <- microbiomeDataSets::atlas1006()
se <- relAbundanceCounts(se)
altExps(se) <- splitByRanks(se)
plotExpression(altExp(se,"Genus"),
               exprs_values = "relabundance",
               features="Dialister",
               colour_by = "Dialister") + 
    scale_y_log10() + 
    coord_flip()

The additions required would be:

new generic and method for plotting ridge plots
argument for plotting log-transformed which also takes care of colouring

antagomir commented 3 years ago

Yes, ridge plot. But can be also done just for a single taxonomic group.

FelixErnst commented 3 years ago

Yep, but that is something easily extended to, if more than one feature is plotted.

TuomasBorman commented 3 years ago

I wasn't completely sure what is the correct type of plot that is wanted. I thought that, when I have time I could take a closer look on this, because this issue is still open

So does this kind of plot look right? It utilize ggridges package

#install.packages("ggridges")
library(miaViz)
library(ggridges)

se <- microbiomeDataSets::atlas1006()

se <- transformSamples(se, method = "relabundance")

se_agg <- agglomerateByRank(se, "Genus")[c("Dialister", "Aeromonas", "Akkermansia", "Vibrio")]

#se_agg <- agglomerateByRank(se, "Genus")[c("Dialister")]

se_abund_df <- meltAssay(se_agg, assay_name = "relabundance")

se_abund_df$relabundance <- se_abund_df$relabundance + 
  min(se_abund_df$relabundance[!is.na(se_abund_df$relabundance) & (se_abund_df$relabundance > 0)])

ggplot(se_abund_df, aes(x = relabundance, group=FeatureID, y=FeatureID, fill = stat(x))) + 
  geom_density_ridges_gradient() + 
  geom_vline(aes(xintercept = 0.003), linetype = 2, size = 1) + 
  scale_x_log10()

antagomir commented 3 years ago

That's nice.

Visualization best practices advise against encoding the same information with more than one way. Here, the x axis and color both reflect abundance. It makes the figure fancy but does not add information. I would hence suggest removing the fill color, or it can be left optional for those enjoy eyecandy.
We recently used the visualization (jitterplot) below in our Nature Commications article, Supplementary file, Fig S1. It is just an alternative way to visualize the same information. Only difference is actually the layout (density plot vs jitterplot). These are handy tools for quick data exploration, especially in cohort studies. Hence I would suggest having this kind of function for exploratory purposes, with two or more alternative visualization options. However, not sure if @FelixErnst has comments in combining these two alternative visualizations under a single wrapper.

library(tidyverse)
library(microbiome)
library(magrittr)
library(reshape2)

data(atlas1006)
pseq <- atlas1006

ph0 <- microbiome::transform(pseq, "compositional")
pseq <- core(ph0, detection = 0.1/100, prevalence = 50/100)
dfm <- melt(abundances(pseq))
o <- dfm %>% group_by(Var1) %>%
  summarise(median = median(value),
            mean = mean(value)) %>%
  arrange(median) %>%
  mutate(Var1 = factor(Var1, unique(Var1)))
top <- rev(as.character(levels(o$Var1))) #[1:50]
dfm <- subset(dfm, Var1 %in% top)
dfm$Var1 <- factor(dfm$Var1, levels=rev(top))
brs <- c(10^(-rev(seq(0:3))), 0.5)

# Truncate values below the threshold
th <- 0.001/100
dfm$value[dfm$value < th] <- th

theme_set(theme_classic(10))
p <- ggplot(dfm, aes(x=Var1, y=value)) +
  geom_jitter(alpha = 0.15, width = 0.35) +       
  scale_y_continuous(# labels=scales::percent,
    labels = paste0(100 * brs, "%"),        
    trans  = "log10",
    breaks = brs) +
  coord_flip() +
  labs(x = "", y = "Relative abundance (%)")
print(p)

antagomir commented 3 years ago

Regarding ggridges package, I would suggest avoiding dependencies. Only add ggridges if there is remarkable added value, otherwise just use ggplot to implement.

TuomasBorman commented 3 years ago

Ok, here is another implementation, now it does not require ggridges, it uses facets. But yep, let's wait if Felix has comments on these

library(miaViz)

se <- microbiomeDataSets::atlas1006()

se <- transformSamples(se, method = "relabundance")

se_agg <- agglomerateByRank(se, "Genus")[c("Dialister", "Aeromonas", "Akkermansia", "Vibrio")]

#se_agg <- agglomerateByRank(se, "Genus")[c("Dialister")]

se_abund_df <- meltAssay(se_agg, assay_name = "relabundance")

se_abund_df$relabundance <- se_abund_df$relabundance +
  min(se_abund_df$relabundance[!is.na(se_abund_df$relabundance) & (se_abund_df$relabundance > 0)])

ggplot(se_abund_df, aes(x=relabundance)) +
   geom_density(fill = "grey") +
   facet_grid(FeatureID ~ ., switch = "y", scales="free") + # Creates facets, labels to left, taxa are sclaed individually
   theme_classic() + # changes theme
   theme(strip.background = element_blank(), strip.text.y.left = element_text(angle = 0), # Removes label grid, horizontal labels
         axis.ticks.y = element_blank(), axis.text.y=element_blank(), # Removes y-axis
         axis.title.y = element_blank(), axis.line.y = element_blank()) + # Removes y-axis
   geom_vline(aes(xintercept = 0.003), linetype = 2, size = 1) + 
   scale_x_log10()

FelixErnst commented 3 years ago

This Fig. S1 and the data plotted in the examples reminds me of plotHighestExprs from scater. The underlying data should be the same, but the aesthetics might be a bit different.

library(scater)
example_sce <- mockSCE()
colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce))
plotHighestExprs(example_sce[1:100,], colour_cells_by="detected")

The plotting is abit slow due to the large dimensions, so be patient, but I think it is quite realistic.

From the discussion, I would summarize it as follows:

plotting function showing the abundance of highest abundant bacteria
plotHighestExprs provides a blueprint for the function layout
possible additional feature
- switch between ticks on a scale and jitter points
- allow for summarization by plotting the density. (This should be a faster option, but would remove feature like plotting by colour)

Would you agree with this summary?

FelixErnst commented 3 years ago

microbiome / miaViz

New function: plot_density ? #15