microbiome / miaViz

Microbiome Analysis Plotting and Visualization
https://microbiome.github.io/miaViz
Artistic License 2.0
10 stars 12 forks source link

New function: plot_density ? #15

Closed antagomir closed 3 years ago

antagomir commented 3 years ago

Can we add a function for quick visualization of adundance profiles as a density plot?

The microbiome pkg equivalent.

library(microbiome)
data(dietswap)
pseq.rel <- transform(dietswap, 'compositional')
library(ggplot2)
plot_density(pseq.rel, variable='Dialister') + scale_x_log10()

And after the basic functionality, it would be possible to add coloring schemes like in this example:

data(atlas1006)
pseq <- subset_samples(atlas1006, DNA_extraction_method == 'r')
pseq <- transform(pseq, 'compositional')
hotplot(pseq, 'Dialister', tipping.point=1.1, log10=TRUE)
FelixErnst commented 3 years ago

I think this is a variation on the plotExpression/plotAbundance functions and can be easily added there. It just require a different plotter to be added (Are those ridge plots?).

The following example comes close I guess.

se <- microbiomeDataSets::atlas1006()
se <- relAbundanceCounts(se)
altExps(se) <- splitByRanks(se)
plotExpression(altExp(se,"Genus"),
               exprs_values = "relabundance",
               features="Dialister",
               colour_by = "Dialister") + 
    scale_y_log10() + 
    coord_flip()

The additions required would be:

antagomir commented 3 years ago

Yes, ridge plot. But can be also done just for a single taxonomic group.

FelixErnst commented 3 years ago

Yep, but that is something easily extended to, if more than one feature is plotted.

TuomasBorman commented 3 years ago

I wasn't completely sure what is the correct type of plot that is wanted. I thought that, when I have time I could take a closer look on this, because this issue is still open

So does this kind of plot look right? It utilize ggridges package

#install.packages("ggridges")
library(miaViz)
library(ggridges)

se <- microbiomeDataSets::atlas1006()

se <- transformSamples(se, method = "relabundance")

se_agg <- agglomerateByRank(se, "Genus")[c("Dialister", "Aeromonas", "Akkermansia", "Vibrio")]

#se_agg <- agglomerateByRank(se, "Genus")[c("Dialister")]

se_abund_df <- meltAssay(se_agg, assay_name = "relabundance")

se_abund_df$relabundance <- se_abund_df$relabundance + 
  min(se_abund_df$relabundance[!is.na(se_abund_df$relabundance) & (se_abund_df$relabundance > 0)])

ggplot(se_abund_df, aes(x = relabundance, group=FeatureID, y=FeatureID, fill = stat(x))) + 
  geom_density_ridges_gradient() + 
  geom_vline(aes(xintercept = 0.003), linetype = 2, size = 1) + 
  scale_x_log10()

image

antagomir commented 3 years ago

That's nice.

  1. Visualization best practices advise against encoding the same information with more than one way. Here, the x axis and color both reflect abundance. It makes the figure fancy but does not add information. I would hence suggest removing the fill color, or it can be left optional for those enjoy eyecandy.

  2. We recently used the visualization (jitterplot) below in our Nature Commications article, Supplementary file, Fig S1. It is just an alternative way to visualize the same information. Only difference is actually the layout (density plot vs jitterplot). These are handy tools for quick data exploration, especially in cohort studies. Hence I would suggest having this kind of function for exploratory purposes, with two or more alternative visualization options. However, not sure if @FelixErnst has comments in combining these two alternative visualizations under a single wrapper.

library(tidyverse)
library(microbiome)
library(magrittr)
library(reshape2)

data(atlas1006)
pseq <- atlas1006

ph0 <- microbiome::transform(pseq, "compositional")
pseq <- core(ph0, detection = 0.1/100, prevalence = 50/100)
dfm <- melt(abundances(pseq))
o <- dfm %>% group_by(Var1) %>%
  summarise(median = median(value),
            mean = mean(value)) %>%
  arrange(median) %>%
  mutate(Var1 = factor(Var1, unique(Var1)))
top <- rev(as.character(levels(o$Var1))) #[1:50]
dfm <- subset(dfm, Var1 %in% top)
dfm$Var1 <- factor(dfm$Var1, levels=rev(top))
brs <- c(10^(-rev(seq(0:3))), 0.5)

# Truncate values below the threshold
th <- 0.001/100
dfm$value[dfm$value < th] <- th

theme_set(theme_classic(10))
p <- ggplot(dfm, aes(x=Var1, y=value)) +
  geom_jitter(alpha = 0.15, width = 0.35) +       
  scale_y_continuous(# labels=scales::percent,
    labels = paste0(100 * brs, "%"),        
    trans  = "log10",
    breaks = brs) +
  coord_flip() +
  labs(x = "", y = "Relative abundance (%)")
print(p)
antagomir commented 3 years ago

Regarding ggridges package, I would suggest avoiding dependencies. Only add ggridges if there is remarkable added value, otherwise just use ggplot to implement.

TuomasBorman commented 3 years ago

Ok, here is another implementation, now it does not require ggridges, it uses facets. But yep, let's wait if Felix has comments on these

library(miaViz)

se <- microbiomeDataSets::atlas1006()

se <- transformSamples(se, method = "relabundance")

se_agg <- agglomerateByRank(se, "Genus")[c("Dialister", "Aeromonas", "Akkermansia", "Vibrio")]

#se_agg <- agglomerateByRank(se, "Genus")[c("Dialister")]

se_abund_df <- meltAssay(se_agg, assay_name = "relabundance")

se_abund_df$relabundance <- se_abund_df$relabundance +
  min(se_abund_df$relabundance[!is.na(se_abund_df$relabundance) & (se_abund_df$relabundance > 0)])

ggplot(se_abund_df, aes(x=relabundance)) +
   geom_density(fill = "grey") +
   facet_grid(FeatureID ~ ., switch = "y", scales="free") + # Creates facets, labels to left, taxa are sclaed individually
   theme_classic() + # changes theme
   theme(strip.background = element_blank(), strip.text.y.left = element_text(angle = 0), # Removes label grid, horizontal labels
         axis.ticks.y = element_blank(), axis.text.y=element_blank(), # Removes y-axis
         axis.title.y = element_blank(), axis.line.y = element_blank()) + # Removes y-axis
   geom_vline(aes(xintercept = 0.003), linetype = 2, size = 1) + 
   scale_x_log10() 

image

FelixErnst commented 3 years ago

This Fig. S1 and the data plotted in the examples reminds me of plotHighestExprs from scater. The underlying data should be the same, but the aesthetics might be a bit different.

library(scater)
example_sce <- mockSCE()
colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce))
plotHighestExprs(example_sce[1:100,], colour_cells_by="detected")

The plotting is abit slow due to the large dimensions, so be patient, but I think it is quite realistic.

From the discussion, I would summarize it as follows:

Would you agree with this summary?

FelixErnst commented 3 years ago

See also #13

antagomir commented 3 years ago

Yes, seems very feasible and I had the impression that this was already discussed earlier.

FelixErnst commented 3 years ago

@tvborm you want to implement this?

TuomasBorman commented 3 years ago

Yep, I can implement this, I have to little bit think about how to do this

So this new function (plotDensity?)

  1. Utilize plotHighestExprsto creates a dotplot that visualizes abundances of the most abundant taxa
  2. Additional feature(s):
    • Switch to a density plot that shows abundances of the most abundant taxa (Is the implementation with facets suitable for this purpose?)
FelixErnst commented 3 years ago

I think, that you cannot directly use the output of plotHighestExprs, but use the code as blueprint and make necessary adjustments.

The function name is up to debate. plotDensity is not specific enough, since it does not communicate what type of density should be plotted. plotHighestAbundance or plotAbundanceDensity are also options.

TuomasBorman commented 3 years ago

OK, thanks.

I think the naming depends on the functionality. If it plots only the highest abundances plotHighestAbundance would suit well.

But if user can specify which taxa are plotted, then e.g. plotAbundanceDensity would be better (e.g., if the taxa with highest abundances is the default).

But the name can easily be changed

antagomir commented 3 years ago

I would go with plotAbundanceDensity for now.