Closed antagomir closed 3 years ago
I think this is a variation on the plotExpression
/plotAbundance
functions and can be easily added there. It just require a different plotter to be added (Are those ridge plots?).
The following example comes close I guess.
se <- microbiomeDataSets::atlas1006()
se <- relAbundanceCounts(se)
altExps(se) <- splitByRanks(se)
plotExpression(altExp(se,"Genus"),
exprs_values = "relabundance",
features="Dialister",
colour_by = "Dialister") +
scale_y_log10() +
coord_flip()
The additions required would be:
Yes, ridge plot. But can be also done just for a single taxonomic group.
Yep, but that is something easily extended to, if more than one feature is plotted.
I wasn't completely sure what is the correct type of plot that is wanted. I thought that, when I have time I could take a closer look on this, because this issue is still open
So does this kind of plot look right? It utilize ggridges
package
#install.packages("ggridges")
library(miaViz)
library(ggridges)
se <- microbiomeDataSets::atlas1006()
se <- transformSamples(se, method = "relabundance")
se_agg <- agglomerateByRank(se, "Genus")[c("Dialister", "Aeromonas", "Akkermansia", "Vibrio")]
#se_agg <- agglomerateByRank(se, "Genus")[c("Dialister")]
se_abund_df <- meltAssay(se_agg, assay_name = "relabundance")
se_abund_df$relabundance <- se_abund_df$relabundance +
min(se_abund_df$relabundance[!is.na(se_abund_df$relabundance) & (se_abund_df$relabundance > 0)])
ggplot(se_abund_df, aes(x = relabundance, group=FeatureID, y=FeatureID, fill = stat(x))) +
geom_density_ridges_gradient() +
geom_vline(aes(xintercept = 0.003), linetype = 2, size = 1) +
scale_x_log10()
That's nice.
Visualization best practices advise against encoding the same information with more than one way. Here, the x axis and color both reflect abundance. It makes the figure fancy but does not add information. I would hence suggest removing the fill color, or it can be left optional for those enjoy eyecandy.
We recently used the visualization (jitterplot) below in our Nature Commications article, Supplementary file, Fig S1. It is just an alternative way to visualize the same information. Only difference is actually the layout (density plot vs jitterplot). These are handy tools for quick data exploration, especially in cohort studies. Hence I would suggest having this kind of function for exploratory purposes, with two or more alternative visualization options. However, not sure if @FelixErnst has comments in combining these two alternative visualizations under a single wrapper.
library(tidyverse)
library(microbiome)
library(magrittr)
library(reshape2)
data(atlas1006)
pseq <- atlas1006
ph0 <- microbiome::transform(pseq, "compositional")
pseq <- core(ph0, detection = 0.1/100, prevalence = 50/100)
dfm <- melt(abundances(pseq))
o <- dfm %>% group_by(Var1) %>%
summarise(median = median(value),
mean = mean(value)) %>%
arrange(median) %>%
mutate(Var1 = factor(Var1, unique(Var1)))
top <- rev(as.character(levels(o$Var1))) #[1:50]
dfm <- subset(dfm, Var1 %in% top)
dfm$Var1 <- factor(dfm$Var1, levels=rev(top))
brs <- c(10^(-rev(seq(0:3))), 0.5)
# Truncate values below the threshold
th <- 0.001/100
dfm$value[dfm$value < th] <- th
theme_set(theme_classic(10))
p <- ggplot(dfm, aes(x=Var1, y=value)) +
geom_jitter(alpha = 0.15, width = 0.35) +
scale_y_continuous(# labels=scales::percent,
labels = paste0(100 * brs, "%"),
trans = "log10",
breaks = brs) +
coord_flip() +
labs(x = "", y = "Relative abundance (%)")
print(p)
Regarding ggridges package, I would suggest avoiding dependencies. Only add ggridges if there is remarkable added value, otherwise just use ggplot to implement.
Ok, here is another implementation, now it does not require ggridges
, it uses facets. But yep, let's wait if Felix has comments on these
library(miaViz)
se <- microbiomeDataSets::atlas1006()
se <- transformSamples(se, method = "relabundance")
se_agg <- agglomerateByRank(se, "Genus")[c("Dialister", "Aeromonas", "Akkermansia", "Vibrio")]
#se_agg <- agglomerateByRank(se, "Genus")[c("Dialister")]
se_abund_df <- meltAssay(se_agg, assay_name = "relabundance")
se_abund_df$relabundance <- se_abund_df$relabundance +
min(se_abund_df$relabundance[!is.na(se_abund_df$relabundance) & (se_abund_df$relabundance > 0)])
ggplot(se_abund_df, aes(x=relabundance)) +
geom_density(fill = "grey") +
facet_grid(FeatureID ~ ., switch = "y", scales="free") + # Creates facets, labels to left, taxa are sclaed individually
theme_classic() + # changes theme
theme(strip.background = element_blank(), strip.text.y.left = element_text(angle = 0), # Removes label grid, horizontal labels
axis.ticks.y = element_blank(), axis.text.y=element_blank(), # Removes y-axis
axis.title.y = element_blank(), axis.line.y = element_blank()) + # Removes y-axis
geom_vline(aes(xintercept = 0.003), linetype = 2, size = 1) +
scale_x_log10()
This Fig. S1 and the data plotted in the examples reminds me of plotHighestExprs
from scater. The underlying data should be the same, but the aesthetics might be a bit different.
library(scater)
example_sce <- mockSCE()
colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce))
plotHighestExprs(example_sce[1:100,], colour_cells_by="detected")
The plotting is abit slow due to the large dimensions, so be patient, but I think it is quite realistic.
From the discussion, I would summarize it as follows:
plotHighestExprs
provides a blueprint for the function layoutWould you agree with this summary?
See also #13
Yes, seems very feasible and I had the impression that this was already discussed earlier.
@tvborm you want to implement this?
Yep, I can implement this, I have to little bit think about how to do this
So this new function (plotDensity
?)
plotHighestExprs
to creates a dotplot that visualizes abundances of the most abundant taxaI think, that you cannot directly use the output of plotHighestExprs
, but use the code as blueprint and make necessary adjustments.
The function name is up to debate. plotDensity
is not specific enough, since it does not communicate what type of density should be plotted. plotHighestAbundance
or plotAbundanceDensity
are also options.
OK, thanks.
I think the naming depends on the functionality. If it plots only the highest abundances plotHighestAbundance would suit well.
But if user can specify which taxa are plotted, then e.g. plotAbundanceDensity
would be better (e.g., if the taxa with highest abundances is the default).
But the name can easily be changed
I would go with plotAbundanceDensity
for now.
Can we add a function for quick visualization of adundance profiles as a density plot?
The microbiome pkg equivalent.
And after the basic functionality, it would be possible to add coloring schemes like in this example: