Closed antoine4ucsd closed 3 years ago
Dear @antoine4ucsd
Just to be clear: you are running plotClusterExprs(), right?
If you look at the code ..
> plotClusterExprs
function (x, k = "meta20", features = "type")
{
    .check_sce(x, TRUE)
    k <- .check_k(x, k)
    x$cluster_id <- cluster_ids(x, k)
    features <- .get_features(x, features)
    ms <- t(.agg(x[features, ], "cluster_id", "median"))
    d <- dist(ms, method = "euclidean")
    o <- hclust(d, method = "average")$order
    cd <- colData(x)
    es <- assay(x[features, ], "exprs")
    df <- data.frame(t(es), cd, check.names = FALSE)
    df <- melt(df, id.vars = names(cd), variable.name = "antigen",
        value.name = "expression")
    df$avg <- "no"
    avg <- df
    avg$cluster_id <- "avg"
    avg$avg <- "yes"
    df <- rbind(df, avg)
    fq <- tabulate(x$cluster_id)/ncol(x)
    fq <- round(fq * 100, 2)
    names(fq) <- levels(x$cluster_id)
    df$cluster_id <- factor(df$cluster_id, levels = rev(c("avg",
        levels(x$cluster_id)[o])), labels = rev(c("average",
        paste0(names(fq), " (", fq, "%)")[o])))
    ggplot(df, aes_string(x = "expression", y = "cluster_id",
        col = "avg", fill = "avg")) + facet_wrap(~antigen, scales = "free_x",
        nrow = 2) + geom_density_ridges(alpha = 0.2) + theme_ridges() +
        theme(legend.position = "none", strip.background = element_blank(),
            strip.text = element_text(face = "bold"))
}
<bytecode: 0x7fb66d6f4ba8>
<environment: namespace:CATALYST>
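.. the density calculation is actually done by ggridges::geom_density_ridges() and so is external to CATALYST. Note also that the function melts the full expression matrix into long format and then duplicates it with rbind() for the "average" row, so the data frame handed to ggplot has two rows per cell and marker.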
But I suppose the calculation that geom_density_ridges() is doing could be made less memory-intensive by manually looping through each marker and cluster and calculating the densities with the density() function, or something like that.
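A minimal sketch of that loop, in base R only. The helper name cluster_densities() and the toy data are hypothetical, not part of CATALYST; on a real object the inputs would presumably be es <- assay(x, "exprs") and cl <- cluster_ids(x, k), as in the function body above. It also returns the per-cluster density values as a data frame, which is one way to extract them.

```r
# Hypothetical helper (not a CATALYST function): one density() call per
# marker/cluster pair, so only one cluster's values are processed at a time.
# `es`: numeric matrix, features (rows, named) x cells (columns)
# `cl`: cluster label for each cell (length == ncol(es))
cluster_densities <- function(es, cl, n = 512) {
  cl <- as.factor(cl)
  stopifnot(ncol(es) == length(cl))
  res <- list()
  for (marker in rownames(es)) {
    for (k in levels(cl)) {
      v <- es[marker, cl == k]
      # the default bandwidth selector needs at least 2 data points
      if (length(v) < 2) next
      d <- density(v, n = n)
      res[[length(res) + 1]] <- data.frame(
        antigen = marker, cluster_id = k, x = d$x, y = d$y,
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, res)
}

# Toy data standing in for assay(x, "exprs") and cluster_ids(x, k)
set.seed(1)
es <- matrix(rnorm(2 * 300), nrow = 2,
             dimnames = list(c("CD3", "CD4"), NULL))
cl <- sample(c("1", "2", "3"), 300, replace = TRUE)
dens <- cluster_densities(es, cl, n = 64)
head(dens)
```

The result has n rows per marker/cluster pair regardless of how many cells the cluster contains, so it stays small even for very large datasets and can be plotted directly instead of the raw per-cell values.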
As you mention, down-sampling should also work, especially for the bigger clusters, so you might want to down-sample in a cluster-wise fashion, i.e., cap the number of cells kept per cluster.
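Cluster-wise down-sampling could look roughly like this. The helper downsample_by_cluster() and the cap n_max are hypothetical choices for illustration; the commented lines at the end assume a clustered SingleCellExperiment x and are untested here.

```r
# Hypothetical helper (not a CATALYST function): keep at most `n_max`
# cells per cluster and return the column indices to retain.
downsample_by_cluster <- function(cl, n_max = 2000) {
  idx <- split(seq_along(cl), cl)
  keep <- lapply(idx, function(i) {
    # sample.int() on the length avoids the sample(x) pitfall when
    # a cluster contains a single cell
    if (length(i) > n_max) i[sample.int(length(i), n_max)] else i
  })
  sort(unlist(keep, use.names = FALSE))
}

# Toy cluster labels: one large, one medium, one small cluster
set.seed(1)
cl <- rep(c("a", "b", "c"), times = c(5000, 300, 20))
keep <- downsample_by_cluster(cl, n_max = 1000)
table(cl[keep])

# On a real object this would be something like (assumption, untested):
# cl    <- cluster_ids(x, "meta20")
# x_sub <- x[, downsample_by_cluster(cl, n_max = 2000)]
# plotClusterExprs(x_sub, k = "meta20", features = "type")
```

Small clusters are kept whole and only the large ones are thinned. One caveat: plotClusterExprs() computes the percentages in the axis labels from the object it is given, so after down-sampling they would describe the subset rather than the original cluster frequencies.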
Best, Mark
Thank you for your detailed answer, really helpful. I will look into my SingleCellExperiment object and try downsampling.
Best,
Hello, I am using the CyTOF workflow to process a large dataset of flow data. Everything works fine apart from the ClusterExpression density plot: RStudio is running out of memory. Is there a way to avoid that (increasing the memory limit, or downsampling)? I was able to run it once and got the attached results. Is there a way to extract the marker 'density' data for all clusters from the sce input?
thank you,