kharchenkolab / cacoa

Single-cell Case Control Analysis
45 stars 7 forks source link

Troubles with estimateExpressionShiftMagnitudes() and estimateDiffCellDensity() functions #21

Open anastasiiaNG opened 2 years ago

anastasiiaNG commented 2 years ago

Dear Cacoa team,

Thank you for creating such a perspective tool for analysis single cell RNAseq data.

I currently tried to apply it to our data but faced several issues:

  1. cao$estimateExpressionShiftMagnitudes() fails with the following message:

    Filtering data... 
    Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) : 
    'NA' indices are not (yet?) supported for sparse Matrices 

    Could you please specify what exactly is wrong with the input data?

  2. cao$estimateDiffCellDensity(type='permutation', verbose=T) function also fails with the following message:

    
    > cao$estimateCellDensity()
    > cao$estimateDiffCellDensity(type='permutation', verbose=T)

Error in smoothSignalOnGraph(., filter = g.filt, graph = graph) : The provided graph is not connected. It has to be connected to estimate l.max.

It starts working when argument `smooth` is specified to `FALSE`. 

This message looks confusing because 

gg <- cao$data.object@graphs$integrated_nn %>% graph_from_adjacency_matrix() igraph::is_connected(gg) [1] TRUE

I tried to fix it by changing the k.param and prune.SNN arguments to make the graph of the original Seurat object more connected in the following way...

whole.integrated <- FindNeighbors(whole.integrated,
                                  k.param = 100, # instead of default k = 20
                                  prune.SNN = 0, # instead of default prune.SNN = 1/15
                                  )

...but it didn't help.

Respectively, function plotDiffCellDensity() works only with permutation parameter, otherwise it returns the same error.

Could you please clarify how these issues may be fixed? Thank you!

cao object can be downloaded here.

evanbiederstedt commented 2 years ago

Hi @anastasiiaNG

Thanks for the issue. Cacoa is still in development of course. These issues provide useful user feedback.

I currently tried to apply it to our data but faced several issues: cao object can be downloaded here.

Could you share how this data was created? Please share the code used. If we are able to reproduce this, we'll be able to provide better feedback.

Thanks

anastasiiaNG commented 2 years ago

Hi, @evanbiederstedt,

This is how the cao object was created from the seurat object whole.integrated:

library(cacoa)
library(Seurat)
library(ggplot2)
library(tidyverse)

whole.integrated <- readRDS("whole.integrated.rds")

whole.integrated <- FindNeighbors(whole.integrated,
                                  k.param = 100, # instead of default k = 20
                                  prune.SNN = 0, # instead of default prune.SNN = 1/15
                                  )

whole.integrated <- FindClusters(whole.integrated, resolution = 3)

sample.groups <- factor(c("condition2", "condition2", "condition1", "condition1", "condition1", "condition2"))
names(sample.groups) <- c("R_1", "R_2", "R_3", "R_4", "R_5", "R_8")

cell.groups <- as.character(whole.integrated@meta.data$integrated_snn_res.3)
names(cell.groups) <- rownames(whole.integrated@meta.data)

sample.per.cell <- whole.integrated@meta.data$sample
names(sample.per.cell) <- rownames(whole.integrated@meta.data)

ref.level <- "condition1"
target.level <- "condition2"

cao <- Cacoa$new(whole.integrated, 
                   sample.groups = sample.groups, 
                   cell.groups = cell.groups, 
                   sample.per.cell=sample.per.cell, 
                   ref.level=ref.level, 
                   target.level=target.level, 
                   graph.name="integrated_nn",
                   embedding = whole.integrated@reductions$umap@cell.embeddings)

bs <- if (length(levels(cao$cell.groups)) > 20) 10 else 16
cao$plot.theme <- theme_bw(base_size = bs) + 
    theme(plot.title=element_text(hjust = 0.5), 
          legend.background=element_rect(fill=alpha("white", 0.2)),
          legend.text=element_text(size=12), 
          legend.margin=margin(6, 6, 4, 1, 'pt'),
          plot.margin=margin())

cao$cell.groups.palette <- levels(cao$cell.groups) %>% 
    {setNames(sample(brewerPalette("Paired")(length(.))), .)}

cao$sample.groups.palette <- c("#d73027", "#4575b4") %>% 
    setNames(c(cao$target.level, cao$ref.level))
ianphelps commented 2 years ago

Hi,

Thank you very much for providing such great insight into case-control analysis in scRNA-seq and for creating an incredibly useful tool.

@evanbiederstedt I want to mention that I'm running into the same errors described by @anastasiiaNG. Also using a seurat object. Any help you can provide is greatly appreciated. I'm happy to provide further details as needed. I generated the coa object similarly to anastasiiaNG.

Thanks!

VPetukhov commented 2 years ago

Hi @anastasiiaNG and @ianphelps , I recently released v0.3.0 together with an updated walkthrough. It has a lot of changes and bug fixes, could you please try it? I tested it on the object you provided, and the function calls work fine. There was just something wrong with the ggplot theme in this object, so I added cao$plot.theme <- theme_bw().

anastasiiaNG commented 2 years ago

Hi @anastasiiaNG and @ianphelps , I recently released v0.3.0 together with an updated walkthrough. It has a lot of changes and bug fixes, could you please try it? I tested it on the object you provided, and the function calls work fine. There was just something wrong with the ggplot theme in this object, so I added cao$plot.theme <- theme_bw().

Hi, @VPetukhov

Thank you for the updates.

Does the updated walkthrough work correct in your hands on my data?

I've tested it on the cao object that I obtain just like in the https://github.com/kharchenkolab/cacoa/issues/21#issuecomment-1075157459 (with cao$plot.theme <- theme_bw() as you recommend).

The updated walkthrough works correct until this step:

sample_meta <- whole.integrated@meta.data

pvals <- cao$estimateMetadataSeparation(sample.meta=sample_meta)$padjust
# Error in x[...] <- m : replacement has length zero

cao$plotSampleDistances(
  space="expression.shifts",
  sample.colors=sample_meta$condition, legend.position=c(0, 1), font.size=2
)
# Error: Must request at least one colour from a hue palette.

Functions cao$plotVolcano, cao$plotOntologyHeatmapCollapsed, cao$plotOntologyHeatmap also fail to plot the results even though cao$estimateDEPerCellType, cao$estimateDEPerCellType, cao$estimateOntology work without errors:

> cao$plotVolcano(xlim=c(-3, 3), ylim=c(0, 3.5), lf.cutoff=1)
Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto),  : 
  Viewport has zero dimension(s)
In addition: There were 15 warnings (use warnings() to see them)

> warnings()
Warning messages:
1: Removed 3 rows containing missing values (geom_point).
2: Removed 3 rows containing missing values (geom_point).
3: Removed 15 rows containing missing values (geom_point).
4: Removed 21 rows containing missing values (geom_point).
5: Removed 40 rows containing missing values (geom_point).
6: Removed 26 rows containing missing values (geom_point).
7: Removed 135 rows containing missing values (geom_point).
8: Removed 4 rows containing missing values (geom_point).
9: Removed 11 rows containing missing values (geom_point).
10: Removed 5 rows containing missing values (geom_point).
11: Removed 3 rows containing missing values (geom_point).
12: Removed 8 rows containing missing values (geom_point).
13: Removed 1 rows containing missing values (geom_point).
14: Removed 12 rows containing missing values (geom_point).
15: Removed 13 rows containing missing values (geom_point).

> cao$plotOntologyHeatmapCollapsed(
+   name="GSEA", genes="up", n=50, clust.method="ward.D", size.range=c(1, 4)
+ )
Error: Can't rename columns that don't exist.
x Column `core_enrichment` doesn't exist.

> cao$plotOntologyHeatmap(
+   name="GSEA", genes="up", description.regex="extracellular|matrix"
+ )
Error: Can't rename columns that don't exist.
x Column `core_enrichment` doesn't exist.

And starting from "Cluster-free. Compositional changes" none of the functions works on my data:

> cao$estimateCellDensity(method='graph')
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Please supply a symmetric matrix if you want to create a weighted graph with mode=UNDIRECTED.

> cao$estimateClusterFreeExpressionShifts(
+   gene.selection="expression", min.n.between=3, min.n.within=3
+ )
Error in graph.adjacency.sparse(adjmatrix, mode = mode, weighted = weighted,  : 
  Please supply a symmetric matrix if you want to create a weighted graph with mode=UNDIRECTED.

> cao$estimateClusterFreeDE(
+     n.top.genes=1000, min.expr.frac=0.01, adjust.pvalues=TRUE, smooth=TRUE, 
+     verbose=TRUE
+ )
Estimating cluster-free Z-scores for 1000 most expressed genes
Error in graph.adjacency.sparse(adjmatrix, mode = mode, weighted = weighted,  : 
  Please supply a symmetric matrix if you want to create a weighted graph with mode=UNDIRECTED.
evanbiederstedt commented 2 years ago

Hi @anastasiiaNG

I'm following your instructions to recreate the data above, using the data downloaded from your link.

Could you please uninstall cacoa and re-install it? i.e. remove.packages('cacoa') and then devtools::install_github("kharchenkolab/cacoa")

RE: 1.

cao$estimateExpressionShiftMagnitudes() fails with the following message:

This works for me now:

> cao$estimateExpressionShiftMagnitudes()
Filtering data... 
done!

Calculating pairwise distances using dist='cor'...

  |=======================================================| 100%, Elapsed 00:10
Done!

RE: 2

This works for me too. Please check again.

> cao$estimateCellDensity()
> cao$estimateDiffCellDensity(type='permutation', verbose=T)
  |=======================================================| 100%, Elapsed 00:00
  |=======================================================| 100%, Elapsed 00:02
>

I tried using your data and running through http://pklab.med.harvard.edu/viktor/cacoa/walkthrough_short.html

It's true there are some errors, which are surely associated with some extra functionality we need to incorporate for Seurat objects. I'll try to help out with this when I get some time.

In the meantime, please try using a Conos object instead of a Seurat object if you could: https://github.com/kharchenkolab/conos

Thanks, Evan

VPetukhov commented 2 years ago

Thanks, @anastasiiaNG ! Right, I also checked other functions and indeed there are errors in them. It's a bit of a pain to debug, as the part with core_enrichment comes from the DOSE, which returns a data frame without this column on your data. I have no idea why, so will need to dig into their specific. I'll fix it in the next couple of weeks.

VPetukhov commented 2 years ago

@anastasiiaNG , I released version 0.4.0, where all this should be fixed. Could you please check it?

anastasiiaNG commented 2 years ago

@anastasiiaNG , I released version 0.4.0, where all this should be fixed. Could you please check it?

@VPetukhov , great, thanks a lot, almost all bugs are fixed now.

The only thing that still produces an error is here (probably due to the absence of factors that significantly affect sample differences):

sample_meta <- whole.integrated@meta.data[, c(4,7)]
head(as.data.frame(sample_meta))

pvals <- cao$estimateMetadataSeparation(sample.meta=sample_meta)$padjust
# Error in x[...] <- m : replacement has length zero

I also noticed that in estimating gene programs you put by default z.adj=TRUE and smooth=FALSE, however warning says that Gene programs without smoothing often produce intractable results, especially with z.adj=FALSE. Are z.adj=TRUE and smooth=TRUE the best option here? Or how to decide when these parameters should be set to TRUE?

Angel-Wei commented 1 year ago

Hi, I ran into an error when I called cao$plotOntologyHeatmapCollapsed function. I got an error

Error in `chr_as_locations()`:
! Can't rename columns that don't exist.
✖ Column `qvalues` doesn't exist.

Before I come to this step, everything ran well and functions cao$estimateDEPerCellType and cao$estimateOntology (I used'GSEA')also worked. Here's some output info I got from rlang::last_error():

> pdf(paste0(rep.name, '/', rep.name,"_plotOntologyHeatmapCollapsed_up_in_MUT.pdf"), height=15, width=10)
> print(cao$plotOntologyHeatmapCollapsed(
+   name="GSEA", genes="up", n=50, clust.method="ward.D", size.range=c(1, 4)) + ggtitle("Up in MUT"))
Error in (function (cond)  : 
  error in evaluating the argument 'x' in selecting a method for function 'print': Can't rename columns that don't exist.
✖ Column `qvalues` doesn't exist.
> rlang::last_error()
<error/vctrs_error_subscript_oob>
Error in `chr_as_locations()`:
! Can't rename columns that don't exist.
✖ Column `qvalues` doesn't exist.
---
Backtrace:
  1. cao$plotOntologyHeatmapCollapsed(...)
  6. dplyr:::rename.data.frame(., geneID = core_enrichment, qvalue = qvalues)
  7. tidyselect::eval_rename(expr(c(...)), .data)
  8. tidyselect:::rename_impl(...)
  9. tidyselect:::eval_select_impl(...)
 18. tidyselect:::vars_select_eval(...)
 19. tidyselect:::walk_data_tree(expr, data_mask, context_mask, error_call)
 20. tidyselect:::eval_c(expr, data_mask, context_mask)
 21. tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
 22. tidyselect:::walk_data_tree(new, data_mask, context_mask)
 23. tidyselect:::as_indices_sel_impl(...)
 24. tidyselect:::as_indices_impl(x, vars, call = call, strict = strict)
 25. tidyselect:::chr_as_locations(x, vars, call = call)
Run `rlang::last_trace()` to see the full context.

I wonder is there anything I can do to fix this problem to generate GSEA results? Thank you so much for the help in advance!