zhiyhu / CIDER

R package CIDER: Meta-Clustering for Single-Cell Data Integration and Evaluation
https://zhiyhu.github.io/CIDER/
MIT License

Help with error: `Cannot find 'Group' in this Seurat object` #4

Open sjspielman opened 1 year ago

sjspielman commented 1 year ago

I am filing this issue because I am encountering an error when trying to evaluate single-cell integration.

I attempted to follow the "quick start" in the README to evaluate an integrated dataset, but I am getting the error `Error: Cannot find 'Group' in this Seurat object` when running the `estimateProb()` function. I dug a little into the source code, and it looks like this `Group` column may represent ground-truth cell types in the data: https://github.com/zhiyhu/CIDER/blob/68a08ae427a94fb783964e5709adeb904896e7f6/R/evaluation.R#L80-L82

However, my understanding is that CIDER evaluation is meant to be performed without any ground truth. I also don't see any documentation stating that the given Seurat object needs this `Group` variable.

I would greatly appreciate any insight into what exactly is happening here, as well as any advice you might have for how to prepare a Seurat object for use.

Thanks for your help!!

```r
# Code tested in R 4.1.2 and R 4.2.1

library(CIDER)  # version CIDER_0.99.0
library(Seurat) # version Seurat_4.1.1

seu.integrated <- readRDS("integrated_seurat.RDS")

# Look into the object
seu.integrated
#  An object of class Seurat
#  5613 features across 2790 samples within 3 assays
#  Active assay: integrated (1871 features, 1871 variable features)
#   2 other assays present: RNA, SCT
#   2 dimensional reductions calculated: cca_PCA, cca_UMAP

# Prepare object to match CIDER expected names
names(seu.integrated@reductions) <- c("pca", "umap") # pca should be lowercase instead of uppercase.
seu.integrated$Batch <- seu.integrated$batch # Batch should be uppercase instead of lowercase

# Look again into the object
seu.integrated
#  An object of class Seurat
#  5613 features across 2790 samples within 3 assays
#  Active assay: integrated (1871 features, 1871 variable features)
#   2 other assays present: RNA, SCT
#   2 dimensional reductions calculated: pca, umap

# Evaluate integration, following the example here:
#   https://github.com/zhiyhu/CIDER/#cider-as-an-evaluation-metric---quick-start
seu.integrated <- hdbscan.seurat(seu.integrated)
ider <- getIDEr(seu.integrated, use.parallel = FALSE, verbose = FALSE)
seu.integrated <- estimateProb(seu.integrated, ider) # Results in error:
# Error: Cannot find 'Group' in this Seurat object
```

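One way to see which metadata columns are missing before calling `estimateProb()` is to compare the column names against the expected ones. The sketch below is illustrative only: it uses a plain data frame as a stand-in for `seu.integrated@meta.data`, and the `missing_columns()` helper and the example values are hypothetical, not part of CIDER. The names `Batch` and `Group` are simply the ones that come up in this issue; whether `Group` should be required at all is the open question here.

```r
# Stand-in for seu.integrated@meta.data (made-up values, for illustration only).
meta <- data.frame(
  batch = c("A", "A", "B"),
  Batch = c("A", "A", "B"),
  row.names = c("cell1", "cell2", "cell3")
)

# Hypothetical helper (not a CIDER function): report which expected
# metadata columns are absent from the data frame.
missing_columns <- function(meta, required) {
  setdiff(required, colnames(meta))
}

# "Batch" and "Group" are the column names mentioned in this issue.
missing <- missing_columns(meta, c("Batch", "Group"))
print(missing)
```

With a real object, the same comparison could be run on `colnames(seu.integrated@meta.data)`.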
zhiyhu commented 1 year ago

Hi Stephanie, yes it looks like a bug to me. Really appreciate your efforts in spotting it!

I have updated the CRAN version to 0.99.1 to fix this bug.

sjspielman commented 1 year ago

Hi @zhiyhu, thanks for having a look at this! I went ahead and updated to the 0.99.1 version from CRAN, but now I get a different error (with the same code I sent in this issue):

Repeated column names found in count matrix
Error in eval(ej, envir = levelsenv) : object 'tmpbg' not found

I'm not sure what this error implies. Do you have any insight? Thank you!
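The first message refers to repeated column names in the count matrix. A possible way to check for this, sketched on a small stand-in matrix rather than the actual Seurat object, is to look for duplicated column names with base R. The matrix `counts` and its gene and cell names are made up for illustration, and this thread does not establish whether duplicated names are what triggers the `tmpbg` error.

```r
# Stand-in count matrix with one repeated column name (made-up data).
counts <- matrix(
  1:6,
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell1"))
)

# Column names that occur more than once.
dup_names <- unique(colnames(counts)[duplicated(colnames(counts))])
print(dup_names)

# make.unique() appends numeric suffixes so that every name is distinct.
unique_names <- make.unique(colnames(counts))
print(unique_names)
```

On a real object, the same check could be applied to `colnames(seu.integrated)`.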

zhiyhu commented 1 year ago

This is indeed a new error. Would you mind sharing a set of your data for testing?

sjspielman commented 1 year ago

Sure, here is the data I am using. It's shared via a Google Drive link because it's too large to upload here.

https://drive.google.com/file/d/1nFPwnf2S-wQlmCyNNpiDl5L8IxGMLWO0/view?usp=sharing