satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Something Wrong with FindAllMarkers Function, or should I adjust other parameters? #6513

Closed hjh-air-cond closed 2 years ago

hjh-air-cond commented 2 years ago

Hi!

I'm currently working on a project with 170k cells across 20+ samples. When I run FindAllMarkers, it returns 0 DE genes for some cell types (or clusters), as shown here: [image]

I followed the code in the pbmc3k guide, so the code I used is:

sc.data.markers <- FindAllMarkers(sc.data, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25, test.use = 'wilcox')

The same thing happens when I set min.pct and logfc.threshold to 0.

Over the last few months I worked on a few projects with at most around 90k-100k cells, and none of them returned such a result.

Could someone help me with this situation or give some advice? Thanks!

longmanz commented 2 years ago

Hi @hjh-air-cond, have you checked the UMAP of your dataset? Do the cells form the correct clusters in your UMAP? It would be helpful if you could share the UMAP in this thread. If the cells do not form the correct clusters, there might be something wrong with the cell type annotations of your dataset. You may consider Seurat's Azimuth reference mapping to re-annotate your dataset at https://azimuth.hubmapconsortium.org

hjh-air-cond commented 2 years ago

Hi! @longmanz Thanks for your reply!

The UMAPs grouped by cluster and by cell type look like this: [image]

The dataset was processed by normalize -> variable features -> scale -> PCA -> Harmony (on orig.ident) -> Louvain clustering and UMAP.

The original number of clusters was large, and I annotated the clusters using a list of marker genes based on CellMarker2.0.

longmanz commented 2 years ago

Hi @hjh-air-cond, thank you for the information! The UMAP looks good, so it is a little strange to observe such DE results. Have you set the default assay of your object to "RNA"?

hjh-air-cond commented 2 years ago

Hi! @longmanz Thanks for your help!

I set the default assay to 'RNA' (well, actually I only have one assay in this dataset).

Last night I gave it two more tries, and both results (on clusters and on cell types) came out normal.

On clusters: Code:

sc.data.markers <- FindAllMarkers(sc.data, only.pos = TRUE)

[image]

On Celltypes: Code:

sc.data.markers <- FindAllMarkers(sc.data, only.pos = TRUE,
                                    min.pct = 0, 
                                    logfc.threshold = 0, 
                                    test.use = 'wilcox')

[image]

I think the only difference is that yesterday I ran FindAllMarkers with the future package on 20 cores to speed it up. The code was like this:

library(future)
plan(multisession, workers = 20)
system.time({
  sc.data.markers <- FindAllMarkers(sc.data, only.pos = TRUE,
                                    min.pct = 0, 
                                    logfc.threshold = 0, 
                                    test.use = 'wilcox') 
})

Last night I ran this code on a single core or on fewer than 6 cores.

Could that lead to the problem?

longmanz commented 2 years ago

Hi @hjh-air-cond, glad you solved it! Using future to run on multiple cores can sometimes lead to unexpected results. I am not sure what is going on here, but it seems that in your previous run the DE results were returned before the parallel jobs had actually completed. Please be mindful of using future in the future.
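
For anyone who hits the same symptom, here is a minimal sketch of one way to spot it: compare the clusters present in the FindAllMarkers output against the identity classes of the object, and re-run sequentially if some are missing. The helper clusters_without_markers is hypothetical (it is not part of Seurat or future), and the toy data frame only imitates the `cluster` and `gene` columns that FindAllMarkers returns. The commented lines at the end assume the sc.data object from this thread.

```r
# Hypothetical helper (not a Seurat function): list the identity classes
# that have no rows at all in a FindAllMarkers-style result.
clusters_without_markers <- function(markers, all_clusters) {
  # FindAllMarkers returns one row per marker with a `cluster` column;
  # a completely empty result may not have that column.
  found <- if ("cluster" %in% names(markers)) {
    unique(as.character(markers$cluster))
  } else {
    character(0)
  }
  setdiff(as.character(all_clusters), found)
}

# Toy stand-in for a FindAllMarkers result (made-up rows for illustration).
toy_markers <- data.frame(
  cluster = c("B", "B", "T"),
  gene = c("MS4A1", "CD79A", "CD3E"),
  stringsAsFactors = FALSE
)
missing <- clusters_without_markers(toy_markers, c("B", "T", "NK"))
print(missing)

# With a real object (assumes Seurat and future are installed):
# future::plan("sequential")  # fall back to a single R process
# sc.data.markers <- FindAllMarkers(sc.data, only.pos = TRUE)
# clusters_without_markers(sc.data.markers, levels(Idents(sc.data)))
```

If clusters that look well separated on the UMAP show up in this list after a parallel run but not after a sequential one, that points to the parallel setup rather than to the annotations.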