satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.29k stars 915 forks source link

When using SCTransform v2, clusters conflict with canonical markers. #5714

Closed WuRAFY closed 2 years ago

WuRAFY commented 2 years ago

Hi seurat team, firstly thanks for the splendid tool.

But recently I ran into a confusing problem when using sctransfrom v2. This is the cluster result. 7 (M0T$XWJX~GW LE8LWG_V The main problem is between cluster 0 (mostly epithelial cells) and cluster 1(mostly B cells). When dying cells with some canonical markers, it seems some epithelial cells are assigned to cluster 1 (B cell cluster). The problematic part is circled. A tool "cellassign" which assign type to individual cell rather cluster also indicate the same problem. IQ}M65KI21@S%RYB63$K CP And higher cluster resolution does not help. This is the result of increasing the resolution to 0.8 from 0.5. 图片2 But when using another workflow without sctransform (mostly pbmc guided tutorial but integrating datasets with harmony), such problem would not happen. Cells in figure below have been assigned to specific cell types. B cell and epithelial clusters separate pretty well. }{_ UT_EO)FBJ33~8MT6Z%7 These results come from integrated analysis of 3 lung cancer samples. This is the code.

##This is the quality control part for single sample, doublets have been removed by doubletfinder
mat=Read10X(path)
seu_object=CreateSeuratObject(counts=mat,min.genes=200,project=sample)
seu_object[["percent.mt"]]=PercentageFeatureSet(seu_object,pattern="^MT-")
seu_object=subset(seu_object,subset=nFeature_RNA>200&percent.mt<20)
seu_object=SCTransform(seu_object, vst.flavor = "v2", verbose = FALSE) %>%
    RunPCA(npcs = 30, verbose = FALSE) %>%
    RunUMAP(reduction = "pca", dims = 1:30, verbose = FALSE)
##This is the integrated analysis
features=SelectIntegrationFeatures(object.list = obj_list, nfeatures = 3000)
obj_list=PrepSCTIntegration(object.list = obj_list, anchor.features = features)
anchors=FindIntegrationAnchors(object.list = obj_list, normalization.method = "SCT",
    anchor.features = features)
combined_object=IntegrateData(anchorset = anchors, normalization.method = "SCT")
combined_object=RunPCA(combined_object,npcs=60,verbose = FALSE)
combined_object=RunUMAP(combined_object,reduction="pca",dims=1:60)
combined_object=FindNeighbors(combined_object,reduction="pca",dims=1:60)
combined_object=FindClusters(combined_object,resolution=0.5)

Are there any mistakes or settings to be changed? Many thanks, FY

saketkc commented 2 years ago

Hi @WuRAFY, would you be able to share the objects and the code to reproduce this with me? My email is schoudhary@nygenome.org

WuRAFY commented 2 years ago

Hi @WuRAFY, would you be able to share the objects and the code to reproduce this with me? My email is schoudhary@nygenome.org

Thank you for your reply, I will talk with the PI to see how much we can share with you.

saketkc commented 2 years ago

Thanks! You can also anonymize the gene names if you wish. Alternatively, does this also happen with LogNormalization + IntegrateData workflow?

WuRAFY commented 2 years ago

Thanks! You can also anonymize the gene names if you wish. Alternatively, does this also happen with LogNormalization + IntegrateData workflow?

Due to time difference, I have to try lognormalize and reply later. Sorry for keeping you waiting.

saketkc commented 2 years ago

No worries, and no rush of course.

WuRAFY commented 2 years ago

Thanks! You can also anonymize the gene names if you wish. Alternatively, does this also happen with LogNormalization + IntegrateData workflow?

PK921G$TTRW()66`(5__51Y

obj_list <- lapply(X = obj_list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
features <- SelectIntegrationFeatures(object.list = obj_list)
anchors <- FindIntegrationAnchors(object.list = obj_list, anchor.features = features)
combined_object <- IntegrateData(anchorset = anchors)
DefaultAssay(combined_object)="integrated"
combined_object=ScaleData(combined_object,verbose=FALSE)
combined_object=RunPCA(combined_object,npcs=61,verbose = FALSE)
combined_object <- RunUMAP(combined_object, reduction = "pca", dims = 1:61)
combined_object <- FindNeighbors(combined_object, reduction = "pca", dims = 1:61)
combined_object <- FindClusters(combined_object, resolution = 0.5)

This is the result and code of lognormalization workflow. The B cell cluster and epithelial do not mix. Later I will organize the code and data then send them to you. Thank you for your patience.

biliopo commented 2 years ago

So what caused this weird problem? sctransform v2 caused? I am so curious about this. @saketkc