satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

NK cell sample doesn't show NCAM1 expression #5193

Closed erguntiryaki closed 2 years ago

erguntiryaki commented 2 years ago

Hello,

I am practicing with 10x Genomics' NK cell dataset (92% purity). The NK cells were sorted from a peripheral blood sample by FACS, based on the CD56 surface marker. I see that only a very low proportion of the NK cells (~4%) express the NCAM1 gene, which encodes the CD56 protein. This doesn't make sense to me. Is there a possible biological explanation, or did I do something wrong computationally? Thank you so much in advance.

Dataset that I used

Code to test number of NCAM1 expressing cells: ncol(subset(seuratobj, subset = NCAM1 > 0))

Complete workflow to process dataset:

Loading Dataset

nk.counts <- ReadMtx(mtx = "./data/raw/matrix.mtx", cells = "./data/raw/barcodes.tsv", 
                  features = "./data/raw/genes.tsv", cell.sep = "\t", feature.sep = "\t")
seuratobj <- CreateSeuratObject(counts = nk.counts, min.cells = 3, min.features = 200, project= "10X_NK")

QC

seuratobj[["percent.mt"]] <- PercentageFeatureSet(seuratobj, pattern = "^MT-")
VlnPlot(seuratobj, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
FeatureScatter(seuratobj, feature1 = "nCount_RNA", feature2 = "percent.mt")
FeatureScatter(seuratobj, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
seuratobj <- subset(seuratobj, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5) # Seurat Defaults

Normalization

seuratobj <- NormalizeData(seuratobj)

HVG

seuratobj <- FindVariableFeatures(seuratobj, selection.method = "vst", nfeatures = 1500)
top10 <- head(VariableFeatures(seuratobj), 10)

p1 <- VariableFeaturePlot(seuratobj)
LabelPoints(plot = p1, points = top10, repel = TRUE)

Scaling

all.genes <- rownames(seuratobj)
seuratobj <- ScaleData(seuratobj, features = all.genes)

PCA

seuratobj <- RunPCA(seuratobj, features = VariableFeatures(object = seuratobj))
ElbowPlot(seuratobj)

Clustering

seuratobj <- FindNeighbors(seuratobj, dims = 1:15)
seuratobj <- FindClusters(seuratobj)

UMAP

seuratobj <- RunUMAP(seuratobj, dims = 1:15)
DimPlot(seuratobj, reduction = "umap")

Check NCAM1 expression

VlnPlot(seuratobj, features = "NCAM1")

Number of NCAM1 expressing cells

ncol(subset(seuratobj, subset = NCAM1 >1))

322 out of 8302 => ~ 4 %

mhkowalski commented 2 years ago

I would use ncol(subset(seuratobj, subset = NCAM1 > 0)). However, in this case it appears you also get 322 cells. Otherwise, your analysis looks correct. A feature of single-cell RNA sequencing is its high dropout rate, meaning that you might not detect a gene in a particular cell even if that cell is expressing the gene.
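To make the dropout point concrete, here is a minimal base R sketch. It assumes every cell truly contains a few NCAM1 transcripts and that each transcript is captured independently with a small probability. The values of `true_mrna` and `capture_prob` are illustrative assumptions, not estimates from the 10x NK dataset; only the cell count is taken from the question.

```r
# Toy simulation of dropout. All parameters are illustrative assumptions,
# not measurements from the 10x NK dataset.
set.seed(1)

n_cells      <- 8302   # cell count from the question, used only for scale
true_mrna    <- 5      # assumed NCAM1 transcripts present in every cell
capture_prob <- 0.01   # assumed chance that a single transcript is captured

# Each transcript is captured independently, so the observed count per cell
# follows a binomial distribution.
observed <- rbinom(n_cells, size = true_mrna, prob = capture_prob)

# Fraction of cells in which the gene is detected at all
detected_fraction <- mean(observed > 0)

# Analytical expectation: 1 - P(no transcript captured)
expected_fraction <- 1 - (1 - capture_prob)^true_mrna

cat(sprintf("simulated: %.3f, expected: %.3f\n",
            detected_fraction, expected_fraction))
```

Under these assumed numbers, every simulated cell expresses the gene, yet it is detected in only about 5% of them (1 - 0.99^5 is roughly 0.049). This shows that a low detected fraction is compatible with uniform expression; it does not show that these particular parameters apply to NCAM1.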

erguntiryaki commented 2 years ago

@mhkowalski Thank you for your valuable explanation. 1- I reevaluated the data after your comment and found that subsetting the Seurat object on NCAM1 expression > 0 or > 1 gives the same result, because no cell falls in the 0-1 range (the corresponding violin plot is below). However, you are completely right about the dropout effect.

2- I found this article, which addresses this issue on the 10x PBMC dataset. The authors also specifically investigate the very low NCAM1 mRNA expression in CD56+ NK cells. I think this article is a good reference for this question.

[Image: 0-1-NCAM1, violin plot of NCAM1 expression]
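The absence of cells in the 0-1 range is what the default normalization would predict. This assumes the subset expression is evaluated on the log-normalized data produced by NormalizeData(), whose default method is LogNormalize with scale.factor = 10000. A rough base R sketch of that formula follows; the helper `lognorm` and the per-cell totals are hypothetical and used only for illustration.

```r
# Log-normalized value for a raw count in a cell with a given total UMI count,
# following the LogNormalize formula: log1p(count / total * scale_factor).
# `lognorm` is a hypothetical helper, not a Seurat function.
lognorm <- function(count, total_umi, scale_factor = 10000) {
  log1p(count / total_umi * scale_factor)
}

# Hypothetical per-cell total UMI counts
total_umi <- c(1000, 3000, 5000, 8000)

# Normalized value of a single NCAM1 UMI in each of those cells
one_count <- lognorm(1, total_umi)

# Total UMI count above which a single UMI normalizes to a value below 1:
# log1p(10000 / N) < 1  <=>  N > 10000 / (e - 1)
threshold <- 10000 / (exp(1) - 1)

print(round(one_count, 2))
print(round(threshold))
```

A single UMI only normalizes to a value below 1 in cells with more than about 5,820 total UMIs. If few or no cells in the filtered object are that deep, the smallest nonzero NCAM1 value is already above 1, so NCAM1 > 0 and NCAM1 > 1 select the same cells.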