satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.31k stars 920 forks source link

Ubiquitous antibody expression/CITE-seq processing #9514

Closed oshiomah1 closed 6 days ago

oshiomah1 commented 6 days ago

Hello I used seurat to proess a CITE-seq dataset, upon looking at antibody expression, the results are somewhat strange. That is most antibodies are expressed by most cell types/clusters. Although the intensity of that expression is expected. For example "Hu.CD19" antibody should only be epressed by B cells. In the plot below we see that all cells express "Hu.CD19" but the B cells have the highest avergae expression compared to other cell types. I am not sure of this is due to an error in processing the data or a lack of understanding how the wet lab or computational processing works to get this result. Any help is appreciated. Note that this phenomenon doesn't occur with the gene expression data

Screenshot 2024-11-27 at 12 58 26

Here is the code below

testrun <- Read10X("/share/hennlab/projects/NCR_scRNAseq/results/TEStmulti_analysis_GEX2_CSF/outs/per_sample_outs/TEStmulti_analysis_GEX2_CSF/count/sample_filtered_feature_bc_matrix")
print("file is loaded")
#str(testrun)

# Create Seurat object for gene expression
seurat_obj <- CreateSeuratObject(counts = testrun$`Gene Expression`, project = "CITEseq_Project")

# Add the antibody capture data as a new assay
seurat_obj[["ADT"]] <- CreateAssayObject(counts = testrun$`Antibody Capture`)

# Calculate mitochondrial percentage
seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^MT-")

#sbasic filtering
seurat_obj <- subset(seurat_obj, subset = nFeature_RNA > 200 & nFeature_RNA < upper_bound_rna_nfeature & percent.mt < 5 &  nCount_ADT < upper_bound_ncount_adt)

# Process RNA data with standard workflow
seurat_obj <- NormalizeData(seurat_obj, assay = "RNA", normalization.method = "LogNormalize") %>%
  FindVariableFeatures(assay = "RNA", selection.method = "vst", nfeatures = 2000) %>%    # Specify RNA assay for variable feature selection
  ScaleData(assay = "RNA") %>%               # Specify RNA assay for scaling
  RunPCA(assay = "RNA",reduction.name = "pca", reduction.key = "PCA_") %>% # Specify RNA assay for PCA
  FindNeighbors(assay = "RNA", dims = 1:20)  %>%
  FindClusters(assay = "RNA")  %>%
  RunUMAP(reduction = 'pca', dims = 1:20, assay = 'RNA', 
              reduction.name = 'rna.umap', reduction.key = 'rnaUMAP_')
print("processed RNA data")

# Process ADT data
seurat_obj <- NormalizeData(seurat_obj, assay = "ADT", normalization.method = "CLR",margin = 2) %>%
  ScaleData(assay = "ADT")

# Run PCA on all ADT features
adt_features <- rownames(seurat_obj[["ADT"]])  # Get all ADT features
seurat_obj <- RunPCA(seurat_obj, assay = "ADT", features = adt_features, reduction.name = "apca", reduction.key = "APCA_") %>%
  FindNeighbors(assay = "ADT", dims = 1:20) %>%
  FindClusters(assay = "ADT") %>%
  RunUMAP(reduction = 'apca', dims = 1:20, assay = 'ADT', 
          reduction.name = 'adt.umap', reduction.key = 'adtUMAP_')
print("processed ADT data")

# Multi-modal integration
seurat_obj <- FindMultiModalNeighbors(seurat_obj, reduction.list = list("pca", "apca"), 
                                      dims.list = list(1:20, 1:20), modality.weight.name = c("RNA.weight", "ADT.weight")) %>%
  RunUMAP(nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_") %>%
  FindClusters(graph.name = "wsnn", algorithm = 3, resolution = 2, verbose = FALSE)

DotPlot(seur_obj, features = tcell_rna_markers, group.by = "sctype_classification", cols = my_colors) + RotatedAxis()  
samuel-marsh commented 6 days ago

Hi,

There are potentially several issues at work here. I also want to note I’m going to transfer this to discussions as it isn’t issue with how Seurat package is functioning.

First, in your plot you are plotting scaled expression (hence -2 - +2. So it can be hard to evaluate true levels of expression using this visualization alone.

Second, even in good scenarios there can be significant no -negative counts in CITE-seq data that can occur for variety of reasons some can be optimized on wrt lab side but some may persist regardless. You can look up the CellBender paper which shows the issue of background CITE-seq reads as well as their attempts to computationally remove some of the issues.

Best, Sam