satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.26k stars 905 forks source link

Pearson correlation matrix for multiple genes #6460

Closed pbpayal closed 2 years ago

pbpayal commented 2 years ago

Hello,

I want to create Pearson correlation matrix of a set of genes per cluster basis. I want to see how these set of genes are co-expressed in all clusters.

Something like this(Referenced from Deseq2 manual)

Screen Shot 2022-09-23 at 12 20 55 PM

I know about FeatureScatter but that's only one gene against another. How to do for multiple genes? Also which data slot is appropriate for this kind of analysis?

Thanks, Payal

k3yavi commented 2 years ago

Hi @pbpayal ,

You can use the AverageExpression function on the Seurat object and then run cor function from base R to generate the correlation matrix.

pbpayal commented 2 years ago

I have an integrated Seurat object with all the preprocessing done and I have annotated clusters. The weird thing is, when I want to extract the set of genes that i am interested in from AverageExpression, it says Warning: The counts slot for the integrated assay is empty. Skipping assay. I tried all - "counts","data","scale.data" slot. None of them work.

However, when I do DoHeatmap on "counts" slot for the same set of genes, it's able to plot a Heatmap. Do you have any idea, why that might be the case? For DoHeatmap, is the average expression of each gene getting plotted? Can I use that data for correlation analysis?

k3yavi commented 2 years ago

Can you share the code you are using to test the analysis?

pbpayal commented 2 years ago
#CTRL
sample_1 <- Read10X(data.dir = "../filtered_feature_bc_matrix")
sample_1_sobj <- CreateSeuratObject(counts = sample_1, project = "sample_1", min.cells = 3, min.features = 200)
sample_1_sobj$stim <- "CTRL"
sample_1_sobj[["percent.mt"]] <- PercentageFeatureSet(sample_1_sobj, pattern = "^MT-")
sample_1_sub <- subset(sample_1_sobj, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5 )
sample_1_sub <- NormalizeData(sample_1_sub, verbose = FALSE)
sample_1_sub <- FindVariableFeatures(sample_1_sub, selection.method = "vst", nfeatures = 2000)
# EXP
sample_2 <- Read10X(data.dir = "../filtered_feature_bc_matrix")
sample_2_sobj <- CreateSeuratObject(counts = sample_2, project = "sample_2", min.cells = 3, min.features = 200)
sample_2_sobj$stim <- "EXP"
sample_2_sobj[["percent.mt"]] <- PercentageFeatureSet(sample_2_sobj, pattern = "^MT-")
sample_2_sub <- subset(sample_2_sobj, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5 )
sample_2_sub <- NormalizeData(sample_2_sub, verbose = FALSE)
sample_2_sub <- FindVariableFeatures(sample_2_sub, selection.method = "vst", nfeatures = 2000)

I did this for 8 samples - 4 control and 4 experiment samples then integrated the CTRL and EXP samples

immune.anchors <- FindIntegrationAnchors(object.list = list(sample1_sub, sample2_sub,..,sample8_sub), dims = 1:20)
immune.combined.integrated <- IntegrateData(anchorset = immune.anchors.2, dims = 1:20)
DefaultAssay(immune.combined.integrated) <- "integrated"
# Run the standard workflow for visualization and clustering
immune.combined.integrated <- ScaleData(immune.combined.integrated, verbose = FALSE)
immune.combined.integrated <- RunPCA(immune.combined.integrated, npcs = 20, verbose = FALSE)
immune.combined.integrated <- RunUMAP(immune.combined.integrated, reduction = "pca", dims = 1:20)
immune.combined.integrated <- FindNeighbors(immune.combined.integrated, reduction = "pca", dims = 1:20)
# Resolution
immune.combined.integrated <- FindClusters(immune.combined.integrated, resolution = 0.2)

# Annotate clusters
new.cluster.ids <- c("Erythroid cells", "Memory Tcells/Tcells", "DC/Monocyte",
                     "NK" ,"Bcells", "CD8 Tcells","MAIT T cell",
                    "Plasma","Plasmacytoid DC", "RBC","Platelets")
names(new.cluster.ids) <- levels(immune.combined.integrated)
immune.combined.integrated_ <- RenameIdents(immune.combined.integrated, new.cluster.ids)
# Rename the seurat clusters in metadata
immune.combined.integrated[["seurat_clusters"]] <- Idents(object = immune.combined.integrated)

features_c1_subunit <- c("MT-ND1","MT-ND2","MT-ND3","MT-ND4","MT-ND4L","MT-ND5",
                         "MT-ND6","NDUFA1","NDUFA2","NDUFA6","NDUFA9","NDUFA10",
                         "NDUFA11","NDUFA12","NDUFA13","NDUFB3","NDUFB8","NDUFB9",
                         "NDUFB10","NDUFB11","NDUFS1","NDUFS2","NDUFS3","NDUFS4",
                         "NDUFS6","NDUFS7","NDUFS8","NDUFV1","NDUFV2")

DefaultAssay(immune.combined.integrated) <- "RNA"

heatmap <- DoHeatmap(immune.combined.integrated_1, features = features_c1_subunit, slot = "counts", size = 3.5)
avg_exp <- AverageExpression(immune.combined.integrated_1, slot = "data", features = features_c1_subunit)