satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.24k stars 902 forks source link

FindMarkers on Merged Object with multiple integration types (IntegrateLayers) #8490

Open gjones3339 opened 7 months ago

gjones3339 commented 7 months ago

Hi, thank you all so much for the wonderful work you do.

I have a few basic questions/issues about the experimental setup below--basically trying to FindMarkers on a merged seurat object that I passed through CCA and Harmony integration.

However, I have a few clusters that aren't integrating well. When I try to findMarkers for those clusters vs the rest, I get this warning: Warning: The following features were omitted as they were not found in the scale.data slot for the RNA assay: RASGRF2, FAM153CP, LINC01250, AC107223.1, KIAA1211L, MEF2C-AS1, ITPR1, TTTY14, MICAL2, AC067956.1, AL391840.1, UTY, USP9Y, AJ009632.2, AC019211.1, SLC22A10, AC073091.3, AL591501.1, AC124852.1, AL117329.1, NLGN4Y, LY86-AS1, AC007368.1

Questions: 1) Based on other issues here, I am wondering if the warning above is related to needing to re-normalize or re-scale my data after running JoinLayers (and before FindMarkers)? I am worried about running this twice though -- can someone explain? 2) Do I need to set Idents() <- 'cca_clusters' (or harmony_clusters) whenever I am switching between analyzing the integrations? 3) If I have multiple integrations, everytime I split or join layers--does this reset my data and necessitate re-normalizing or re-scaling? 4) is it problematic to set merge.data = FALSE when I merge the object?

Thanks

Setup/Code: I have 8 samples (4 conditions x 2 donors) all IPSCs. I pass individual samples through soupX and scDblFinder and then merge into a single object with these parameters: merge(objx, c(objy,z,etc), add.cell.ids = seurat_names, merge.data = FALSE), followed by basic filtering for mito, count, feature, etc.

Based on the tutorial Integrative analysis in Seurat v5 I try to run CCA/Harmony to integrate by donor:

soupxFM_Donor[["RNA"]] <- split(soupxFM_Donor[["RNA"]], f = soupxFM_Donor$Donor)
soupxFM_Donor <- NormalizeData(soupxFM_Donor)
soupxFM_Donor <- FindVariableFeatures(soupxFM_Donor)
soupxFM_Donor <- ScaleData(soupxFM_Donor)
soupxFM_Donor <- RunPCA(soupxFM_Donor)
soupxFM_Donor <- FindNeighbors(soupxFM_Donor, dims = 1:30, reduction = "pca")
soupxFM_Donor <- FindClusters(soupxFM_Donor, resolution = 0.2, cluster.name = "unintegrated_clusters")
soupxFM_Donor <- RunUMAP(soupxFM_Donor, dims = 1:30, reduction = "pca", reduction.name = "umap.unintegrated")

soupxFM_Donor <- IntegrateLayers(
  object = soupxFM_Donor, method = CCAIntegration,
  orig.reduction = "pca", new.reduction = "integrated.cca",
  verbose = FALSE
)
soupxFM_Donor <- FindNeighbors(soupxFM_Donor, reduction = "integrated.cca", dims = 1:30)
soupxFM_Donor <- FindClusters(soupxFM_Donor, resolution = 0.2, cluster.name = "cca_clusters")
soupxFM_Donor <- RunUMAP(soupxFM_Donor, reduction = "integrated.cca", dims = 1:30, reduction.name = "umap.cca")

soupxFM_Donor <- IntegrateLayers(
  object = soupxFM_Donor, method = HarmonyIntegration,
  orig.reduction = "pca", new.reduction = "harmony",
  verbose = FALSE
)
soupxFM_Donor <- FindNeighbors(soupxFM_Donor, reduction = "harmony", dims = 1:30)
soupxFM_Donor <- FindClusters(soupxFM_Donor, resolution = 0.2, cluster.name = "harmony_clusters")
soupxFM_Donor  <-RunUMAP(soupxFM_Donor, reduction = "harmony", dims = 1:30, reduction.name = "umap.harmony")

Then I try to FindMarkers and doHeatmap:

int_joined_donor <- JoinLayers(soupxFM_Donor)
#1, 17, 15, 12 are the non-integrating clusters
int_joined_donor$groupAB <- ifelse(int_joined_donor$cca_clusters %in% c("1", "17", "15", "12"), "Group A", "Group B")
Idents(int_joined_donor) <- 'groupAB'
de_results_AB <- FindMarkers(int_joined_donor, ident.1 = "Group A", ident.2 = "Group B", min.pct = 0.2, logfc.threshold = 0.25)
de_results_sorted_AB <- de_results_AB[order(-de_results_AB$avg_log2FC),]
top_genes_AB <- rownames(head(de_results_sorted_AB, 35))
DoHeatmap(int_joined_donor, features = top_genes_AB)
zskylarli commented 6 months ago

Hi - foremost, it is not appropriate to perform DE testing on integrated values. In terms of your questions:

  1. The warning you're receiving is likely due to the genes being filtered out during preprocessing or if they were not detected as variable features. Additionally, JoinLayers itself should not necessitate re-normalization or re-scaling.

  2. Yes, when you switch between analyzing different integration results (e.g., cca_clusters vs. harmony_clusters), you should set Idents() accordingly, and you can also specify them in group.by parameters when needed.

  3. Typically, splitting or joining layers in itself doesn't reset your data in a way that would require re-normalization or re-scaling.

  4. By default, merge() will combine the Seurat objects based on the raw count matrices, erasing any previously normalized and scaled data matrices. Passing merge.data = F will simply ensure that the normalized data matrices will not be merged as well.

satzel commented 6 months ago

Hi @zskylarli I came across this error message but now wondered if you could maybe elaborate on the sentence "it is not appropriate to perform DE testing on integrated values"? And also could you recommend what kind of downstream analysis would be appropriate for integrated values?

gjones3339 commented 6 months ago

Hi - foremost, it is not appropriate to perform DE testing on integrated values.

So when I try to group my CCA clusters (A and B) to find markers like this: 1, 17, 15, 12 are the non-integrating clusters int_joined_donor$groupAB <- ifelse(int_joined_donor$cca_clusters %in% c("1", "17", "15", "12"), "Group A", "Group B") Idents(int_joined_donor) <- 'groupAB' de_results_AB <- FindMarkers(int_joined_donor, ident.1 = "Group A", ident.2 = "Group B", min.pct = 0.2, logfc.threshold = 0.25)

How do I get FindMarkers to differentiate those two clusters, but on non-integrated values?