Closed jgamache014 closed 3 years ago
Following up, I am able to run FindMarkers() using the scale.data slot, but only the 3,000 most variable genes are included, so the DE analysis is very limited. For context, here is my whole workflow:
#Merge multiple Seurat objects
list <- c(sample2, sample3, ... sample24)
combined <- merge(x = sample1, y = list)
#Reference-based integration
combined.list <- SplitObject(combined, split.by = "sampID")
for (i in 1:length(combined.list)) {
combined.list[[i]] <- SCTransform(combined.list[[i]], method = 'glmGamPoi', verbose = FALSE)
}
combined.features <- SelectIntegrationFeatures(object.list = combined.list, nfeatures = 3000)
combined.list <- PrepSCTIntegration(object.list = combined.list, anchor.features = combined.features)
reference_dataset <- which(names(combined.list) == "sample1")
combined.anchors <- FindIntegrationAnchors(object.list = combined.list, normalization.method = "SCT",
anchor.features = combined.features, reference = reference_dataset)
combined.integrated <- IntegrateData(anchorset = combined.anchors, normalization.method = "SCT")
The number of features in my combined object is 36,601. When I use return.only.var.genes = FALSE
in SCTransform() and nfeatures = length(rownames(combined))
in SelectIntegrationFeatures(), the number of features in combined.integrated[["SCT"]]@scale.data after integration was increased only to 6,671.
Next I tried features.to.integrate = all.genes
in IntegrateData(), but I get an error:
all.genes <- rownames(combined)
combined.integrated <- IntegrateData(anchorset = combined.anchors, features.to.integrate = all.genes, normalization.method = "SCT")
##Calculating residuals of type pearson for 14651 genes
##Integrating dataset 1 with reference dataset
##Finding integration vectors
##Finding integration vector weights
##0% 10 20 30 40 50 60 70 80 90 100%
##[----|----|----|----|----|----|----|----|----|----|
##**************************************************|
##Integrating data
##Error in .cbind2Csp(x, y) :
##Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 92
##Calls: IntegrateData ... cbind -> cbind2 -> cbind2 -> cbind2sparse -> .cbind2Csp
##Execution halted
Any tips for how I can get data for all 36,601 features in the scale.data slot of my integrated object?
Thanks a lot!
This is really a Seurat integration issue/question. Please head over to https://github.com/satijalab/seurat/issues and search for similar issues there.
I read in this vignette under "Where are normalized values stored for sctransform?" that it is recommended to perform integration and DE calculations directly on the residuals stored in the scale.data slot of the SCT assay. Why is this, and are the residuals specified if this SCT integration vignette is followed? It seems like the PrepSCTIntegration function might accomplish this, but it's not clear to me.
For DE analysis, would it be appropriate to using something like
FindMarkers(object, ident.1 = "group1", ident.2 = "group2", assay = 'SCT', slot = 'scale.data', test.use = "MAST")
to specify the residuals?