Question about using residuals for integration & DE analysis

jgamache014 commented 3 years ago

I read in this vignette under "Where are normalized values stored for sctransform?" that it is recommended to perform integration and DE calculations directly on the residuals stored in the scale.data slot of the SCT assay. Why is this, and are the residuals specified if this SCT integration vignette is followed? It seems like the PrepSCTIntegration function might accomplish this, but it's not clear to me.

For DE analysis, would it be appropriate to using something like FindMarkers(object, ident.1 = "group1", ident.2 = "group2", assay = 'SCT', slot = 'scale.data', test.use = "MAST") to specify the residuals?

jgamache014 commented 3 years ago

Following up, I am able to run FindMarkers() using the scale.data slot, but only the 3,000 most variable genes are included, so the DE analysis is very limited. For context, here is my whole workflow:

#Merge multiple Seurat objects
list <- c(sample2, sample3, ... sample24)
combined <- merge(x = sample1, y = list)

#Reference-based integration
combined.list <- SplitObject(combined, split.by = "sampID")
for (i in 1:length(combined.list)) {
  combined.list[[i]] <- SCTransform(combined.list[[i]], method = 'glmGamPoi', verbose = FALSE)
}

combined.features <- SelectIntegrationFeatures(object.list = combined.list, nfeatures = 3000)
combined.list <- PrepSCTIntegration(object.list = combined.list, anchor.features = combined.features)

reference_dataset <- which(names(combined.list) == "sample1")

combined.anchors <- FindIntegrationAnchors(object.list = combined.list, normalization.method = "SCT", 
                                           anchor.features = combined.features, reference = reference_dataset)
combined.integrated <- IntegrateData(anchorset = combined.anchors, normalization.method = "SCT")

The number of features in my combined object is 36,601. When I use return.only.var.genes = FALSE in SCTransform() and nfeatures = length(rownames(combined)) in SelectIntegrationFeatures(), the number of features in combined.integrated[["SCT"]]@scale.data after integration was increased only to 6,671.

Next I tried features.to.integrate = all.genes in IntegrateData(), but I get an error:

all.genes <- rownames(combined)
combined.integrated <- IntegrateData(anchorset = combined.anchors, features.to.integrate = all.genes, normalization.method = "SCT")

##Calculating residuals of type pearson for 14651 genes
##Integrating dataset 1 with reference dataset
##Finding integration vectors
##Finding integration vector weights
##0%   10   20   30   40   50   60   70   80   90   100%
##[----|----|----|----|----|----|----|----|----|----|
##**************************************************|
##Integrating data
##Error in .cbind2Csp(x, y) : 
  ##Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 92
##Calls: IntegrateData ... cbind -> cbind2 -> cbind2 -> cbind2sparse -> .cbind2Csp
##Execution halted

Any tips for how I can get data for all 36,601 features in the scale.data slot of my integrated object?

Thanks a lot!

ChristophH commented 3 years ago

This is really a Seurat integration issue/question. Please head over to https://github.com/satijalab/seurat/issues and search for similar issues there.

satijalab / sctransform

Question about using residuals for integration & DE analysis #90