satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.29k stars 915 forks source link

PrepSCTIntegration scale.data error #3085

Closed pbschwar closed 4 years ago

pbschwar commented 4 years ago

Hello Seurat Team,

Fairly new to this, so I apologize if this has been addressed before. Also, thank you for such a good program. I had originally integrated two datasets (2 different conditions) using the standard workflow, and there were no issues. I am now attempting to integrate them following SCTransform and I get the following error while running the PrepSCTIntegration function:

"Error in scale.data[anchor.features, ] : subscript out of bounds"

My workflow thus far is:

  1. Create Seurat Object for each dataset
  2. Run SCTransform to regress out percent.mt and CC.Difference
  3. Merge the datasets
  4. Create a list by each condition
  5. Set "SCT" as default assay
  6. SelectIntegrationFeatures
  7. Followed by PrepSCTIntegration

Code below: seuratNC <- CreateSeuratObject(counts = countsNC, project = "NC KC", assay = "RNA",min.cells = 0, min.features = 200) seuratCD <- CreateSeuratObject(countsCD, project = "CD KC", assay = "RNA", min.cells = 0, min.features = 200) seuratNC[["percent.mt"]] <- PercentageFeatureSet(seuratNC, pattern = "^mt-") seuratCD[["percent.mt"]] <- PercentageFeatureSet(seuratCD, pattern = "^mt-")

Remove doublets

seuratNC <- subset(seuratNC, subset = nFeature_RNA < 6000) seuratCD <- subset(seuratCD, subset = nFeature_RNA < 6000) seuratNC <- NormalizeData(object = seuratNC) seuratNC <- CellCycleScoring(seuratNC , s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE) seuratNC$CC.Difference <- seuratNC$S.Score - seuratNC$G2M.Score seuratNC <- SCTransform(seuratNC, vars.to.regress = c("percent.mt","CC.Difference"), verbose = T) seuratCD <- NormalizeData(object = seuratCD) seuratCD <- CellCycleScoring(seuratCD, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE) seuratCD$CC.Difference <- seuratCD$S.Score - seuratCD$G2M.Score seuratCD <- SCTransform(seuratCD, vars.to.regress = c("percent.mt","CC.Difference"), verbose = T) kc.combined <- merge(seuratNC_umap, y = seuratCD_umap, add.cell.ids = c("NC", "CD"), project = "KC C") DefaultAssay(kc.combined) <- "SCT" kc.list <- SplitObject(kc.combined, split.by = "orig.ident") integ_features <- SelectIntegrationFeatures(kc.list, nfeatures = 3000) kc.list <- PrepSCTIntegration(object.list = kc.list ,anchor.features = integ_features, verbose=T, assay = "SCT")

Thanks for your help

pbschwar commented 4 years ago

As an update, I had initially kept the conditions separate prior to merging and integration to be able to individually evaluate. If I merged and created a list prior to running SCTransform this seemed to have fixed whatever error occurred. Is creation of the list necessary prior to running SCTransform when you intend to integrate the datasets?

Thanks!

yuhanH commented 4 years ago

Hi, You are right. In the SCTransform integration vignette, we split merged object into a list, and run SCTransform for each individual object in the list. When you run SCTransform, it will store the sct modal in the object. Then, when you merge those SCTransform normalized object, the merged object has a list of models stored. However, next, when you use SplitObject for the merged object, the sct modals may not be split correctly.

denvercal1234GitHub commented 2 years ago

@pbschwar - Did you mean you perform SCTranform taking a list of 2 objects that you each SCTransform as input?

greengarden0925 commented 9 months ago

seuratNC <- SCTransform(seuratNC, vars.to.regress = c("percent.mt","CC.Difference"), verbose = T)

This error happended because the some genes can not be found in the scale.data, which are done using "SCTransform". The default setting of the feature number is 3000 in "SCTransform" function as shown below. You should change "variable.features.n" to the total number of gene features. ## Default S3 method: SCTransform( object, cell.attr, reference.SCT.model = NULL, do.correct.umi = TRUE, ncells = 5000, residual.features = NULL, **variable.features.n = 3000,** variable.features.rv.th = 1.3, vars.to.regress = NULL, do.scale = FALSE, do.center = TRUE, clip.range = c(-sqrt(x = ncol(x = umi)/30), sqrt(x = ncol(x = umi)/30)), vst.flavor = "v2", conserve.memory = FALSE, return.only.var.genes = TRUE, seed.use = 1448145, verbose = TRUE, ... ) You can revise your script to the following and try agian. Do the same revision on seuratCC as well seuratNC <- SCTransform(seuratNC, vars.to.regress = c("percent.mt","CC.Difference"), variable.features.n =seuratNC ,verbose = T)

Hope this will work. However, I tried this is not enough to avoid the error. The following is my script using my data "dtlist" is a list of multiple seurat object. Before the bug being corrected, you can using the following code to get the correct "Anchor.features" that mapping to the gene feature after "GetResidual" function.

`##find integration gene features intfts <- SelectIntegrationFeatures(object.list = dtlist, nfeatures = nrow(sce)) #nrow(sce) 10549 length(c) #10499

get the intersect gene list

getAnchor.features=function(obj){

obj=dtlist[[1]]

obj <- GetResidual(object = obj, assay = assay[i], features = anchor.features, replace.value = ifelse(test = is.null(x = sct.clip.range), yes = FALSE, no = TRUE), clip.range = sct.clip.range, verbose = FALSE)

scale.data <- GetAssayData(object = obj, assay = "SCT", slot = "scale.data")

row.names(scale.data) }

c=lapply(dtlist,getAnchor.features) d=Reduce(intersect, c(c,list(intfts=intfts))) length(d) #10449

dtlist <- PrepSCTIntegration(object.list = dtlist, anchor.features = d )`