Closed MrModenait closed 3 years ago
SCTransform
performs a filtering by default it only keeps genes that are expressed in >3 cells. What is the expression pattern of CCL2 in your case (for both the objects)? Also, nFeature_RNA
and nCount_RNA
need (and should) not be regressed given the SCT model implicitly accounts for these.
Dear @saketkc,
It turns out the question was once again made irrelevant as I opted for query-based integration from Mapping and annotating query datasets, and I soon discovered that not a single cell in the dataset expressed the CCL2 gene.
Does your answer imply that if, after running SCTransform
with residual.features = NULL
and after integrating with features.to.integrate = all_genes
, a gene is missing from the scale.data
slot of the SCT
assay, that gene was expressed in 3 cells or fewer?
Best Regards, Gustav C.
P.S. As for vars.to.regress
, I was not aware of this: thank you for pointing it out!
Does your answer imply that if, after running SCTransform with residual.features = NULL and after integrating with features.to.integrate = all_genes, a gene is missing from the scale.data slot of the SCT assay, that gene was expressed in 3 cells or fewer?
Yes, but only if return.only.var.genes=F
is set. Otherwise, it will only use the variable genes to create the scale.data slot.
Hi @saketkc , I'm running into a similar problem where a would like to pass a SCT normalized and integrated matrix into other methods. However I would like to keep all genes in the original counts matrices. Is there a way to disable the SCTransform() filtering of genes expressed in fewer than (I believe default) 5 cells? What I'm doing:
seu_obj_list<-lapply(seu_obj_list, function(x){ temp<-SCTransform(x, return.only.var.genes=FALSE, method="offset", min_cells=0); return(temp)})
FTI<-lapply(seu_obj_list, function(x) rownames(x[["SCT"]]@scale.data))
FTI<-unique(unlist(FTI))
The above returns a seurat object list where the '[["SCT"]]@scale.data' slot has the same dimensions as the raw counts matrices. But, when I try to integrate both data-sets I get an error:
feat<-SelectIntegrationFeatures(seu_obj_list)
seu_obj_list<-PrepSCTIntegration(seu_obj_list, anchor.features=feat)
seu_anchors<-FindIntegrationAnchors(seu_obj_list,feat,normalization.method="SCT")
seu_combined<-IntegrateData(seu_anchors, normalization.method="SCT",features.to.integrate=FTI)
Gives me:
...
Merging dataset 2 into 1
Extracting anchors for merged samples
Finding integration vectors
Error in h(simpleError(msg,call)) :
error in evaluating the argument 'x' in selecting a method for function 't': invalid character indexing
In addtion: Warning messages:
1: Residuals not computed for the following requested features: ......
And I get 2000 features with no residuals
Any ideas?
Thanks!
I am not sure if retaining those genes help given they are expressed in so few cells, but you can do `SCTransform(object, method="glmGamPoi", min_cells=1).
Also we show in our recent preprint that offset
model does not lead to optimal VST, curious about your choice of model here.
Do all the objects in your object list have the same genes to start off with? I would suggest you create a new issue to follow up.
@saketkc Sorry for the delayed response. I wasn't tagged so your answer never popped up
I chose the offset method based on their paper https://www.biorxiv.org/content/10.1101/2020.12.01.405886v3.abstract However, thank you for pointing me out to your pre-print!
All objects in my list have the exact same rows. I will create a new issue Thanks
I wanted to chime in here for people who are interested in setting variable features in SCTransform. There is an argument in the SCTransform function called return.only.var.genes
that you can set to get the scaled data for all features that SCTransform calculates residuals for (ie the same genes in the data slot). SCTransform calculates residuals for all genes expressed in more than 3 cells, I think. So this solution does not work if you want to use a gene expressed in 3 or less cells as a varaible feature, but I think that would be a bad idea anyway. Once all of these genes are returned, the scale.data matrix can be subset by the genes you want as variable features and you can set the var.features slot to the genes that you want. Example below where desired_var_features is your vector of desired variable features (using Seurat version 4.3.0.1 and R version 4.3.1).
obj = SCTransform(obj, return.only.var.genes = F)
obj@assays$SCT@scale.data = obj@assays$SCT@scale.data[desired_var_features,]
obj@assays$SCT@var.features = desired_var_features
Dear colleagues,
I would like to reopen an old issue from January (#3916). Back then, @jgamache014 asked how to obtain all features from datasets in an integrated SCTransformed object for DE analysis. The question was never answered, as @HerpelinckT pointed out that DE analysis should be done on the
RNA
assay, and the question because obsolete before it was answered. However, in integrating two of my datasets, I would still like to choose specific genes to include in my integrated dataset. Briefly, I need to keep the gene expression data for the famous chemoattractant CCL2 to find which cluster has the highest CCL2 expression. Selecting custom genes to keep used to be possible withScaleData
(as I showed in January). How can I do this withSCTransform
? Here's a walkthrough of the problem. Both datasets have 33,538 features in the Counts and the Seurat object (usingmin.cells = 0
forCreateSeuratObject
), and CCL2 is included in these. I then proceed to runSCTransform
on the list:After
SCTransform
, the first dataset has 12,499 genes while the second dataset has 9,106 genes, counted usingrownames
, and CCL2 is missing in both. The hurdle can be overcome by settingresidual.features = all_genes
inSCTransform
, whereall_genes
is a character vector of my 33,538 features calculated using:This yields 33,538 genes in the SCTransformed list, however trouble begins as I integrate the data:
The integration fails and the message I get is:
CCL2 is one of the genes in the long list at the bottom that was "not present in any models". It is worth noting that if I set
features.to.integrate = NULL
, the integration succeeds but I only obtain 3000 genes of which CCL2 is not included. I would like to make it clear that I want my integration to be determined by the default anchor/variable features, I just want to carry CCL2 along to plot it on the final UMAP usingFeaturePlot
. Any suggestions on how I may overcome the error message, or how Imay achieve the same result?Best Regards, Gustav C.