Closed StefaniAstrologo closed 3 years ago
sctransform::get_residuals()
takes an sctransform-learned model and a count matrix and returns the residuals as a matrix with the same dimensions as the input count matrix.
sctransform::get_residual_var()
returns the variance of each row of that matrix without creating the full matrix.
Why can you not call these functions anymore with Seurat Object 4.0.1?
As to the Seurat::GetResidual()
function. If I understand correctly, this can be used to update the scale.data to add features that might not be present (by default only variable features are in scale.data).
One important thing to keep in mind when calling sctransform functions directly is that the Seurat wrapper might use a different clipping range for the residuals. The Seurat::SCTransform
default is clip.range = c(-sqrt(n_cells/30), sqrt(n_cells/30))
that would also be used by GetResidual()
, while the sctransform::vst()
default is wider at c(-sqrt(n_cells), sqrt(n_cells))
(here n_cells
are the number of cells input when initially normalising the data).
Hi @ChristophH, sorry for the late reply. Below I tried to be a bit more specific, and I really hope you can help me :)
Why can you not call these functions anymore with Seurat Object 4.0.1?
I think because the structure of the Seurat Object changed.
# Old SeuratObj
> glimpse(so_old@assays$SCT)
Formal class 'Assay' [package "Seurat"] with 8 slots
..@ counts :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ data :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ scale.data : num [1:3000, 1:4481] -0.2961 -0.0411 -0.2345 -0.1305 -0.2621 ...
.. ..- attr(*, "dimnames")=List of 2
..@ key : chr "sct_"
..@ var.features : chr [1:3000] "Cd74" "Dcn" "Gsn" "Fabp4" ...
..@ meta.features:'data.frame': 15175 obs. of 6 variables:
.. ..$ sct.detection_rate : num [1:15175] 0.0375 0.2647 0.1767 0.3963 0.0196 ...
.. ..$ sct.gmean : num [1:15175] 0.0478 0.2451 0.1459 0.4248 0.0157 ...
.. ..$ sct.variance : num [1:15175] 1.2979 0.5071 0.2587 1.0205 0.0371 ...
.. ..$ sct.residual_mean : num [1:15175] 0.1938 -0.0189 0.0168 -0.0766 -0.0334 ...
.. ..$ sct.residual_variance: num [1:15175] 9.983 0.846 1.015 0.72 0.707 ...
.. ..$ sct.variable : logi [1:15175] TRUE FALSE FALSE FALSE FALSE FALSE ...
..@ misc :List of 1
.. ..$ vst.out:List of 12
..@ NA : NULL
# The vst_out is in so_old@assays$SCT@misc$vst.out (See below)
> glimpse(so_old@assays$SCT@misc$vst.out)
List of 12
$ model_str : chr "y ~ log_umi"
$ model_pars : num [1:2000, 1:3] 2.879 0.324 2.364 11.712 6.797 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2000] "Malat1" "Amot" "Dlg1" "Tpt1" ...
.. ..$ : chr [1:3] "theta" "(Intercept)" "log_umi"
$ model_pars_outliers : logi [1:2000] FALSE FALSE FALSE FALSE FALSE FALSE ...
$ model_pars_fit : num [1:15175, 1:3] 0.57 1.422 1.267 1.712 0.241 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:15175] "Sox17" "Mrpl15" "Lypla1" "Tcea1" ...
.. ..$ : chr [1:3] "theta" "(Intercept)" "log_umi"
..- attr(*, "outliers")= logi [1:2000] FALSE FALSE FALSE FALSE FALSE FALSE ...
$ model_str_nonreg : chr ""
[...]
#------------------------------------------------------------------------------
# New SeuratObj
> glimpse(SeuratObj@assays$SCT)
Formal class 'SCTAssay' [package "Seurat"] with 9 slots
..@ SCTModel.list:List of 1
.. ..$ model1:Formal class 'SCTModel' [package "Seurat"] with 6 slots
..@ counts :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ data :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ scale.data : num [1:3000, 1:2498] -0.114 -0.695 -0.294 -0.128 -0.872 ...
.. ..- attr(*, "dimnames")=List of 2
..@ key : chr "sct_"
..@ assay.orig : chr "RNA"
..@ var.features : chr [1:3000] "Cxcl14" "Car3" "Hspa1a" "Pi16" ...
..@ meta.features:'data.frame': 14748 obs. of 0 variables
..@ misc : Named list()
# The vst_out is in SeuratObj@assays$SCT@SCTModel.list$model1 (See below)
> glimpse(SeuratObj@assays$SCT@SCTModel.list$model1)
Formal class 'SCTModel' [package "Seurat"] with 6 slots
..@ feature.attributes:'data.frame': 14748 obs. of 12 variables:
.. ..$ detection_rate : num [1:14748] 0.0152 0.3066 0.3795 0.5813 0.2886 ...
.. ..$ gmean : num [1:14748] 0.0178 0.2811 0.3854 0.7522 0.2723 ...
.. ..$ variance : num [1:14748] 0.255 0.48 0.729 1.592 0.543 ...
.. ..$ residual_mean : num [1:14748] 0.12926 0.02881 -0.00228 -0.02117 0.00658 ...
.. ..$ residual_variance : num [1:14748] 7.577 0.989 0.8 0.759 0.924 ...
.. ..$ theta : num [1:14748] 0.197 2.61 2.717 2.571 2.586 ...
.. ..$ (Intercept) : num [1:14748] -9.87 -10.02 -9.58 -8.39 -10.06 ...
.. ..$ log_umi : num [1:14748] 1.49 2.19 2.15 2.04 2.19 ...
.. ..$ genes_log_gmean_step1: logi [1:14748] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. ..$ step1_theta : num [1:14748] NA NA NA NA NA NA NA NA NA NA ...
.. ..$ step1_(Intercept) : num [1:14748] NA NA NA NA NA NA NA NA NA NA ...
.. ..$ step1_log_umi : num [1:14748] NA NA NA NA NA NA NA NA NA NA ...
..@ cell.attributes :'data.frame': 2498 obs. of 3 variables:
.. ..$ umi : num [1:2498] 10291 17520 12944 19593 14594 ...
.. ..$ log_umi : num [1:2498] 4.01 4.24 4.11 4.29 4.16 ...
.. ..$ cells_step1: logi [1:2498] TRUE TRUE TRUE TRUE TRUE TRUE ...
..@ clips :List of 2
.. ..$ vst: num [1:2] -50 50
.. ..$ sct: num [1:2] -9.13 9.13
..@ umi.assay : chr "RNA"
..@ model : chr "y ~ log_umi"
[...]
If I use sctransform::get_residual_var() on the 'old SeuratObj':
so_old <- readRDS('Mammary_Gland_so.rds')
vst_out_Old <- so_old@assays$SCT@misc$vst.out
umi_Old <- so_old@assays$RNA@counts
Res_Old <- sctransform::get_residual_var(vst_out_Old,umi_Old, verbosity = 0)
Res_Old[1:10]
Sox17 Mrpl15 Lypla1 Tcea1 Rgs20 Atp6v1h Rb1cc1 Pcmtd1 Sntg1 Rrs1
9.9832401 0.8459077 1.0145507 0.7202136 0.7066688 0.9650393 1.0909149 1.2164037 1.0011648 1.1864490
Instead, if I do the same on the 'new SeuratObj':
SeuratObj <- readRDS('Mammary_Gland_so_new.rds')
vst_out_New <- SeuratObj@assays$SCT@SCTModel.list$model1
umi_New <- SeuratObj@assays$RNA@counts
sctransform::get_residual_var(vst_out_New,umi_New, verbosity = 0)
Error: $ operator not defined for this S4 class
link for the two objects >>>> https://we.tl/t-DNcIMzwq5s
One important thing to keep in mind when calling sctransform functions directly is that the Seurat wrapper might use a different clipping range for the residuals. The Seurat::SCTransform default is clip.range = c(-sqrt(n_cells/30), sqrt(n_cells/30)) that would also be used by GetResidual(), while the sctransform::vst() default is wider at c(-sqrt(n_cells), sqrt(n_cells)) (here n_cells are the number of cells input when initially normalising the data).
Thanks this is very useful to know!
Hi @ChristophH just to close this issue. This is how I have changed the function sctransform::get_residual_var() in order to use it with the new obj.
GetResidualVar <- function(vst_out, umi, residual_type = "pearson",
res_clip_range = c(-sqrt(ncol(umi)), sqrt(ncol(umi))),
min_variance = vst_out@arguments$min_variance,
cell_attr = vst_out@cell.attributes, bin_size = 256,
model_pars_nonreg = NULL,
verbosity = vst_out@arguments$verbosity,
verbose = NULL, show_progress = NULL)
{
if (!is.null(verbose)) {
warning("The 'verbose' argument is deprecated as of v0.3. Use 'verbosity' instead. (in sctransform::vst)",
immediate. = TRUE, call. = FALSE)
verbosity <- as.numeric(verbose)
}
if (!is.null(show_progress)) {
warning("The 'show_progress' argument is deprecated as of v0.3. Use 'verbosity' instead. (in sctransform::vst)",
immediate. = TRUE, call. = FALSE)
if (show_progress) {
verbosity <- 2
}
else {
verbosity <- min(verbosity, 1)
}
}
regressor_data <- model.matrix(as.formula(gsub("^y", "",
vst_out@model)), cell_attr)
model_pars <- vst_out@feature.attributes[,c("theta", "(Intercept)","log_umi")]
# vst_out$model_pars_nonreg :
# I don't recognize the location of this slot in the new obj, so it is NULL
if (!is.null(dim(model_pars_nonreg))) {
regressor_data_nonreg <- model.matrix(as.formula(gsub("^y",
"", vst_out$model_str_nonreg)), cell_attr)
regressor_data <- cbind(regressor_data, regressor_data_nonreg)
model_pars <- cbind(vst_out$model_pars_fit, vst_out$model_pars_nonreg)
}
genes <- rownames(umi)[rownames(umi) %in% rownames(model_pars)]
if (verbosity > 0) {
message("Calculating variance for residuals of type ",
residual_type, " for ", length(genes), " genes")
}
bin_ind <- ceiling(x = 1:length(x = genes)/bin_size)
max_bin <- max(bin_ind)
if (verbosity > 1) {
pb <- txtProgressBar(min = 0, max = max_bin, style = 3)
}
res <- matrix(NA_real_, length(genes))
names(res) <- genes
for (i in 1:max_bin) {
genes_bin <- genes[bin_ind == i]
mu <- exp(tcrossprod(as.matrix(model_pars[genes_bin, -1, drop = FALSE]), regressor_data))
y <- as.matrix(umi[genes_bin, , drop = FALSE])
res_mat <- switch(residual_type, pearson = pearson_residual(y,
mu, model_pars[genes_bin, "theta"], min_var = min_variance),
deviance = deviance_residual(y, mu, model_pars[genes_bin,
"theta"]), stop("residual_type ", residual_type,
" unknown - only pearson and deviance supported at the moment"))
res_mat[res_mat < res_clip_range[1]] <- res_clip_range[1]
res_mat[res_mat > res_clip_range[2]] <- res_clip_range[2]
res[genes_bin] <- row_var(res_mat)
if (verbosity > 1) {
setTxtProgressBar(pb, i)
}
}
if (verbosity > 1) {
close(pb)
}
return(res)
}
Hi Christoph!
I am using your Seurat/SCTransfrom to re-analyse some datasets. My main aim is to extract Residual Variances cluster-specific. To do so, I wrote a piece of code that uses get_residual_var() on a subset of cells (1 cluster) and this return residual variances cluster-specific.
Long story short: The slots in Seurat changed, and now I cannot run this anymore. Neither get_residual_var() nor get_residuals() works on Seurat Object 4.0.1.
I saw that a new function was implemented GetResidual(), that calls get_residuals(), which runs without errors, but It doesn't reproduce the results as before.
Q1. What are the main differences between get_residual_var() & get_residuals()? I had a look at the code, and I'm struggling to clearly identify those. Q2. Is there anything you can suggest to get around this issue and reproduce the analysis I was successfully running with the previous version of Seurat?
THANKS IN ADVANCE