satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0
203 stars 33 forks source link

get_residual_var() & get_residuals() do not work on the newest Seurat Object (Seurat_4.0.1) #107

Closed StefaniAstrologo closed 3 years ago

StefaniAstrologo commented 3 years ago

Hi Christoph!

I am using your Seurat/SCTransfrom to re-analyse some datasets. My main aim is to extract Residual Variances cluster-specific. To do so, I wrote a piece of code that uses get_residual_var() on a subset of cells (1 cluster) and this return residual variances cluster-specific.

SeuratObj <- readRDS('TabulaMuris_Bladder.rds')
SeuratObj <- SCTransform(SeuratObj, verbose = FALSE)
#--------------------------------------------
# filter for cells belonging to a specific cluster
cl <- '0'
cl_filter <- SeuratObj@meta.data$seurat_clusters == cl
#--------------------------------------------
# Raw counts

umi.cl<- SeuratObj@assays$RNA@counts[,cl_filter]

#--------------------------------------------
# Output of vst function
vst_out.cl           <- SeuratObj@assays$SCT@misc$vst.out
vst_out.cl$cell_attr <- vst_out.cl$cell_attr[cl_filter,]

#--------------------------------------------
getResVar.cl  <- sctransform::get_residuals(vst_out =vst_out.cl, umi = umi.cl, verbosity = 0)

Long story short: The slots in Seurat changed, and now I cannot run this anymore. Neither get_residual_var() nor get_residuals() works on Seurat Object 4.0.1.

I saw that a new function was implemented GetResidual(), that calls get_residuals(), which runs without errors, but It doesn't reproduce the results as before.

Q1. What are the main differences between get_residual_var() & get_residuals()? I had a look at the code, and I'm struggling to clearly identify those. Q2. Is there anything you can suggest to get around this issue and reproduce the analysis I was successfully running with the previous version of Seurat?

THANKS IN ADVANCE

ChristophH commented 3 years ago

sctransform::get_residuals() takes an sctransform-learned model and a count matrix and returns the residuals as a matrix with the same dimensions as the input count matrix. sctransform::get_residual_var() returns the variance of each row of that matrix without creating the full matrix.

Why can you not call these functions anymore with Seurat Object 4.0.1?

As to the Seurat::GetResidual() function. If I understand correctly, this can be used to update the scale.data to add features that might not be present (by default only variable features are in scale.data).

One important thing to keep in mind when calling sctransform functions directly is that the Seurat wrapper might use a different clipping range for the residuals. The Seurat::SCTransform default is clip.range = c(-sqrt(n_cells/30), sqrt(n_cells/30)) that would also be used by GetResidual(), while the sctransform::vst() default is wider at c(-sqrt(n_cells), sqrt(n_cells)) (here n_cells are the number of cells input when initially normalising the data).

StefaniAstrologo commented 3 years ago

Hi @ChristophH, sorry for the late reply. Below I tried to be a bit more specific, and I really hope you can help me :)

Why can you not call these functions anymore with Seurat Object 4.0.1?

I think because the structure of the Seurat Object changed.

# Old SeuratObj

> glimpse(so_old@assays$SCT)

Formal class 'Assay' [package "Seurat"] with 8 slots
  ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ scale.data   : num [1:3000, 1:4481] -0.2961 -0.0411 -0.2345 -0.1305 -0.2621 ...
  .. ..- attr(*, "dimnames")=List of 2
  ..@ key          : chr "sct_"
  ..@ var.features : chr [1:3000] "Cd74" "Dcn" "Gsn" "Fabp4" ...
  ..@ meta.features:'data.frame':   15175 obs. of  6 variables:
  .. ..$ sct.detection_rate   : num [1:15175] 0.0375 0.2647 0.1767 0.3963 0.0196 ...
  .. ..$ sct.gmean            : num [1:15175] 0.0478 0.2451 0.1459 0.4248 0.0157 ...
  .. ..$ sct.variance         : num [1:15175] 1.2979 0.5071 0.2587 1.0205 0.0371 ...
  .. ..$ sct.residual_mean    : num [1:15175] 0.1938 -0.0189 0.0168 -0.0766 -0.0334 ...
  .. ..$ sct.residual_variance: num [1:15175] 9.983 0.846 1.015 0.72 0.707 ...
  .. ..$ sct.variable         : logi [1:15175] TRUE FALSE FALSE FALSE FALSE FALSE ...
  ..@ misc         :List of 1
  .. ..$ vst.out:List of 12
  ..@ NA           : NULL

# The vst_out is in so_old@assays$SCT@misc$vst.out (See below)

> glimpse(so_old@assays$SCT@misc$vst.out)
List of 12
 $ model_str            : chr "y ~ log_umi"
 $ model_pars           : num [1:2000, 1:3] 2.879 0.324 2.364 11.712 6.797 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2000] "Malat1" "Amot" "Dlg1" "Tpt1" ...
  .. ..$ : chr [1:3] "theta" "(Intercept)" "log_umi"
 $ model_pars_outliers  : logi [1:2000] FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ model_pars_fit       : num [1:15175, 1:3] 0.57 1.422 1.267 1.712 0.241 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:15175] "Sox17" "Mrpl15" "Lypla1" "Tcea1" ...
  .. ..$ : chr [1:3] "theta" "(Intercept)" "log_umi"
  ..- attr(*, "outliers")= logi [1:2000] FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ model_str_nonreg     : chr ""
[...]

#------------------------------------------------------------------------------

# New SeuratObj

> glimpse(SeuratObj@assays$SCT)
Formal class 'SCTAssay' [package "Seurat"] with 9 slots
  ..@ SCTModel.list:List of 1
  .. ..$ model1:Formal class 'SCTModel' [package "Seurat"] with 6 slots
  ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ scale.data   : num [1:3000, 1:2498] -0.114 -0.695 -0.294 -0.128 -0.872 ...
  .. ..- attr(*, "dimnames")=List of 2
  ..@ key          : chr "sct_"
  ..@ assay.orig   : chr "RNA"
  ..@ var.features : chr [1:3000] "Cxcl14" "Car3" "Hspa1a" "Pi16" ...
  ..@ meta.features:'data.frame':   14748 obs. of  0 variables
  ..@ misc         : Named list()

# The vst_out is in SeuratObj@assays$SCT@SCTModel.list$model1 (See below)

> glimpse(SeuratObj@assays$SCT@SCTModel.list$model1)
Formal class 'SCTModel' [package "Seurat"] with 6 slots
  ..@ feature.attributes:'data.frame':  14748 obs. of  12 variables:
  .. ..$ detection_rate       : num [1:14748] 0.0152 0.3066 0.3795 0.5813 0.2886 ...
  .. ..$ gmean                : num [1:14748] 0.0178 0.2811 0.3854 0.7522 0.2723 ...
  .. ..$ variance             : num [1:14748] 0.255 0.48 0.729 1.592 0.543 ...
  .. ..$ residual_mean        : num [1:14748] 0.12926 0.02881 -0.00228 -0.02117 0.00658 ...
  .. ..$ residual_variance    : num [1:14748] 7.577 0.989 0.8 0.759 0.924 ...
  .. ..$ theta                : num [1:14748] 0.197 2.61 2.717 2.571 2.586 ...
  .. ..$ (Intercept)          : num [1:14748] -9.87 -10.02 -9.58 -8.39 -10.06 ...
  .. ..$ log_umi              : num [1:14748] 1.49 2.19 2.15 2.04 2.19 ...
  .. ..$ genes_log_gmean_step1: logi [1:14748] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. ..$ step1_theta          : num [1:14748] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ step1_(Intercept)    : num [1:14748] NA NA NA NA NA NA NA NA NA NA ...
  .. ..$ step1_log_umi        : num [1:14748] NA NA NA NA NA NA NA NA NA NA ...
  ..@ cell.attributes   :'data.frame':  2498 obs. of  3 variables:
  .. ..$ umi        : num [1:2498] 10291 17520 12944 19593 14594 ...
  .. ..$ log_umi    : num [1:2498] 4.01 4.24 4.11 4.29 4.16 ...
  .. ..$ cells_step1: logi [1:2498] TRUE TRUE TRUE TRUE TRUE TRUE ...
  ..@ clips             :List of 2
  .. ..$ vst: num [1:2] -50 50
  .. ..$ sct: num [1:2] -9.13 9.13
  ..@ umi.assay         : chr "RNA"
  ..@ model             : chr "y ~ log_umi"
[...]

If I use sctransform::get_residual_var() on the 'old SeuratObj':

so_old <- readRDS('Mammary_Gland_so.rds')
vst_out_Old <- so_old@assays$SCT@misc$vst.out
umi_Old <- so_old@assays$RNA@counts
Res_Old <- sctransform::get_residual_var(vst_out_Old,umi_Old, verbosity = 0)
Res_Old[1:10]
    Sox17    Mrpl15    Lypla1     Tcea1     Rgs20   Atp6v1h    Rb1cc1    Pcmtd1     Sntg1      Rrs1 
9.9832401 0.8459077 1.0145507 0.7202136 0.7066688 0.9650393 1.0909149 1.2164037 1.0011648 1.1864490 

Instead, if I do the same on the 'new SeuratObj':

SeuratObj <- readRDS('Mammary_Gland_so_new.rds')
vst_out_New <- SeuratObj@assays$SCT@SCTModel.list$model1
umi_New <- SeuratObj@assays$RNA@counts
sctransform::get_residual_var(vst_out_New,umi_New, verbosity = 0)
Error: $ operator not defined for this S4 class

link for the two objects >>>> https://we.tl/t-DNcIMzwq5s

One important thing to keep in mind when calling sctransform functions directly is that the Seurat wrapper might use a different clipping range for the residuals. The Seurat::SCTransform default is clip.range = c(-sqrt(n_cells/30), sqrt(n_cells/30)) that would also be used by GetResidual(), while the sctransform::vst() default is wider at c(-sqrt(n_cells), sqrt(n_cells)) (here n_cells are the number of cells input when initially normalising the data).

Thanks this is very useful to know!

StefaniAstrologo commented 3 years ago

Hi @ChristophH just to close this issue. This is how I have changed the function sctransform::get_residual_var() in order to use it with the new obj.

GetResidualVar <- function(vst_out, umi, residual_type = "pearson", 
                           res_clip_range = c(-sqrt(ncol(umi)), sqrt(ncol(umi))), 
                           min_variance = vst_out@arguments$min_variance, 
                           cell_attr = vst_out@cell.attributes, bin_size = 256,
                           model_pars_nonreg = NULL, 
                           verbosity = vst_out@arguments$verbosity, 
                           verbose = NULL, show_progress = NULL)

{
  if (!is.null(verbose)) {
    warning("The 'verbose' argument is deprecated as of v0.3. Use 'verbosity' instead. (in sctransform::vst)",
            immediate. = TRUE, call. = FALSE)
    verbosity <- as.numeric(verbose)
  }
  if (!is.null(show_progress)) {
    warning("The 'show_progress' argument is deprecated as of v0.3. Use 'verbosity' instead. (in sctransform::vst)",
            immediate. = TRUE, call. = FALSE)
    if (show_progress) {
      verbosity <- 2
    }
    else {
      verbosity <- min(verbosity, 1)
    }
  }
  regressor_data <- model.matrix(as.formula(gsub("^y", "",
                                                 vst_out@model)), cell_attr)
  model_pars <- vst_out@feature.attributes[,c("theta", "(Intercept)","log_umi")]
  # vst_out$model_pars_nonreg : 
  # I don't recognize the location of this slot in the new obj, so it is NULL
  if (!is.null(dim(model_pars_nonreg))) {    
    regressor_data_nonreg <- model.matrix(as.formula(gsub("^y",
                                                          "", vst_out$model_str_nonreg)), cell_attr)
    regressor_data <- cbind(regressor_data, regressor_data_nonreg)
    model_pars <- cbind(vst_out$model_pars_fit, vst_out$model_pars_nonreg)
  }

  genes <- rownames(umi)[rownames(umi) %in% rownames(model_pars)]

  if (verbosity > 0) {
    message("Calculating variance for residuals of type ",
            residual_type, " for ", length(genes), " genes")
  }
  bin_ind <- ceiling(x = 1:length(x = genes)/bin_size)
  max_bin <- max(bin_ind)
  if (verbosity > 1) {
    pb <- txtProgressBar(min = 0, max = max_bin, style = 3)
  }

  res <- matrix(NA_real_, length(genes))
  names(res) <- genes

  for (i in 1:max_bin) {
    genes_bin <- genes[bin_ind == i]
    mu <- exp(tcrossprod(as.matrix(model_pars[genes_bin, -1, drop = FALSE]), regressor_data))
    y <- as.matrix(umi[genes_bin, , drop = FALSE])
    res_mat <- switch(residual_type, pearson = pearson_residual(y,
                                                                mu, model_pars[genes_bin, "theta"], min_var = min_variance),
                      deviance = deviance_residual(y, mu, model_pars[genes_bin,
                                                                     "theta"]), stop("residual_type ", residual_type,
                                                                                     " unknown - only pearson and deviance supported at the moment"))
    res_mat[res_mat < res_clip_range[1]] <- res_clip_range[1]
    res_mat[res_mat > res_clip_range[2]] <- res_clip_range[2]
    res[genes_bin] <- row_var(res_mat)
    if (verbosity > 1) {
      setTxtProgressBar(pb, i)
    }
  }
  if (verbosity > 1) {
    close(pb)
  }
  return(res)
}