welch-lab / liger

R package for integrating and analyzing multiple single-cell datasets
GNU General Public License v3.0

Problems with scaleNotCenter() #290

Closed: wangzhenzZ closed this issue 5 months ago

wangzhenzZ commented 11 months ago

Thank you for creating this useful package! I'm using rliger_1.0.0 in R 4.2.0. I'm trying to integrate multiple scRNA-seq datasets following the provided vignette.

> library(rliger)
> Data_liger <- seuratToLiger(Data_list)
Removing 5473 genes not expressing in SeuratProject.
Removing 6641 genes not expressing in SeuratProject.
Removing 9995 genes not expressing in SeuratProject.
Removing 8917 genes not expressing in SeuratProject.
Removing 9347 genes not expressing in SeuratProject.
Removing 8938 genes not expressing in SeuratProject.
> Data_liger <- normalize(Data_liger)
> Data_liger <- selectGenes(Data_liger, var.thresh=0.4)
> length(Data_liger@var.genes)
[1] 2799
> Data_liger <- scaleNotCenter(Data_liger) 
Error: Index out of bounds: [index=2799; extent=2799]

scaleNotCenter() ran for about 2 hours and then failed with the error message above. Could you please advise how I can troubleshoot and fix this issue? Any suggestions would be greatly appreciated.

wangzhenzZ commented 11 months ago

I traced the problem to this step: scaleNotCenterFast(t(object@norm.data[[i]][object@var.genes, ])), so I replaced it with: scale(t(object@norm.data[[i]][object@var.genes, ]), center = FALSE, scale = TRUE). With that change the function ran successfully. I have two questions:

  1. What specific operations does scaleNotCenterFast() perform? This will help me better understand if my modification is appropriate.
  2. Is calling scale() directly a reasonable workaround here, or would you recommend a different solution?

mvfki commented 10 months ago

The overall idea is that we divide each row (gene) of a dataset by the value sqrt(sum(row^2) / (ncol - 1)), where ncol is the number of columns (cells), and then take the transpose. And yes, scale(t(object@norm.data[[i]][genes, ]), center = FALSE, scale = TRUE) is exactly equivalent.
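To make the equivalence concrete, here is a minimal sketch on a toy matrix. The matrix `x` and its values are made up for illustration and are not liger data, and this is not the package's internal implementation; it only compares the formula described above with base R's scale().

```r
# Toy genes-by-cells matrix; the values are made up for illustration only.
x <- matrix(c(1, 2, 3, 4,
              0, 5, 0, 2,
              3, 3, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Manual version: divide each row by sqrt(sum(row^2) / (ncol - 1)),
# then transpose so that cells are rows and genes are columns.
row_rms <- sqrt(rowSums(x^2) / (ncol(x) - 1))
manual <- t(x / row_rms)

# Base R version: scale the columns of the transposed matrix without centering.
scaled <- scale(t(x), center = FALSE, scale = TRUE)

# Compare values only; scale() attaches an extra "scaled:scale" attribute.
all.equal(manual, scaled, check.attributes = FALSE)
```

Note that a gene with all-zero values would be divided by zero in both versions, so it may be worth checking for such rows first.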

For the error, can you make sure that all the var.genes can be found in your norm.data? For example:

sapply(object@norm.data, function(nd) all(object@var.genes %in% rownames(nd)))
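If that check returns FALSE for any dataset, a follow-up along these lines could list which genes are missing from each one. This is only a sketch: `var_genes` and `norm_data` are hypothetical stand-ins for object@var.genes and object@norm.data, and whether missing genes are actually the cause of the error above is not established in this thread.

```r
# Hypothetical stand-ins for object@var.genes and object@norm.data.
var_genes <- c("geneA", "geneB", "geneC")
norm_data <- list(
  batch1 = matrix(1, nrow = 3, ncol = 2,
                  dimnames = list(c("geneA", "geneB", "geneC"), NULL)),
  batch2 = matrix(1, nrow = 2, ncol = 2,
                  dimnames = list(c("geneA", "geneC"), NULL))
)

# For each dataset, list the variable genes that have no matching row name.
missing_genes <- lapply(norm_data, function(nd) setdiff(var_genes, rownames(nd)))
missing_genes
```

With a real liger object, the same lapply() call could be run with object@norm.data and object@var.genes in place of the toy inputs.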