hemberg-lab / SC3

A tool for the unsupervised clustering of cells from single cell RNA-Seq experiments
http://bioconductor.org/packages/SC3
GNU General Public License v3.0

Performance Enhancements, Laplacian matrix improvements #56

Closed pati-ni closed 6 years ago

pati-ni commented 6 years ago

This PR is an attempt to improve SC3 performance at scale. It contains:

  1. Changes to the eigenvalue decomposition and singular value decomposition. Since a full matrix decomposition is not required, these functions have been replaced with low-rank matrix approximations, which shave off a lot of wasted runtime.
  2. Replacement of the distance matrix calculations and trivial transformations with optimized libraries. This also adds the possibility of leveraging more CPU cores in server environments.
  3. Replacement of the Laplacian matrix calculation with the igraph package. This may significantly improve the quality of the results. The existing implementation may cause decomposition instability due to numerical precision errors. The igraph implementation seems to do a much better job of calculating the Laplacian matrix: it leaves smaller residuals (precision errors) than the current one, and this seems to be enough to make the eigenvalue decomposition more stable and significantly more robust.
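
To illustrate the idea behind points 1 and 3, here is a minimal sketch in plain Python. It is not the R code in this PR and does not use igraph or any optimized library. It assumes a symmetric normalized Laplacian of the form L = I - D^{-1/2} A D^{-1/2}, and the function names (`normalized_laplacian`, `smallest_eigenpairs`) are hypothetical. Instead of a full decomposition, it approximates only the k smallest eigenpairs using orthogonal iteration, which stands in here for the low-rank solvers mentioned above.

```python
import math
import random


def normalized_laplacian(affinity):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric, non-negative affinity matrix."""
    n = len(affinity)
    # Assumes every row sum (degree) is strictly positive.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in affinity]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt_deg[i] * affinity[i][j] * inv_sqrt_deg[j]
         for j in range(n)]
        for i in range(n)
    ]


def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def smallest_eigenpairs(lap, k, iters=500, seed=0):
    """Approximate the k smallest eigenpairs of a normalized Laplacian.

    Runs orthogonal (subspace) iteration on 2I - L. The eigenvalues of a
    normalized Laplacian lie in [0, 2], so the dominant eigenvectors of the
    shifted matrix are the eigenvectors of L with the smallest eigenvalues.
    """
    n = len(lap)
    rng = random.Random(seed)
    shifted = [[(2.0 if i == j else 0.0) - lap[i][j] for j in range(n)]
               for i in range(n)]
    basis = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        new_basis = []
        for v in basis:
            w = matvec(shifted, v)
            # Gram-Schmidt: remove components along vectors already accepted.
            for u in new_basis:
                proj = dot(w, u)
                w = [a - proj * b for a, b in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            new_basis.append([a / norm for a in w])
        basis = new_basis
    # Rayleigh quotients of the unit-length basis vectors estimate the eigenvalues.
    values = [dot(v, matvec(lap, v)) for v in basis]
    return values, basis


# Toy affinity matrix with two disconnected groups of two cells each.
affinity = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
lap = normalized_laplacian(affinity)
values, vectors = smallest_eigenpairs(lap, k=2)
print(values)  # both values should be close to 0, one per disconnected group
```

Only k vectors are ever stored and updated, which is where the runtime saving over a full decomposition comes from when k is much smaller than the number of cells. A production solver would also add a convergence check rather than a fixed iteration count.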

Overall, the package now has more dependencies, including lower-level, well-tested, performance-optimized linear algebra libraries. This may complicate installation. I don't think so, but I am not sure either.

@wikiselev