tanaylab / metacells

Metacells - Single-cell RNA Sequencing Analysis
MIT License

to_proper_matrix doesn't transform the layout of compressed matrices #57

Closed ukleiner closed 1 year ago

ukleiner commented 1 year ago

In mc.ut.to_proper_matrix, a scipy compressed matrix (csr_matrix, csc_matrix) won't change its layout even if it is in the wrong sparse layout (column-major where row-major is needed, or vice versa). This is because the compressed matrix check happens before the sparse matrix check (and transformation). The returned matrix is the original compressed matrix, and the run will stop if mc.ut.allow_inefficient_layout(False) is set.

Surfaced while running mc.pl.relate_to_lateral_genes with a CSC matrix. I believe checking for sparsity before compression will fix the issue.

My fix for now is to manually relayout the count matrix before calling mc.pl.relate_to_lateral_genes: full.X = full.X.astype(dtype='float32').tocsr()
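For readers who want to try the workaround in isolation, here is a minimal sketch that uses only NumPy and SciPy. The small `dense` array and the `counts_csc` / `counts_csr` names are hypothetical stand-ins for `full.X`; nothing here calls metacells itself.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the count matrix (full.X in the comment above).
dense = np.array([[1, 0, 2], [0, 0, 3], [4, 5, 0]], dtype="int32")
counts_csc = sparse.csc_matrix(dense)  # column-major compressed layout

# Same conversion as the workaround: cast to float32, then relayout to CSR.
counts_csr = counts_csc.astype("float32").tocsr()

print(counts_csc.format, "->", counts_csr.format)
```

The conversion copies the data once; the values are unchanged and only the storage order differs.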

Thanks!

orenbenkiki commented 1 year ago

"Wrong layout" is still a "proper" matrix - use to_layout if you want to ensure a specific layout.

The purpose of "to_proper" is to deal with types such as coo_matrix and other weird stuff that doesn't even have a major axis - that is, all the near-infinite set of arbitrary strange matrix layouts that metacells can't deal with.

The result of to_proper_matrix is a matrix in one of the small "reasonable" set of formats that metacells can deal with (dense, sparse in row or column major order).
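To make the distinction concrete, here is a rough sketch of how one might classify a matrix by its major axis using only NumPy and SciPy. The `describe_layout` helper is hypothetical and is not part of the metacells API; it only illustrates why a coo_matrix has no layout while csr_matrix and csc_matrix do.

```python
import numpy as np
from scipy import sparse

def describe_layout(matrix):
    """Hypothetical helper: return "row_major", "column_major", or None."""
    if sparse.issparse(matrix):
        if matrix.format == "csr":
            return "row_major"
        if matrix.format == "csc":
            return "column_major"
        return None  # e.g. coo, lil, dok: no major axis
    if isinstance(matrix, np.ndarray) and matrix.ndim == 2:
        if matrix.flags.c_contiguous:
            return "row_major"
        if matrix.flags.f_contiguous:
            return "column_major"
    return None

dense = np.ones((3, 4))
print(describe_layout(sparse.coo_matrix(dense)))   # no major axis
print(describe_layout(sparse.csr_matrix(dense)))   # compressed by rows
print(describe_layout(np.asfortranarray(dense)))   # dense, column-major
```

Under this reading, both CSR and CSC count as "proper", and choosing between them is a separate step.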

This doesn't absolve one from worrying about the memory layout (column vs. row major layout).

Alas, given that computer hardware works the way it does (and "physics"), working "against the grain" of the data (e.g. summing columns on a row-major matrix) is way slower. It is in general more efficient to first relayout the data in the proper order before operating on it, so the code is intentionally fussy about that.
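The "against the grain" cost can be observed with plain SciPy. The sketch below is an illustration only: the matrix size, the random counts, and the `sum_each_column` helper are all assumptions made for the example, and the timings it prints will vary by machine and SciPy version.

```python
import timeit

import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.poisson(0.05, size=(500, 200)).astype("float32")
by_rows = sparse.csr_matrix(dense)  # row-major compressed layout
by_columns = by_rows.tocsc()        # one-time relayout to column-major

def sum_each_column(matrix):
    # Extract and sum one column at a time (a column-wise access pattern).
    return np.array([matrix[:, j].sum() for j in range(matrix.shape[1])])

against_grain = sum_each_column(by_rows)
with_grain = sum_each_column(by_columns)

# Both layouts give the same sums; only the access cost differs.
print("same result:", np.allclose(against_grain, with_grain))
print("csr seconds:", timeit.timeit(lambda: sum_each_column(by_rows), number=3))
print("csc seconds:", timeit.timeit(lambda: sum_each_column(by_columns), number=3))
```

The relayout is paid once, while the per-column cost is paid on every access, which is the trade-off described above.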

You can use allow_inefficient_layout if you want to disable these assertions, but this is strongly discouraged for data of a non-trivial size.