jon-xu / scSplit

Genotype-free demultiplexing of pooled single-cell RNA-Seq, using a hidden state model for identifying genetically distinct samples within a mixed population.
MIT License

Memory error, which can be fixed #13

Closed. VladimirShitov closed this issue 3 years ago.

VladimirShitov commented 3 years ago

build_base_call_matrix() creates two pandas DataFrames without setting their dtype explicitly, so pandas falls back to np.int64 (8 bytes per entry). This allocates a huge amount of memory, and for large matrices it raises a MemoryError: [Screenshot from 2021-01-16 20-53-34]
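To make the memory cost concrete, here is a minimal sketch comparing the same matrix stored as int64 and as int16. The sizes, SNV IDs and barcodes are hypothetical placeholders, not scSplit's real data or code. Passing the dtype at construction time matters: building an int64 frame and calling astype() afterwards still allocates the int64 copy first.

```python
import numpy as np
import pandas as pd

# Hypothetical dimensions for illustration only (not scSplit's real sizes).
n_snvs, n_cells = 2_000, 500
snv_ids = [f"snv{i}" for i in range(n_snvs)]
barcodes = [f"cell{j}" for j in range(n_cells)]

# 64-bit counts: 8 bytes per matrix entry.
counts_int64 = pd.DataFrame(
    np.zeros((n_snvs, n_cells), dtype=np.int64), index=snv_ids, columns=barcodes
)

# Same matrix with the dtype set up front: 2 bytes per entry, no int64 copy made.
counts_int16 = pd.DataFrame(
    np.zeros((n_snvs, n_cells), dtype=np.int16), index=snv_ids, columns=barcodes
)


def data_bytes(df: pd.DataFrame) -> int:
    """Bytes used by the values of all columns, excluding the index."""
    return int(df.memory_usage(index=False).sum())


print(data_bytes(counts_int64))  # 8000000
print(data_bytes(counts_int16))  # 2000000
```

The int16 frame uses a quarter of the memory, and that ratio holds for any matrix shape.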

I'm not sure how large the values in these matrices can get, but I believe np.int16 (maximum 32767) would be more than enough. Please set a dtype that matches the task.
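Because the maximum count isn't known in advance, one option is to check it before narrowing the type: NumPy's astype() does not range-check integer casts and silently wraps values that overflow. The sketch below assumes the counts are non-negative integers; smallest_int_dtype and downcast_counts are hypothetical helpers, not part of scSplit.

```python
import numpy as np


def smallest_int_dtype(max_count, candidates=(np.int8, np.int16, np.int32, np.int64)):
    """Return the narrowest signed integer dtype whose range holds max_count.

    Hypothetical helper for illustration; assumes max_count is non-negative.
    """
    for dtype in candidates:
        if max_count <= np.iinfo(dtype).max:
            return np.dtype(dtype)
    raise OverflowError(f"{max_count} does not fit in int64")


def downcast_counts(counts: np.ndarray) -> np.ndarray:
    """Cast a non-negative count matrix to the narrowest dtype that is still safe."""
    max_count = int(counts.max()) if counts.size else 0
    return counts.astype(smallest_int_dtype(max_count))


print(np.iinfo(np.int16).max)     # 32767
print(smallest_int_dtype(120))    # int8
print(smallest_int_dtype(5000))   # int16
print(smallest_int_dtype(40000))  # int32
```

If int16 is hard-coded instead, a cheap guard such as `assert counts.max() <= np.iinfo(np.int16).max` before the cast catches overflow that would otherwise corrupt the counts silently.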

jon-xu commented 3 years ago

Thanks for the suggestion! Corrected in the newest release.