timothy-barry / ondisc

Space- and time-optimal algorithms for large single-cell expression matrices, with a focus on single-cell CRISPR screens.
https://timothy-barry.github.io/ondisc/

Investigate memory usage of `create_ondisc_matrix_from_mtx` #23

Closed. timothy-barry closed this issue 2 years ago

timothy-barry commented 2 years ago

The `create_ondisc_matrix_from_mtx` function appears to use a lot of memory when the chunked option is set to true; this is likely due to a bug in `read_delim_chunked`. If the bug is not fixed, consider other ways of implementing this functionality (e.g., the Linux `split` command).

timothy-barry commented 2 years ago

readr seems to suffer from serious memory leak issues; use `fread` from data.table instead.

timothy-barry commented 2 years ago

Plan: first, use `split` to divide the mtx file into smaller chunks. Then read each chunk via `fread`, deleting each chunk file once it has been read into memory.
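
A minimal shell sketch of this plan, not the package's actual implementation. The scratch directory, the file names, the toy matrix, and the chunk size of 2 lines are all hypothetical, and a real chunk size would be far larger. The read step is a stand-in that only counts lines; in R, that is where `fread` would be called on each chunk. The sketch assumes the standard Matrix Market layout: leading `%` comment lines, then one dimension line, then one triplet per line.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory and file name, for illustration only.
workdir=$(mktemp -d)
mtx="$workdir/matrix.mtx"

# Toy Matrix Market file: banner line, dimension line, then one
# "row column value" triplet per nonzero entry.
cat > "$mtx" <<'EOF'
%%MatrixMarket matrix coordinate integer general
3 4 5
1 1 2
2 1 7
3 2 1
1 3 4
2 4 9
EOF

# Count the leading comment lines (those starting with '%').
n_comment=$(awk '/^%/ { n++; next } { exit } END { print n + 0 }' "$mtx")
# Also skip the dimension line that follows the comments.
n_skip=$((n_comment + 1))

# Split only the data lines into chunk files of 2 lines each
# (chunk_aa, chunk_ab, ...). "-" tells split to read standard input.
tail -n +"$((n_skip + 1))" "$mtx" | split -l 2 - "$workdir/chunk_"

n_chunks=0
n_entries=0
for chunk in "$workdir"/chunk_*; do
  # Stand-in for the real reader: count the entries in this chunk.
  lines=$(awk 'END { print NR }' "$chunk")
  n_entries=$((n_entries + lines))
  n_chunks=$((n_chunks + 1))
  # Delete the chunk as soon as it has been consumed.
  rm "$chunk"
done

# Confirm that no chunk files are left behind.
remaining=$(find "$workdir" -name 'chunk_*' | awk 'END { print NR }')
echo "chunks=$n_chunks entries=$n_entries leftover=$remaining"

rm -r "$workdir"
```

Because the header is stripped before splitting, every chunk holds only triplets and can be parsed the same way. Since each chunk is deleted right after it is read, the extra disk space in use shrinks as processing proceeds.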