timothy-barry / ondisc

Space- and time-optimal algorithms for large single-cell expression matrices, with a focus on single-cell CRISPR screens.
https://timothy-barry.github.io/ondisc/

Rewrite get_mtx_metadata() #1

Closed yixuan closed 3 years ago

yixuan commented 3 years ago

This PR rewrites get_mtx_metadata() to be cleaner and more efficient. It merges the previous get_n_rows_with_comments_mtx() and get_mtx_metadata() functions into a single function that reads the data file only once.
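For readers unfamiliar with the file layout, here is a minimal R sketch of what a single-pass metadata read could look like. It is not the code in this PR. The helper name read_mtx_header() and the returned field n_rows_to_skip are hypothetical, and the mapping of matrix rows to features and columns to cells is an assumption based on the common 10x-style layout. The sketch relies only on the Matrix Market convention that banner and comment lines start with "%" and that the first remaining line holds the row count, column count, and number of entries.

```r
# Hypothetical helper (not the PR's implementation): collect .mtx metadata
# while reading the header of the file exactly once.
read_mtx_header <- function(path) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n_skip <- 0L  # number of lines up to and including the size line
  repeat {
    line <- readLines(con, n = 1L)
    if (length(line) == 0L) stop("No size line found in ", path)
    n_skip <- n_skip + 1L
    # Banner and comment lines start with '%'; blank lines are skipped too.
    if (!startsWith(line, "%") && nzchar(trimws(line))) break
  }
  # Parse as doubles so that counts above .Machine$integer.max do not overflow.
  dims <- as.numeric(strsplit(trimws(line), "[[:space:]]+")[[1]])
  if (length(dims) != 3L || anyNA(dims)) stop("Malformed size line: ", line)
  list(
    n_features = dims[1],     # assumption: rows are features
    n_cells = dims[2],        # assumption: columns are cells
    n_data_points = dims[3],  # number of stored entries
    n_rows_to_skip = n_skip   # lines to skip before the first data entry
  )
}

# Usage on a small temporary file.
tmp <- tempfile(fileext = ".mtx")
writeLines(c("%%MatrixMarket matrix coordinate integer general",
             "% a comment line",
             "3 4 5",
             "1 1 2"), tmp)
meta <- read_mtx_header(tmp)
```

Because the comment-line count and the dimensions come out of the same loop, no second scan of the file is needed.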

One small issue remains. The previous R code of get_mtx_metadata() raises an error if n_data_points > .Machine$integer.max, but the error message says "Numer of rows exceeds maximum value". We should clarify whether the limit applies to n_data_points, n_features, or n_cells.

timothy-barry commented 3 years ago

Thanks for this!

We need a limit on n_data_points (at least for now -- this will be updated later).
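A minimal R sketch of such a check, with an error message that names the quantity actually being limited. The function name check_n_data_points() and the message wording are hypothetical and are not taken from the package.

```r
# Hypothetical check (name and message are illustrative only): stop if the
# number of data points cannot be represented as an R integer.
check_n_data_points <- function(n_data_points) {
  if (n_data_points > .Machine$integer.max) {
    stop("Number of data points (",
         format(n_data_points, scientific = FALSE),
         ") exceeds the maximum integer value (",
         .Machine$integer.max, ").")
  }
  invisible(n_data_points)
}

ok <- check_n_data_points(100)                        # within the limit
too_big <- try(check_n_data_points(2^31), silent = TRUE)  # one past the limit
```

Passing the count as a double, as here, lets the comparison run even when the value is too large to store as an integer.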