scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

A memory efficient implementation of the .mtx reading function #3389

Open gjeuken opened 3 days ago

gjeuken commented 3 days ago

The pandas `read_csv` function is very memory intensive when it reads a whole file at once, which makes loading data (especially large datasets from the EBI Single Cell Expression Atlas) impossible on computers with 16 GB of RAM or less. The subsequent analysis of such datasets with scanpy, however, works well on those computers.

Loading the data in chunks, using the same pandas function, solves this problem.
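For illustration, here is a minimal sketch of the chunked approach. It is not the code in this PR. It assumes a text-mode stream holding a coordinate-format `.mtx` file with three columns per entry. The helper name `read_mtx_chunked` and the default chunk size are hypothetical.

```python
import io

import numpy as np
import pandas as pd
from scipy import sparse


def read_mtx_chunked(stream, chunksize=1_000_000):
    """Hypothetical helper: read a coordinate-format .mtx text stream in chunks."""
    # Skip the Matrix Market banner and comment lines, which start with '%'.
    line = stream.readline()
    while line.startswith("%"):
        line = stream.readline()
    # The first non-comment line holds: n_rows n_cols n_entries.
    n_rows, n_cols, n_entries = map(int, line.split())
    if n_entries == 0:
        return sparse.csr_matrix((n_rows, n_cols), dtype=np.float32)

    # Preallocate the output arrays once, so only one chunk at a time
    # lives in memory as a DataFrame.
    rows = np.empty(n_entries, dtype=np.int64)
    cols = np.empty(n_entries, dtype=np.int64)
    vals = np.empty(n_entries, dtype=np.float32)

    reader = pd.read_csv(
        stream,
        sep=r"\s+",
        header=None,
        names=["row", "col", "val"],
        dtype={"row": np.int64, "col": np.int64, "val": np.float32},
        chunksize=chunksize,  # makes read_csv return an iterator of DataFrames
    )
    start = 0
    for chunk in reader:
        stop = start + len(chunk)
        # Matrix Market indices are 1-based; SciPy expects 0-based.
        rows[start:stop] = chunk["row"].to_numpy() - 1
        cols[start:stop] = chunk["col"].to_numpy() - 1
        vals[start:stop] = chunk["val"].to_numpy()
        start = stop

    return sparse.csr_matrix((vals, (rows, cols)), shape=(n_rows, n_cols))


# Small in-memory example, read two entries at a time.
example = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "% a comment line\n"
    "3 4 3\n"
    "1 1 1.5\n"
    "2 3 2.0\n"
    "3 4 4.0\n"
)
matrix = read_mtx_chunked(example, chunksize=2)
```

Peak memory in this sketch is roughly the three preallocated arrays plus one chunk, rather than a full DataFrame of all entries.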

codecov[bot] commented 3 days ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 76.50%. Comparing base (7131500) to head (fa91b73).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #3389   +/-   ##
=======================================
  Coverage   76.50%   76.50%
=======================================
  Files         111      111
  Lines       12874    12877    +3
=======================================
+ Hits         9849     9852    +3
  Misses       3025     3025
```

| [Files with missing lines](https://app.codecov.io/gh/scverse/scanpy/pull/3389?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse) | Coverage Δ | |
|---|---|---|
| [src/scanpy/datasets/\_ebi\_expression\_atlas.py](https://app.codecov.io/gh/scverse/scanpy/pull/3389?src=pr&el=tree&filepath=src%2Fscanpy%2Fdatasets%2F_ebi_expression_atlas.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse#diff-c3JjL3NjYW5weS9kYXRhc2V0cy9fZWJpX2V4cHJlc3Npb25fYXRsYXMucHk=) | `94.18% <100.00%> (+0.21%)` | :arrow_up: |