single-cell-data / TileDB-SOMA

Python and R SOMA APIs using TileDB’s cloud-native format. Ideal for single-cell data at any scale.
https://tiledbsoma.readthedocs.io
MIT License

[r] Out-of-memory issues on `cellxgene_census` R acceptance tests #2463

Open johnkerl opened 4 months ago

johnkerl commented 4 months ago

This may, of course, be a bug within `cellxgene_census`. However, the Python and R APIs should behave similarly enough that if one API OOMs and the other does not, on the same dataset and query pattern, there is likely either a TileDB-SOMA code flaw or a TileDB-SOMA documentation flaw.

Reported by @ebezzi.

johnkerl commented 3 months ago

@ebezzi were these resolved by adding `rm` and `gc` calls to your R test cases?

ebezzi commented 3 months ago

Yeah, that specific test was fixed via rm+gc. There is still a test that sometimes fails because of OOM (`test_seurat_common-cell-type-large-buffer-size`), and it's not a garbage collection issue since this test loads everything into memory. We'll monitor and see how frequent the failures are, and worst case we'll lower the limit of queried cells (right now it's 15M).