jimmymathews commented 7 months ago

When the cell-data API endpoint is queried for a sample with a large number of cells (e.g. 700k), the query seems to be forwarded correctly to the cells service, which then correctly finds a cached version (if it exists), but the API service does not seem to be receiving the payloads from the cells service. Possibly there are cluster-internal limitations that stall such transfers.

For this issue, diagnose exactly how the stall happens and try to fix it.

jimmymathews commented 7 months ago

A quick fix was implemented in PR #315, but this is meant to be just temporary.

jimmymathews commented 5 months ago

This issue has now been thoroughly diagnosed. The main problem is the division of labor between the API and cells-data containerized services: the API server requests a JSON payload of a few hundred MB from the cells-data service, but that service is a custom TCP server, and the TCP client reads one byte at a time because the total byte count is never advertised. The resulting throughput is only about 1-2 MB per second.
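For illustration only (this is not the SPT code, and the function names are hypothetical): a minimal sketch of length-prefixed framing, where the sender advertises the byte count in a fixed-size header so the receiver can read in large chunks instead of byte by byte. The 8-byte big-endian header is an assumption.

```python
import socket
import struct

# Assumed framing: 8-byte big-endian unsigned payload length, then the payload.
HEADER = struct.Struct('!Q')

def send_framed(sock: socket.socket, payload: bytes) -> None:
    """Send the byte count first, then the payload itself."""
    sock.sendall(HEADER.pack(len(payload)))
    sock.sendall(payload)

def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, taking as much as the socket offers per call."""
    buffer = bytearray(count)
    view = memoryview(buffer)
    received = 0
    while received < count:
        n = sock.recv_into(view[received:], count - received)
        if n == 0:
            raise ConnectionError('Socket closed before the full payload arrived.')
        received += n
    return bytes(buffer)

def recv_framed(sock: socket.socket) -> bytes:
    """Read the header to learn the payload size, then read the payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Since the receiver knows the total size up front, it can preallocate one buffer and let each `recv_into` call fill as much of it as is available.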

A secondary problem, related to the first, is that the API server is non-responsive for the duration of the intra-cluster transfer, I guess because of the way the TCP client is set up.

To make matters worse, the ingress controller's timeout seemed to be triggering repeated retries, so the already-non-responsive API service was usually re-doing the exact same query 3 times immediately after the first one completed. (And the completed query is never returned to the client.)

All this is not noticeable when the payloads are small (a few MB).

Plan

In discussion with @franciscouzo we decided, independently of dealing with this issue, that we should stop creating JSON payloads of the cell data because they are needlessly wasteful of memory. We will transfer efficient binary data structures instead (closer to what is actually already stored in Postgres), and have the application/client interpret them byte by byte.

This would change the job of the cells data service a lot, probably allowing it to skip the initialization step of pre-loading dataframes in memory altogether, reducing its baseline RAM footprint to a couple of GB or less. Or, possibly, allowing the API service to handle this on its own with no additional service.

So a new API endpoint will be added, to replace the current cells-data one, that will send the efficient data structure, most likely structured as:

4 bytes  1 - 4 : number of cells (can be used to compute total byte count if necessary)
4 bytes  5 - 8 : 32-bit integer x position
4 bytes  9 - 12: 32-bit integer y position
8 bytes 13 - 20: phenotype membership bits (corresponds to 64-bit integer representation elsewhere)
... repetition of 4+4+8=16 bytes, per cell
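A rough sketch of how this layout could be written and read with Python's `struct` module. Byte order and signedness are not specified above, so little-endian, signed 32-bit coordinates and an unsigned 64-bit mask are assumptions here, and the function names are hypothetical.

```python
import struct

HEADER = struct.Struct('<I')    # bytes 1-4: number of cells (little-endian assumed)
RECORD = struct.Struct('<iiQ')  # per cell: x, y, 64-bit phenotype mask = 16 bytes

def encode_cells(cells):
    """Pack an iterable of (x, y, phenotype_mask) tuples into one payload."""
    cells = list(cells)
    parts = [HEADER.pack(len(cells))]
    parts.extend(RECORD.pack(x, y, mask) for x, y, mask in cells)
    return b''.join(parts)

def decode_cells(payload):
    """Unpack a payload into a list of (x, y, phenotype_mask) tuples."""
    (count,) = HEADER.unpack_from(payload, 0)
    expected = HEADER.size + count * RECORD.size
    if len(payload) != expected:
        raise ValueError(f'Expected {expected} bytes, got {len(payload)}.')
    # memoryview avoids copying the body before iterating over the records.
    return list(RECORD.iter_unpack(memoryview(payload)[HEADER.size:]))
```

The leading cell count lets the reader validate the total size before touching any record.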

A separate mini endpoint can provide the ordered set of feature names.
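A sketch of how a client might combine the ordered feature names with a cell's membership bits. It assumes that bit i of the 64-bit integer (least significant bit first) corresponds to the i-th name in the list; that ordering is not stated above. The function name and the example feature names are made up.

```python
def phenotypes_of(mask, feature_names):
    """Return the names whose bit is set in the 64-bit membership mask.

    Assumes bit i (least significant first) maps to feature_names[i].
    """
    return [name for i, name in enumerate(feature_names) if (mask >> i) & 1]

# Hypothetical feature names, for illustration only.
names = ['CD3', 'CD8', 'SOX10']
selected = phenotypes_of(0b101, names)  # bits 0 and 2 are set
```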

This should give us a range of about 1.6 MB to 80 MB (at 16 bytes per cell, for slide sizes ranging from ~100k to 5 million cells).
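The arithmetic behind this estimate, as a quick check under the layout above (4-byte header plus 16 bytes per cell; the function name is hypothetical):

```python
def payload_size(num_cells: int) -> int:
    """Total bytes: 4-byte cell count header plus 16 bytes per cell."""
    return 4 + 16 * num_cells

small = payload_size(100_000)    # 1,600,004 bytes
large = payload_size(5_000_000)  # 80,000,004 bytes
```

That is roughly 1.6 MB and 80 MB in decimal units, or about 1.5 MiB and 76 MiB.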

nadeemlab / SPT

Stalled backend for large cell data payloads #314
