Make large datasets subsettable for UMAP

nadeemlab / SPT

Spatial profiling toolbox for spatial characterization of tumor immune microenvironment in multiplex images

https://oncopathtk.org

Make large datasets subsettable for UMAP #352

Closed jimmymathews closed 1 month ago

jimmymathews commented 2 months ago

Generating UMAPs currently only works for somewhat limited datasets, because of RAM strain on postgres. Fix this by doing the random subselection more manually, then pulling only what is needed.

jimmymathews commented 1 month ago

The actual problem seems to be that the query used to pull out feature data for a random subselection requires too much temporary storage on the postgres server. A possible fix: randomly subselect integer indices, then pull the corresponding entries out manually from the binary-format feature matrix payloads, which we can query quickly. This depends on first adding the option to include the continuous intensity values in those payloads.
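
A rough sketch of the index-based subselection, for illustration only. The payload layout here is an assumption (row-major float32, one row per cell, a known number of features), not the actual SPT binary format, and `sample_rows` is a hypothetical helper name.

```python
import numpy as np


def sample_rows(payload: bytes, num_features: int, sample_size: int, seed: int = 0):
    """Pick a random subset of cells from a binary feature matrix payload.

    Assumed (hypothetical) layout: row-major float32 values, one row per
    cell, num_features values per row.
    """
    # View the bytes as a 2D array without copying the whole payload.
    matrix = np.frombuffer(payload, dtype=np.float32).reshape(-1, num_features)
    num_cells = matrix.shape[0]

    # Randomly subselect integer indices, without replacement.
    rng = np.random.default_rng(seed)
    count = min(sample_size, num_cells)
    indices = np.sort(rng.choice(num_cells, size=count, replace=False))

    # Pull out only the selected rows (fancy indexing copies just these rows).
    return indices, matrix[indices]


# Small stand-in payload: 5 cells, 4 continuous intensity values each.
full = np.arange(20, dtype=np.float32).reshape(5, 4)
payload = full.tobytes()
indices, rows = sample_rows(payload, num_features=4, sample_size=3)
```

The point of sampling indices first is that the database only has to hand over the payload it can already serve quickly; the random selection itself happens client-side, so no large temporary result is built on the postgres server.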