reillytilbury / coppafish

Python version of iss software
MIT License

Save OMP results using zarr #312

Closed paulshuker closed 3 months ago

paulshuker commented 3 months ago

The Issue:

Currently, OMP appends all tile results together into large NumPy arrays. These become unwieldy because they take up a lot of memory. On low-RAM PCs running large datasets, this can cause a memory crash, and it also slows the pipeline down.

Suggestion:

Add support for zarr groups being saved to notebook pages.

Then, store the OMP results in a zarr group that is added to the notebook page as a single variable called results. Within the group, there will be one subgroup per tile, e.g. tile_0, tile_1, .... Each subgroup will contain four zarr arrays, called colours, scores, local_yxz, and gene_no, which together hold all the required information. The data can then be retrieved easily. For example, to load tile 0's local_yxz into memory, write nb.omp.results['tile_0/local_yxz'][:].

This means we no longer have to keep all results in memory, and it adds support for compressing OMP results to save disk space. It also means the notebook does not need to load all OMP results into memory straight away; data is only read into memory when the code explicitly requests it.