ome / omero-cli-zarr

https://pypi.org/project/omero-cli-zarr/
GNU General Public License v2.0

labels exported as dtype i8 #115

Closed: will-moore closed this issue 2 years ago

will-moore commented 2 years ago

Labels are exported as Zarr arrays of dtype i8 (64-bit signed integers):

https://github.com/ome/omero-cli-zarr/blob/0d0ff5d9533d3aad2909961a6043e6f172eda023/src/omero_zarr/masks.py#L477

It is also the default option in the CLI: https://github.com/ome/omero-cli-zarr/blob/0d0ff5d9533d3aad2909961a6043e6f172eda023/src/omero_zarr/cli.py#L140
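A note on notation: in NumPy and Zarr v2 dtype strings, the number after the kind code is a size in bytes. So "i8" is a 64-bit integer (NumPy int64), not an 8-bit one (NumPy int8 is "i1"). The following minimal sketch decodes simple dtype strings of this form. The helper describe_dtype is hypothetical and is not part of omero-cli-zarr.

```python
# Hypothetical helper: decode simple NumPy/Zarr v2 dtype strings like "<i8".
# The trailing number is the item size in BYTES, so "i8" means 64 bits.

KINDS = {"i": "signed integer", "u": "unsigned integer", "f": "float"}


def describe_dtype(dtype_str: str) -> tuple:
    """Return (kind, bits) for strings such as '<i8', '|i1' or '<u4'."""
    s = dtype_str.lstrip("<>|=")  # drop the byte-order prefix, if any
    kind, nbytes = s[0], int(s[1:])
    return KINDS[kind], nbytes * 8


for d in ("<i8", "|i1", "<i2", "<u4"):
    print(d, describe_dtype(d))
```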

Viewing these labels in a browser fails, for example in vizarr (https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001247.zarr/labels/0) or in the OME-NGFF validator (https://ome-ngff-validator.netlify.app/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001247.zarr/labels/0). The cause is that zarr.js does not support i8, and support does not look likely to arrive anytime soon: https://github.com/gzuidhof/zarr.js/issues/85
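A minimal sketch of a pre-flight check for this problem follows. It assumes the labels are stored as Zarr v2 arrays whose .zarray metadata records the dtype. The helper browser_readable and the example metadata are hypothetical. Treating uint64 as unsupported alongside int64 is also an assumption, based on the zarr.js issue linked above.

```python
import json

# Assumption (per the zarr.js issue above): 64-bit integer dtypes cannot be
# decoded by zarr.js. Including uint64 here is an assumption of this sketch.
UNSUPPORTED_IN_ZARR_JS = {"<i8", ">i8", "<u8", ">u8"}


def browser_readable(zarray_json: str) -> bool:
    """Return False if a Zarr v2 .zarray document uses a 64-bit integer dtype."""
    meta = json.loads(zarray_json)
    return meta["dtype"] not in UNSUPPORTED_IN_ZARR_JS


# Hypothetical .zarray for a label array, for illustration only.
example = json.dumps({
    "zarr_format": 2,
    "shape": [1, 1, 1, 512, 512],
    "chunks": [1, 1, 1, 512, 512],
    "dtype": "<i8",
    "compressor": None,
    "fill_value": 0,
    "filters": None,
    "order": "C",
})
print(browser_readable(example))  # False
```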

To quote @manzt "If you have control over the data you are generating, I would recommend thinking about whether you really require the entire range of integers that 64-bit integer offers. If you can avoid using them, you will have an easier time in your web-application and the size of data transferred to your application will be much smaller".
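One way to follow this advice is to pick the narrowest signed integer type that covers the label values actually present. This is a sketch, not how omero-cli-zarr chooses its dtype. smallest_signed_dtype is a hypothetical helper that returns NumPy/Zarr v2 dtype strings.

```python
# Hypothetical helper: choose the narrowest signed integer dtype whose range
# covers [min_value, max_value], instead of always using int64 ("<i8").

CANDIDATES = [("|i1", 8), ("<i2", 16), ("<i4", 32), ("<i8", 64)]


def smallest_signed_dtype(max_value: int, min_value: int = 0) -> str:
    for dtype, bits in CANDIDATES:
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if lo <= min_value and max_value <= hi:
            return dtype
    raise ValueError("label values do not fit in a 64-bit signed integer")


print(smallest_signed_dtype(100))    # |i1
print(smallest_signed_dtype(5000))   # <i2
print(smallest_signed_dtype(2**31))  # <i8
```

The result depends only on the extreme values present. So when labels are raw database IDs, a single ID above 2**31 - 1 is enough to require int64.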

sbesson commented 2 years ago

Looking at the implementation, I suspect the use of int64 was motivated by the label values being the ROI IDs. int64 is then needed to support the same range as long:

https://github.com/ome/omero-cli-zarr/blob/0d0ff5d9533d3aad2909961a6043e6f172eda023/src/omero_zarr/masks.py#L496
https://github.com/ome/omero-cli-zarr/blob/0d0ff5d9533d3aad2909961a6043e6f172eda023/src/omero_zarr/masks.py#L524-L525

An alternative would be to generate an internal label index from 0 to len(masks) - 1. The dynamic range would then be limited by the number of masks, and np.int8 or np.int16 would likely cover many use cases (up to 128 and 32,768 masks respectively).
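The remapping proposed above could look roughly like this. compact_labels is a hypothetical helper. Assigning indices in sorted ID order is a choice made for this sketch. The optional start parameter is also an assumption: it keeps 0 free as background, which is a common convention for label images. The ROI IDs below are made up.

```python
# Hypothetical sketch: replace ROI IDs with a compact label index, so the
# largest label value depends on the number of ROIs, not on the size of IDs.


def compact_labels(roi_ids, start=0):
    """Map each distinct ROI ID to start, start + 1, ... in sorted ID order."""
    return {roi_id: start + i for i, roi_id in enumerate(sorted(set(roi_ids)))}


roi_ids = [1_700_000_123, 1_700_000_456, 2_900_000_001]  # made-up IDs
mapping = compact_labels(roi_ids)
inverse = {label: roi_id for roi_id, label in mapping.items()}

print(mapping)
print(max(mapping.values()))  # 2, well within the int8 range
```

Because the ROI IDs would no longer be the label values, the mapping (or its inverse) would need to be saved with the labels for the export to remain traceable back to the original ROIs.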