mobie / mobie-utils-python

Python tools for MoBIE
MIT License

Compression/conversion of floats (for OME-Zarr) #116

Open tischi opened 8 months ago

tischi commented 8 months ago

@constantinpape

  1. Which compression are you using for OME-Zarr?
  2. Could it be that it does not work well for floating-point data?

Could we add an option to addImage for a conversion to uint8? I guess for this one would need an array conversion_min_max[ 2 ], which would then be used for a linear conversion:

min = conversion_min_max[ 0 ]
max = conversion_min_max[ 1 ]
value_uint8 = 255 * ( value - min ) / ( max - min )

I am asking because I am dealing with a floating-point dataset, and I have a feeling that it is much slower to load than a uint8 dataset of comparable size, chunking and dimensions. Of course they are not the same, so I am not sure.

Maybe I could first try to "manually" convert the float data to uint8 and see if that indeed helps.
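
A minimal sketch of such a manual conversion, assuming the data fits in memory as a NumPy array. The function name `to_uint8` is hypothetical and not part of mobie-utils-python; clipping out-of-range values and rounding to the nearest integer are assumptions on top of the formula above.

```python
import numpy as np


def to_uint8(values, conversion_min_max):
    """Linearly map [min, max] onto [0, 255] and return uint8.

    Values below min or above max are clipped (an assumption; the
    formula in the issue does not say how to treat them).
    """
    lo, hi = conversion_min_max
    if hi <= lo:
        raise ValueError("max must be greater than min")
    scaled = 255.0 * (np.asarray(values, dtype=np.float64) - lo) / (hi - lo)
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


# Small illustrative input, not taken from the x-ray dataset.
data = np.array([-1.0, 0.0, 0.2, 1.0, 2.0])
converted = to_uint8(data, (0.0, 1.0))
print(converted.tolist())  # [0, 0, 51, 255, 255]
```

Choosing min and max from the data (for example low and high percentiles) rather than the full float range usually keeps more of the 256 levels in use, at the cost of clipping outliers.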

tischi commented 8 months ago

I looked a bit into the data:

3255557 bytes in xray.ome.zarr/s0/16/16/16
774407 bytes in em.ome.zarr/s0/10/20/30

em is uint8 ( 1 byte ).

the x-ray chunk here is ~4.2 times larger in terms of bytes than the em chunk, which is roughly explained by the fact that it is float ( 4 bytes ).

chunk sizes are 96^3

774407 / 96^3 = 0.875 => compression of the uint8 em is not amazing (1.0 would be no compression), is it?
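
The same arithmetic for both chunks, as a small sketch. It assumes both chunks are full 96^3 chunks (not cropped edge chunks) and that the x-ray data is stored with 4 bytes per voxel, as stated above; `compressed_fraction` is a hypothetical helper name.

```python
def compressed_fraction(stored_bytes, chunk_shape, bytes_per_voxel):
    """Stored size divided by raw size; 1.0 means no compression."""
    raw_bytes = bytes_per_voxel
    for n in chunk_shape:
        raw_bytes *= n
    return stored_bytes / raw_bytes


chunk_shape = (96, 96, 96)
em = compressed_fraction(774407, chunk_shape, 1)     # uint8
xray = compressed_fraction(3255557, chunk_shape, 4)  # 4-byte float

print(f"em:   {em:.3f}")    # em:   0.875
print(f"xray: {xray:.3f}")  # xray: 0.920
```

Under these assumptions the float chunk compresses slightly worse than the uint8 chunk, so most of the size difference comes from the 4-byte data type.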

I have no experience, is it normal that EM data does not compress well?

constantinpape commented 8 months ago

It should use the default compression for zarr-python, which I think is blosc + lz4. You can check the details in the .zarray file.
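
One way to check this without opening the array is to read the metadata file directly. This is a sketch that assumes the Zarr v2 layout, where each array directory contains a JSON file named `.zarray` with `dtype`, `chunks` and `compressor` entries; `read_array_settings` is a hypothetical helper, not a zarr-python function.

```python
import json
from pathlib import Path


def read_array_settings(array_dir):
    """Return dtype, chunk shape and compressor settings from .zarray."""
    meta = json.loads((Path(array_dir) / ".zarray").read_text())
    return {
        "dtype": meta.get("dtype"),
        "chunks": meta.get("chunks"),
        # None here would mean the chunks are stored uncompressed.
        "compressor": meta.get("compressor"),
    }
```

For the dataset above the call would presumably be `read_array_settings("xray.ome.zarr/s0")`, assuming `s0` is the array directory that holds the chunks.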

In general: how well compression works depends on the distribution of intensity values in the data. EM has a fairly even distribution between min and max (typically [0, 255]). So yes, it is expected that it does not compress well.
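
A rough illustration of that dependence on the value distribution, using synthetic bytes rather than real EM data. It uses `zlib` from the standard library instead of blosc + lz4, so the absolute numbers will differ from what zarr produces, and it ignores spatial correlation between neighbouring voxels; only the trend is meant to carry over.

```python
import random
import zlib


def compressed_fraction(raw):
    """Compressed size divided by raw size; 1.0 means no gain."""
    return len(zlib.compress(raw)) / len(raw)


rng = random.Random(0)
n = 200_000

# Values spread evenly over the full uint8 range [0, 255].
uniform = bytes(rng.randrange(256) for _ in range(n))
# Values confined to 16 neighbouring grey levels.
narrow = bytes(120 + rng.randrange(16) for _ in range(n))

uniform_ratio = compressed_fraction(uniform)
narrow_ratio = compressed_fraction(narrow)
print(f"uniform: {uniform_ratio:.2f}, narrow: {narrow_ratio:.2f}")
```

The evenly distributed bytes stay close to their raw size, while the narrow distribution shrinks noticeably.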