zarr-developers / numcodecs

A Python package providing buffer compression and transformation codecs for use in data storage and communication applications.
http://numcodecs.readthedocs.io
MIT License

FixedScaleOffset for handling NaN inputs #511

Open mullenkamp opened 4 months ago

mullenkamp commented 4 months ago

Hi,

I was wondering whether you would be interested in a FixedScaleOffset that can handle np.nan inputs. In the style of HDF/netCDF, it would use a fill value to replace np.nan with an appropriate integer. The fill value could be either user-defined or determined automatically from the astype integer size (by assigning it the smallest possible integer value). I can either modify the existing FixedScaleOffset class or create a new one. It's a very simple change, though there may be concerns about extra memory usage from boolean masking.
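To make the proposal concrete, here is a minimal sketch using NumPy only. The function names `encode_with_fill` and `decode_with_fill` are hypothetical and not part of numcodecs; the arithmetic assumes the usual scale-offset scheme of `round((x - offset) * scale)` on encode and `x / scale + offset` on decode, and the default fill value is assumed to be the smallest integer of the target type, as suggested above.

```python
import numpy as np


def encode_with_fill(values, offset, scale, astype="i2", fill_value=None):
    """Scale-offset encode floats to integers, mapping NaN to a fill value."""
    int_dtype = np.dtype(astype)
    if fill_value is None:
        # Assumption: reserve the smallest representable integer for NaN.
        fill_value = np.iinfo(int_dtype).min
    values = np.asarray(values, dtype="f8")
    nan_mask = np.isnan(values)  # the boolean mask mentioned in the proposal
    # Replace NaN with the offset before scaling so the integer cast is well defined.
    safe = np.where(nan_mask, offset, values)
    encoded = np.around((safe - offset) * scale).astype(int_dtype)
    encoded[nan_mask] = fill_value
    return encoded


def decode_with_fill(encoded, offset, scale, dtype="f8", fill_value=None):
    """Invert encode_with_fill, restoring NaN wherever the fill value is found."""
    encoded = np.asarray(encoded)
    if fill_value is None:
        fill_value = np.iinfo(encoded.dtype).min
    fill_mask = encoded == fill_value
    decoded = (encoded / scale + offset).astype(dtype)
    decoded[fill_mask] = np.nan
    return decoded


data = np.array([1.5, np.nan, 2.25])
enc = encode_with_fill(data, offset=1.0, scale=100, astype="i2")
dec = decode_with_fill(enc, offset=1.0, scale=100)
```

One caveat of this sketch: a real value that happens to encode to the fill value would be decoded as NaN, so the fill value has to lie outside the range of valid encoded data.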

Also, is there any reason why dtype shouldn't always be a float and astype shouldn't always be an integer?

Thanks

martindurant commented 4 months ago

> is there any reason why dtype shouldn't always be a float and astype shouldn't always be an integer

I can certainly imagine an offset for integers, but a scale less so. Of course, you also get to specify the bitsize of each, and it can be important whether you save as uint8 and load into float32 or something bigger.
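As an illustration of the integer-offset case, here is a small NumPy-only sketch. The data and variable names are made up for the example, and it does not call numcodecs itself; it only mimics an offset-only encoding with a scale of 1.

```python
import numpy as np

# Hypothetical data: int32 readings clustered just above 1000.
readings = np.array([1000, 1003, 1017, 1255], dtype="i4")
offset = 1000

# Offset-only encoding (scale = 1): each value now fits in a single byte.
stored = (readings - offset).astype("u1")

# Decoding back into the original integer type restores the input exactly.
restored = stored.astype("i4") + offset

# Decoding into float32 is also exact for these values, but the choice of
# load type still determines the in-memory size and available precision.
restored_f4 = stored.astype("f4") + np.float32(offset)
```

Here the stored array takes 4 bytes instead of 16, which is the kind of saving that makes the bit size of each side worth choosing deliberately.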