XRImage.palettize results in image with fill value inconsistent with dtype

gerritholl commented 4 weeks ago

Describe the bug

XRImage.palettize changes the dtype of the data to int64, but if the input data had a fill value defined, this is not changed. This may result in a fill value that is inconsistent with the dtype.

To Reproduce

import numpy as np
import xarray as xr
import trollimage.xrimage
import trollimage.colormap

arr = np.arange(25, dtype="float32").reshape(5, 5)
data = xr.DataArray(arr.copy(), dims=['y', 'x'], attrs={"_FillValue": np.nan})
img = trollimage.xrimage.XRImage(data)
img.palettize(trollimage.colormap.brbg)
print(img.data.dtype, img.data.attrs["_FillValue"])

Expected behavior

I expect that the fill value is of the same dtype as the data.

Actual results

int64 nan

Environment Info:

OS: [e.g. OSX, Windows, Linux]
trollimage version: 1.15.0

Additional context

This may pose a problem in Satpy when loading the cloud_top_height composite with the nwcsaf-geo reader and writing it out as a mode P image. Fill values in the input will be written with pixel value 255, but the fill_value argument in get_enhanced_image will have no influence and the fill value in the metadata of the produced image will not correspond to the actual fill value used in the input data.

djhoese commented 4 weeks ago

I think fill values don't mean anything after palettize. I agree something has to happen with _FillValue (maybe remove it), but doesn't palettize force all inputs to some color? The other option is a PA (versus a P) image where the Alpha channel holds the fill mask.

gerritholl commented 4 weeks ago

palettize forces all inputs to some color, including inputs for "missing data". That does not rule out defining in the resulting image that missing data have pixel value 0 or 255 or something else, like we do for single channel mode L images. A client can treat those pixels differently on top of the colour displayed. For example, NinJo will show a tooltip with the pixel value for all valid pixels, but not for pixels with a value equal to fill value.

gerritholl commented 4 weeks ago

Independently of that question, it would be desirable if the user could control what value nan gets after applying palettize. That they get value 255 is ultimately due to numpy.digitize putting nan in the final bin for being outside the range, but from a user's perspective, invalid pixels remain invalid pixels.

gerritholl commented 4 weeks ago

I'm not sure if nan is guaranteed to go into the final bin or if this is an accident of implementation. If it is not guaranteed, there is even more reason to try to control this better from trollimage.

gerritholl commented 4 weeks ago

I think nan is guaranteed to go to the final bin, because nan is sorted to the end. So we could optionally set the fill value of the palettize result to the value of the final bin.

gerritholl commented 4 weeks ago

I would propose to (optionally?) in XRImage.palettize set the fill value of the result to colormap.values.shape[0]-1, which is where nan (as well as inf and valid out-of-range values) goes. I think that in this case, satpy's save_dataset(..., fill_value=x) would result in those being written with value x to the output file.

djhoese commented 4 weeks ago

Correct me if I misunderstood, but you're saying that right now invalid values or at least NaNs specifically are already resulting in the maximum/last color in the colormap. The main issue is that _FillValue is no longer in sync with the data it is attached to (in .attrs). It seems reasonable to update the _FillValue as you say to point to the last index.

This gets weird with some cases like category products where 0 (or other value) are known to be invalid and represent invalid. In those cases I think the assumption is that the user or rather the person who created the enhancement configuration knew this and the appropriate color in the colormap for that value represents "invalid" to them. But in these existing cases I don't think there is any indication of what is invalid or valid as far as geotiff metadata.

gerritholl commented 3 weeks ago

Correct me if I misunderstood, but you're saying that right now invalid values or at least NaNs specifically are already resulting in the maximum/last color in the colormap.

Correct. This is guaranteed by nan being always sorted after any finite values.

The main issue is that _FillValue is no longer in sync with the data it is attached to (in .attrs). It seems reasonable to update the _FillValue as you say to point to the last index.

Yes.

Downside: the last index also represents valid values close to the maximum of the valid range. Those get mixed already now, but when we don't have any invalid values, the user might not want to define any valid fill value, in order not to lose any valid data.

pytroll / trollimage

XRImage.palettize results in image with fill value inconsistent with dtype #178