pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0
1.06k stars 292 forks source link

Cannot save mode LA image storing scale and offset tags #1851

Open gerritholl opened 2 years ago

gerritholl commented 2 years ago

Describe the bug

Saving a mode LA image while storing scale and offset tags fails with ValueError.

To Reproduce

"""Test saving LA image with scale/offset tags."""

import xarray as xr
import numpy as np
from satpy import Scene
from satpy.utils import debug_on; debug_on()

arr = xr.DataArray(
    np.zeros((20, 10, 2), dtype="uint8"),
    dims=("y", "x", "bands"),
    coords={"bands": ["L", "A"]})
sc = Scene()
sc["foo"] = arr
sc.save_dataset("foo", filename="/tmp/test.tif", include_scale_offset=True, writer="geotiff")
#sc.save_dataset("foo", filename="/tmp/test.tif", scale_offset_tags=("scale", "offset"), writer="geotiff")

(NB: the two versions of save_dataset are to accommodate for https://github.com/pytroll/trollimage/pull/85)

Expected behavior

I expect an LA image to be written with scale and offset written in the metadata. @mraspaud has reported that this works (source: personal communication on pytroll slack).

Actual results

[DEBUG: 2021-10-12 14:09:38 : satpy.writers] Reading ['/data/gholl/checkouts/satpy/satpy/etc/writers/geotiff.yaml']
[DEBUG: 2021-10-12 14:09:38 : trollimage.xrimage] Convert image data to dask array
[DEBUG: 2021-10-12 14:09:38 : satpy.writers] Enhancement configuration options: [{'name': 'stretch', 'method': <function stretch at 0x7fd380fb0a60>, 'kwargs': {'stretch': 'linear'}}]
[DEBUG: 2021-10-12 14:09:38 : trollimage.xrimage] Applying stretch linear with parameters {}
[DEBUG: 2021-10-12 14:09:38 : trollimage.xrimage] Perform a linear contrast stretch.
[DEBUG: 2021-10-12 14:09:38 : trollimage.xrimage] Calculate the histogram quantiles: 
[DEBUG: 2021-10-12 14:09:38 : trollimage.xrimage] Left and right quantiles: 0.005 0.005
/data/gholl/checkouts/trollimage/trollimage/xrimage.py:485: DeprecationWarning: include_scale_offset_tags is deprecated, please use scale_offset_tags to indicate tag labels
  warnings.warn(
[INFO: 2021-10-12 14:09:38 : trollimage.xrimage] Couldn't create geotransform
[DEBUG: 2021-10-12 14:09:38 : rasterio.env] Entering env context: <rasterio.env.Env object at 0x7fd380e58a60>
[DEBUG: 2021-10-12 14:09:38 : rasterio.env] Starting outermost env
[DEBUG: 2021-10-12 14:09:38 : rasterio.env] No GDAL environment exists
[DEBUG: 2021-10-12 14:09:38 : rasterio.env] New GDAL environment <rasterio._env.GDALEnv object at 0x7fd380e883d0> created
[DEBUG: 2021-10-12 14:09:38 : rasterio._env] GDAL_DATA found in environment.
[DEBUG: 2021-10-12 14:09:38 : rasterio._env] PROJ_LIB found in environment.
[DEBUG: 2021-10-12 14:09:38 : rasterio._env] Started GDALEnv: self=<rasterio._env.GDALEnv object at 0x7fd380e883d0>.
[DEBUG: 2021-10-12 14:09:38 : rasterio.env] Entered env context: <rasterio.env.Env object at 0x7fd380e58a60>
[DEBUG: 2021-10-12 14:09:38 : rasterio._io] Path: UnparsedPath(path='/tmp/test.tif'), mode: w, driver: GTiff
[DEBUG: 2021-10-12 14:09:38 : rasterio._io] Option: ('COMPRESS', b'DEFLATE')
[DEBUG: 2021-10-12 14:09:38 : rasterio._io] Option: ('ZLEVEL', b'6')
/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/rasterio/__init__.py:230: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix be returned.
  s = writer(path, mode, driver=driver,
[DEBUG: 2021-10-12 14:09:38 : rasterio._base] Nodata success: 0, Nodata value: -10000000000.000000
[DEBUG: 2021-10-12 14:09:38 : rasterio._base] Nodata success: 0, Nodata value: -10000000000.000000
[DEBUG: 2021-10-12 14:09:38 : rasterio.env] Exiting env context: <rasterio.env.Env object at 0x7fd380e58a60>
[DEBUG: 2021-10-12 14:09:38 : rasterio.env] Cleared existing <rasterio._env.GDALEnv object at 0x7fd380e883d0> options
[DEBUG: 2021-10-12 14:09:38 : rasterio._env] Stopped GDALEnv <rasterio._env.GDALEnv object at 0x7fd380e883d0>.
[DEBUG: 2021-10-12 14:09:38 : rasterio.env] Exiting outermost env
[DEBUG: 2021-10-12 14:09:38 : rasterio.env] Exited env context: <rasterio.env.Env object at 0x7fd380e58a60>
[DEBUG: 2021-10-12 14:09:39 : trollimage.xrimage] Interval: left=<xarray.DataArray (bands: 2)>
array([0., 0.])
Coordinates:
    quantile  float64 0.005
Dimensions without coordinates: bands, right=<xarray.DataArray (bands: 2)>
array([0., 0.])
Coordinates:
    quantile  float64 0.995
Dimensions without coordinates: bands
/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/core.py:121: RuntimeWarning: divide by zero encountered in true_divide
  return func(*(_execute_task(a, cache) for a in args))
/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/core.py:121: RuntimeWarning: invalid value encountered in multiply
  return func(*(_execute_task(a, cache) for a in args))
Traceback (most recent call last):
  File "/data/gholl/checkouts/protocode/save-la-tags.py", line 14, in <module>
    sc.save_dataset("foo", filename="/tmp/test.tif", include_scale_offset=True, writer="geotiff")
  File "/data/gholl/checkouts/satpy/satpy/scene.py", line 1037, in save_dataset
    return writer.save_dataset(self[dataset_id],
  File "/data/gholl/checkouts/satpy/satpy/writers/__init__.py", line 809, in save_dataset
    return self.save_image(img, filename=filename, compute=compute, fill_value=fill_value, **kwargs)
  File "/data/gholl/checkouts/satpy/satpy/writers/geotiff.py", line 222, in save_image
    return img.save(filename, fformat='tif', fill_value=fill_value,
  File "/data/gholl/checkouts/trollimage/trollimage/xrimage.py", line 419, in save
    return self.rio_save(filename, fformat=fformat,
  File "/data/gholl/checkouts/trollimage/trollimage/xrimage.py", line 590, in rio_save
    res = da.store(*to_store)
  File "/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/array/core.py", line 1043, in store
    compute_as_if_collection(Array, store_dsk, store_keys, **kwargs)
  File "/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/base.py", line 315, in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
  File "/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/threaded.py", line 79, in get
    results = get_async(
  File "/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/local.py", line 517, in get_async
    raise_exception(exc, tb)
  File "/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/local.py", line 325, in reraise
    raise exc
  File "/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/array/core.py", line 4021, in store_chunk
    return load_store_chunk(x, out, index, lock, return_stored, False)
  File "/data/gholl/mambaforge/envs/py39/lib/python3.9/site-packages/dask/array/core.py", line 4008, in load_store_chunk
    out[index] = x
  File "/data/gholl/checkouts/trollimage/trollimage/xrimage.py", line 162, in __setitem__
    kwargs = {self.name: item.item()}
ValueError: can only convert an array of size 1 to a Python scalar

Environment Info:

Additional context

I wouldn't actually expect it to work. The scale and offset are only normally meaningful for the L-band, but the tags are global. The writer encounters two scales and offsets, for the L and the A band, tries to write those two values as a scalar, and fails. I'm curious to see why it works in Martins case.

gerritholl commented 2 years ago

This MCVE works as expected (no traceback and tags are written to GDALMetaData):

import logging
from satpy import Scene
#from satpy.utils import debug_on; debug_on()
from sattools.io import plotdir

fn = ["/media/nas/x21308/scratch/AAPP-processed/noaa18_20210122_0952_80791/hrpt_noaa18_20210122_0952_80791.l1b"]

sc = Scene(filenames=fn, reader=["avhrr_l1b_aapp"])
sc.load(["4"])
ls = sc.resample("eurol")
ls.save_datasets(filename=str(plotdir() /
    "{start_time:%Y%m%d%H%M}-{platform_name}_{sensor}_{area.area_id}_{name}.tif"),
    include_scale_offset=True)
gerritholl commented 2 years ago

Difference: my synthetic test is passing a dataset with mode LA to stretch, whereas in the real data case, the image still has mode L at that point, and the alpha band only gets added in XRImage.finalize. Apparently my synthetic data test is not realistic?

gerritholl commented 2 years ago

This might not occur any current datasets, but as I understand it will affect any case where a reader or compositor returns an image in mode LA to which stretching should subsequently be applied.

djhoese commented 2 years ago

I'm having trouble understanding. Is this failing to write the alpha band or is it failing to write the tags?

gerritholl commented 2 years ago

It's failing to write the tags. It's also applying scaling to the alpha band, which is most likely not what the user wants.

gerritholl commented 2 years ago

First, XRImage.stretch is applying the stretch enhancements. It is given a mode LA input and applies scaling and offset to both layers. XRImage.stretch dutifully documents the scale factors and offsets per band in a size-2 ndarray. Later, XRImage.rio_save finds this documentation when collecting the tags. It expects a size-1 ndarray for offset and scale factor and converts this with .item(), but fails when the offset and scale-factor are size-2.

The original failure here is in XRImage.stretch, which should identify a mode LA image and apply scaling only to L; then it will also document only a scalar for offset and scale factor.

gerritholl commented 2 years ago

Related: in https://github.com/pytroll/trollimage/issues/84 I noted that GDAL supports per-band scale factors and offsets, which would be an alternate way to store this information in a GeoTIFF file (but I don't know what client software supports this; at least NinJo doesn't).

djhoese commented 2 years ago

So if the problem is multiple factors and offsets then do RGB/A images also fail?

gerritholl commented 2 years ago

Yes, RGB/A images also fail. However, I think the problem is more pressing for LA images, because, as for L images, it's reasonable to map the pixel values to a physical interpretation. I have a hard time imagining that people would want to map the pixel values of an RGB/A image to a physical interpretation, however; and if they don't, then I don't know why they would need a scale factor and offset interpretation.

djhoese commented 2 years ago

ok so the two things (as far as I can tell from this discussion) that need to be fixed are:

  1. Don't let stretch be applied to Alpha unless the stretch parameters have multiple values (min_stretch=[0, 0] for an LA dataset).
  2. Produce a warning when including factor/offset tags if they have multiple values and say that the tags can't be saved. Optionally switch to per-band factor/offset and also produce a warning, but this may be a bit too magical.