pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0

Post-computation result for resampled scene depends on generate=True or generate=False #2400

Open gerritholl opened 1 year ago

gerritholl commented 1 year ago

Describe the bug

With the CategoricalDataCompositor and the MaskingCompositor, the values for a resampled scene may be different depending on whether the composite is calculated before or after resampling. This might affect other compositors as well.

I'm not sure whether this qualifies as a bug, but I think it's an unexpected and undesired consequence of how Satpy works.

To Reproduce

import tempfile
import pathlib
from glob import glob
import os
config = """sensor/name: visir/seviri

composites:
  testmask:
    compositor: !!python/name:satpy.composites.CategoricalDataCompositor
    prerequisites:
      - ct
    standard_name: testmask
    lut: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
"""
with tempfile.TemporaryDirectory() as td:
    p = pathlib.Path(td)
    fn = p / "composites" / "seviri.yaml"
    fn.parent.mkdir(exist_ok=True, parents=True)
    with fn.open(mode="wt", encoding="ascii") as fp:
        fp.write(config)
    os.environ["SATPY_CONFIG_PATH"] = td

    from satpy import Scene
    # The SEVIRI level-1 files are listed for completeness; only the NWCSAF product is loaded below.
    files_seviri_l1 = glob("/media/nas/x21308/scratch/SEVIRI/20230221143000/H-000-MSG4__-MSG4________-*202302211430*")
    files_nwcsaf_ct = ["/media/nas/x21308/scratch/NWCSAF/20230221143000/S_NWC_CT_MSG4_MSG-N-VISIR_20230221T143000Z.nc"]
    sc = Scene(filenames={"nwcsaf-geo": files_nwcsaf_ct})
    # generate=True builds the composite before resampling;
    # change to generate=False to have it built after resampling instead.
    sc.load(["ct", "testmask"], generate=True)
    ls = sc.resample("maspalomas", radius_of_influence=25000, fill_value=255)
    # Print the composite value at a single pixel of the resampled scene.
    print(ls["testmask"][1099, 2099].compute().item())

Expected behavior

I expect identical results between generate=True and generate=False.

Actual results

With generate=True, the result is 255. With generate=False, the result is 0.

Environment Info:

Additional context

This happens because the values received by the compositor differ between the generate=True and generate=False cases. If generate=True, the compositor receives datasets from the unresampled scene, which does not contain fill values added by the resampler. If generate=False, or if the compositor relies on datasets with different, incompatible areas, it receives datasets from the resampled scene, including fill values added by the resampler. In the first case, the resampled composite still contains the fill values added by the resampler. In the second case, those fill values have been replaced by the CategoricalDataCompositor. A similar phenomenon happens with the MaskingCompositor.
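The order dependence can be illustrated without Satpy. The following is a minimal sketch in plain Python, not Satpy code: apply_lut and resample are hypothetical stand-ins for a categorical compositor and a nearest-neighbour resampler, and the lookup table is a simplified assumption in which 255 maps to 0.

```python
FILL = 255

# Toy 256-entry lookup table (an assumption for illustration):
# categories 5-9 map to 1, everything else, including 255, maps to 0.
LUT = [1 if 5 <= i <= 9 else 0 for i in range(256)]


def apply_lut(data, lut):
    """Stand-in for a categorical compositor: replace each value v by lut[v]."""
    return [lut[v] for v in data]


def resample(data, index_map, fill_value):
    """Stand-in for a nearest-neighbour resampler.

    index_map[i] is the source index for target pixel i, or None if no
    source pixel is available for that target pixel.
    """
    return [fill_value if j is None else data[j] for j in index_map]


ct = [1, 6, 7, 2]            # source categories
index_map = [0, 1, None, 3]  # the third target pixel has no source pixel

# Analogue of generate=True: composite first, then resample.
before = resample(apply_lut(ct, LUT), index_map, FILL)
# Analogue of generate=False: resample first, then composite.
after = apply_lut(resample(ct, index_map, FILL), LUT)

print(before)  # [0, 1, 255, 0]
print(after)   # [0, 1, 0, 0]
```

The two results differ only at the pixel the resampler had to fill: in the first order the fill value survives, in the second the lookup table replaces it.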

I think it should affect other compositors as well, but it may be less clear there because most compositors do not generate entirely new values.

I have no idea how this could be resolved, in particular not without breaking stuff or significant redesign of Satpy.

djhoese commented 1 year ago

I'll mention it here for the record, but I think any workflow that depends on fill_value being passed to Scene.resample is either a workaround for a larger issue (ex. _FillValue missing from an integer product) or the user knows exactly what they want for some very specific reason (ex. "I want all fill values to be 'grey' in the final image").

gerritholl commented 1 year ago

Maybe CategoricalDataCompositor and MaskingCompositor need to behave differently with regard to fill values. I'm not sure how that could be done without breaking compatibility, though. In the case I described here, fill_value=255 is actually redundant, because the _FillValue in the input data is also 255. The CategoricalDataCompositor then replaces the fill value with other values, but only if it is called after resampling.

How can we let users of the MaskingCompositor define whether fill values should be masked or not, without requiring that the MaskingCompositor be called after resampling (thus generate=False)? I'm trying to get something binary, 1/0, but the fill value is 255.
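One possible user-side workaround, sketched here in plain Python rather than with Satpy, is to map any remaining fill values explicitly after resampling, so that both orders give the same binary result. All names below (apply_lut, resample, map_fill) are hypothetical helpers, the lookup table is a simplified assumption, and the approach only works when the fill value cannot occur as a legitimate output value.

```python
FILL = 255

# Toy lookup table (an assumption): categories 5-9 map to 1, all others to 0.
LUT = [1 if 5 <= i <= 9 else 0 for i in range(256)]


def apply_lut(data, lut):
    """Stand-in for a categorical compositor: replace each value v by lut[v]."""
    return [lut[v] for v in data]


def resample(data, index_map, fill_value):
    """Stand-in for a resampler; None in index_map means 'no source pixel'."""
    return [fill_value if j is None else data[j] for j in index_map]


def map_fill(data, fill_value, replacement):
    """Replace any remaining fill values with an explicit replacement value."""
    return [replacement if v == fill_value else v for v in data]


ct = [1, 6, 7, 2]
index_map = [0, 1, None, 3]

# Composite before resampling, then map leftover fill values.
before = map_fill(resample(apply_lut(ct, LUT), index_map, FILL), FILL, LUT[FILL])
# Composite after resampling; map_fill is then a no-op for this data.
after = map_fill(apply_lut(resample(ct, index_map, FILL), LUT), FILL, LUT[FILL])

print(before == after)  # True
```

This does not answer the design question for Satpy itself; it only shows that an explicit post-resampling step makes the toy result independent of the order.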