pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0
1.04k stars 287 forks source link

Creating composites using static background require double resampling when `generate=False` is used #2733

Open pnuu opened 5 months ago

pnuu commented 5 months ago

Describe the bug I'm trying to create the geo_color composite using FCI data and the built-in recipe. For some reason the Scene needs to be resampled twice for it to be saveable.

EDIT: The key to the problem is using generate=False when composite(s) using data with different projections are loaded. Such as static background image with any satellite data.

To Reproduce

import glob

import hdf5plugin
from satpy import Scene

FNAMES = glob.glob("/home/lahtinep/data/satellite/geo/fci/*C_0088*nc")

def main():
    """Run."""
    comps = ["geo_color"]
    glbl = Scene(reader='fci_l1c_nc', filenames=FNAMES)
    glbl.load(comps, generate=False)
    lcl = glbl.resample("euro4", resampler="gradient_search", reduce_data=False)
    # lcl2 = lcl.resample("euro4", resampler="gradient_search", reduce_data=False)
    # lcl2.save_datasets(
    lcl.save_datasets(
        filename='/tmp/{start_time:%Y%m%d_%H%M}_MTG-I1_euro4_{name}.tif',
        tiled=True, blockxsize=512, blockysize=512,
        driver='COG', overviews=[])

if __name__ == "__main__":
    main()

Expected behavior The image should be saved after a single resampling.

Actual results

RuntimeError: None of the requested datasets have been generated or could not be loaded. Requested composite inputs may need to have matching dimensions (eg. through resampling).

Environment Info:

Additional context If the lcl2 lines are taken to use, and thus the scene is resampled twice, the product is generated correctly.

Same problem is seen if I add night_ir_with_background composite (with sub-composites) to fci.yaml.

pnuu commented 5 months ago

It seems that setting generate=True makes the composite to work. This also means a significant performance hit when producing multiple composites sharing channels (instead of single resampling per channel, all RGBs are resampled separately).

pnuu commented 5 months ago

This doesn't seem to help: lcl = glbl.resample("euro4", resampler="gradient_search", reduce_data=False, generate=True)

pnuu commented 5 months ago

Neither does lcl.generate_possible_composites(True) before saving, which outputs The following datasets were not created and may require resampling to be generated: DataID(name='geo_color').

djhoese commented 5 months ago

Just for record keeping, can you print out glbl.keys() when generate=False? Then can you print out .attrs["area"] for everything in lcl.values() and compare? Or maybe do set([x.attrs["area"] for x in lcl.values()]) and print that out.

pnuu commented 5 months ago

For the glbl Scene:

ipdb> for k in glbl.keys(): print(k)
DataID(name='ir_105', wavelength=WavelengthRange(min=9.8, central=10.5, max=11.2, unit='µm'), resolution=1000, calibration=<2>, modifiers=())
DataID(name='ir_38', wavelength=WavelengthRange(min=3.4, central=3.8, max=4.2, unit='µm'), resolution=1000, calibration=<2>, modifiers=())
DataID(name='vis_04', wavelength=WavelengthRange(min=0.384, central=0.444, max=0.504, unit='µm'), resolution=1000, calibration=<1>, modifiers=())
DataID(name='vis_05', wavelength=WavelengthRange(min=0.47, central=0.51, max=0.55, unit='µm'), resolution=1000, calibration=<1>, modifiers=())
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<1>, modifiers=())
DataID(name='vis_08', wavelength=WavelengthRange(min=0.815, central=0.865, max=0.915, unit='µm'), resolution=1000, calibration=<1>, modifiers=())

ipdb> for v in set([x.attrs["area"] for x in glbl.values()]): print(v); print("")
Area ID: mtg_fci_fdss_1km
Description: MTG FCI Full Disk Scanning Service area definition with 1 km resolution
Projection: {'ellps': 'WGS84', 'h': '35786400', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 11136
Number of rows: 11136
Area extent: (-5567999.9986, 5567999.9986, 5567999.9986, -5567999.9986)

Area ID: mtg_fci_fdss_500m
Description: MTG FCI Full Disk Scanning Service area definition with 500 m resolution
Projection: {'ellps': 'WGS84', 'h': '35786400', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 22272
Number of rows: 22272
Area extent: (-5567999.9996, 5567999.9996, 5567999.9996, -5567999.9996)

and for lcl Scene:

ipdb> for k in lcl.keys(): print(k)
DataID(name='_geo_color_low_clouds_dep_0', resolution=1000)
DataID(name='_geo_color_low_clouds_dep_2')
DataID(name='_night_background_hires')
DataID(name='geo_color_high_clouds', resolution=1000)
DataID(name='ir_105', wavelength=WavelengthRange(min=9.8, central=10.5, max=11.2, unit='µm'), resolution=1000, calibration=<2>, modifiers=())
DataID(name='true_color', resolution=500)
ipdb> for v in set([x.attrs["area"] for x in lcl.values()]): print(v); print("")

Area ID: None
Description: None
Projection ID: WGS 84
Projection: {'datum': 'WGS84', 'no_defs': 'None', 'proj': 'longlat', 'type': 'crs'}
Number of columns: 13500
Number of rows: 6750
Area extent: (-180.0, -90.0, 180.0, 90.0)

Area ID: euro4
Description: Euro 4km area - Europe
Projection: {'ellps': 'bessel', 'lat_0': '90', 'lat_ts': '60', 'lon_0': '14', 'no_defs': 'None', 'proj': 'stere', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 1024
Number of rows: 1024
Area extent: (-2717181.7305, -5571048.1403, 1378818.2695, -1475048.1403)

So the static background isn't loaded with generate=False.

For generate=True it is (via _night_background_hires sub-composite):

ipdb> for k in glbl.keys(): print(k)
DataID(name='_geo_color_low_clouds_dep_0', resolution=1000)
DataID(name='_geo_color_low_clouds_dep_2')
DataID(name='_night_background_hires')
DataID(name='geo_color_high_clouds', resolution=1000)
DataID(name='ir_105', wavelength=WavelengthRange(min=9.8, central=10.5, max=11.2, unit='µm'), resolution=1000, calibration=<2>, modifiers=())
DataID(name='vis_04', wavelength=WavelengthRange(min=0.384, central=0.444, max=0.504, unit='µm'), resolution=1000, calibration=<1>, modifiers=('sunz_corrected',))
DataID(name='vis_05', wavelength=WavelengthRange(min=0.47, central=0.51, max=0.55, unit='µm'), resolution=1000, calibration=<1>, modifiers=('sunz_corrected',))
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<1>, modifiers=('sunz_corrected',))
DataID(name='vis_06', wavelength=WavelengthRange(min=0.59, central=0.64, max=0.69, unit='µm'), resolution=500, calibration=<1>, modifiers=('sunz_corrected', 'rayleigh_corrected'))
DataID(name='vis_08', wavelength=WavelengthRange(min=0.815, central=0.865, max=0.915, unit='µm'), resolution=1000, calibration=<1>, modifiers=('sunz_corrected',))

ipdb> for v in set([x.attrs["area"] for x in glbl.values()]): print(v); print("")
Area ID: mtg_fci_fdss_1km
Description: MTG FCI Full Disk Scanning Service area definition with 1 km resolution
Projection: {'ellps': 'WGS84', 'h': '35786400', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 11136
Number of rows: 11136
Area extent: (-5567999.9986, 5567999.9986, 5567999.9986, -5567999.9986)

Area ID: None
Description: None
Projection ID: WGS 84
Projection: {'datum': 'WGS84', 'no_defs': 'None', 'proj': 'longlat', 'type': 'crs'}
Number of columns: 13500
Number of rows: 6750
Area extent: (-180.0, -90.0, 180.0, 90.0)

Area ID: mtg_fci_fdss_500m
Description: MTG FCI Full Disk Scanning Service area definition with 500 m resolution
Projection: {'ellps': 'WGS84', 'h': '35786400', 'lon_0': '0', 'no_defs': 'None', 'proj': 'geos', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 22272
Number of rows: 22272
Area extent: (-5567999.9996, 5567999.9996, 5567999.9996, -5567999.9996)
pnuu commented 5 months ago

I adjusted the title and added the short summary what triggers the behaviour to the description.

pnuu commented 5 months ago

I tested by pre-resampling the background image to FCI projection and euro4, neither of which helped and a second resampling step is still required when generate=False is used.

pnuu commented 5 months ago

Above I forgot to also resample the land-sea mask. If both background image and LSM are resampled to the final image projection, the composite generation works with a single resample() call even with scn.load(..., generate=False). I can use this as a background, but it'd be great to find out what causes this behaviour.

yukaribbba commented 5 months ago

Do you still have this problem with my #2696 ? I don't know if this is a compositor issue or something else...

pnuu commented 5 months ago

@yukaribbba yes, with #2696 I still need to do douple resampling when using generate=False.

I think the problem is somewhere in the decision tree in a case where the IncompatibleAreas is raised by a sub-composite.

djhoese commented 5 months ago

Good discovery @pnuu!

Ok, this is me debugging and brainstorming:

  1. StaticCompositor nodes shouldn't have any dependencies so they are treated like this:

    https://github.com/pytroll/satpy/blob/33f354f9d6d8c9e62ca8f79d4cb60809d20b6235/satpy/dependency_tree.py#L563-L567

    The idea is that even though the static compositor doesn't have any dependencies we need to add a fake empty node as a dependency. Otherwise this compositor will get treated like a reader-loaded dataset when we start loading things by doing dependency_tree.leaves() and dependency_tree.trunk(). That is, composites and modified datasets are trunk nodes, reader datasets are leaves.

  2. Trunk nodes aren't accessed anywhere in the Scene for loading except for here (ignore "available_composite_ids", etc):

    https://github.com/pytroll/satpy/blob/f11db543b2d42e634ddc6a8c935d17094ea1bc4c/satpy/scene.py#L1532-L1537

    But this isn't called when generate=False.

  3. By saying generate=False, you're basically saying "don't look at composites at all" as an optimization essentially.

So the questions/choices are:

pnuu commented 4 months ago
* Other solutions?

I think the easiest route, at least for my own worklfow, is to pre-resample the background images (black marble and land-sea mask) to my target projection and adjust the composite configs in $SATPY_CONFIG_PATH/composites/<instrument>.yaml to point url to the resampled versions. This means I'll need a separate variant for each target area, but for me that's ok since outside of testing I pretty much only ever use one or two areas.

simonrp84 commented 4 months ago

FWIW, I have encountered this problem just now with one of my own composites...BUT, the static image background is on the same projection / area as the rest of the data.

pnuu commented 4 months ago

@simonrp84 so scn[chan].attrs['area'] == scn2['image'].attrs['area'] is True?