opendatacube / datacube-ows

Open Data Cube Open Web Services
Other
71 stars 37 forks source link

Layers appear out of sync in WMS and WCS if band math is used in styling #1041

Open christophfriedrich opened 4 months ago

christophfriedrich commented 4 months ago

Description

The standard way to serve calculated products (e.g. NDVI) via WMS seems to be to load the raw bands and do the band math accordingly, e.g. like in this example from the documentation:

ndvi_bidirection_cfg = {
    "index_function": {
        "function": "datacube_ows.band_utils.norm_diff",
        "mapped_bands": True,
        "kwargs": {"band1": "nir", "band2": "red"},
    },
    "mpl_ramp": "RdYlGn",
    "range": [-1.0, 1.0]
}

Seeing this all over the docs, I set up a system that uses somewhat complex index functions to calculate various products (see e.g. https://github.com/eo2cube/ows_config/blob/4fe21710cc278e9c2ee4b46a742aaaa9f343899d/s2_vi/formulas.py#L16).

Now we need the raw values of those calculations elsewhere and thought "Perfect, we've got the WCS, we can just export them from there!" But as the WCS serves the "raw data" and the calculations are technically happening in the styling part, the only thing I got were the very raw imagery values, not the calculated product values. (To stay with the example: I got the Red and NIR values, not the NDVI calculated from them.)

I browsed through the existing issues and this has been touched upon before at https://github.com/opendatacube/datacube-ows/issues/510#issuecomment-772058579 with Paul stating:

We feel that the purpose of WCS is to provide access to the raw data and that anybody interested in WCS would be more than capable of doing trivial bandmath on the data themselves after downloading.

I agree that I'm perfectly capable of doing the same bandmath again elsewhere, but (1) it's not me who wants to use those values and (2) I've got everything so nicely configured in OWS already that it would be super sweet to just download the stuff from there straight away. I also agree that WCS should provide the raw data, but the question is how raw should it be?

I would also like to raise the point of consistency of the same product across the various services. Which leads me to formulate the expected behaviour: WMS and WCS export "the same thing". As in: Taking the values from the WCS and applying the same colour ramp as in the styling config results in the same image that the WMS exports. Or worded vice versa: 'Un-applying' the colour ramp from the WMS image yields the WCS values.

Or is there another way to get values out of the WCS that have the index function already applied?

Steps to Reproduce

Compare the result of this WMS request:

https://ows.eo2cube.org/wms?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&LAYERS=s2_vi_ndvi_diff_winter_wheat&STYLES=&WIDTH=512&HEIGHT=512&FORMAT=image%2Fpng&CRS=EPSG%3A3857&DPI=96&MAP_RESOLUTION=96&FORMAT_OPTIONS=dpi%3A96&TRANSPARENT=TRUE&TIME=2020-01-02T00%3A00%3A00Z&BBOX=1389247.83251150674186647,7142683.10669049434363842,1392738.67198907374404371,7146173.94616806134581566

with what you get from this corresponding WCS request:

https://ows.eo2cube.org/wcs?SERVICE=WCS&VERSION=1.0.0&REQUEST=GetCoverage&FORMAT=GeoTIFF&COVERAGE=s2_vi_ndvi_diff_winter_wheat&WIDTH=512&HEIGHT=512&CRS=EPSG:3857&RESPONSE_CRS=EPSG:3857&TIME=2020-01-02T00%3A00%3A00Z&BBOX=1389247.83251150674186647,7142683.10669049434363842,1392738.67198907374404371,7146173.94616806134581566

(same TIME and BBOX and LAYERS/COVERAGE)

Context (Environment)

datacube-ows version

datacube-ows-update --version Open Data Cube Open Web Services (datacube-ows) version 1.8.36.dev0+g67003f3.d20240617

datacube-ows config

https://github.com/eo2cube/ows_config/

datacube product metadata

datacube product show s2_c1_l2a

Result ```yaml description: The Sentinel-2 program provides global imagery in thirteen spectral bands at 10m-60m resolution and a revisit time of approximately five days. This dataset represents the global Sentinel-2 archive, from 2016 to the present, processed to L2A (bottom-of-atmosphere) using Sen2Cor. measurements: - aliases: - B01 dtype: uint16 name: coastal nodata: 0 units: '1' - aliases: - B02 dtype: uint16 name: blue nodata: 0 units: '1' - aliases: - B03 dtype: uint16 name: green nodata: 0 units: '1' - aliases: - B04 dtype: uint16 name: red nodata: 0 units: '1' - aliases: - B05 dtype: uint16 name: rededge1 nodata: 0 units: '1' - aliases: - B06 dtype: uint16 name: rededge2 nodata: 0 units: '1' - aliases: - B07 dtype: uint16 name: rededge3 nodata: 0 units: '1' - aliases: - B08 dtype: uint16 name: nir nodata: 0 units: '1' - aliases: - B8A dtype: uint16 name: nir08 nodata: 0 units: '1' - aliases: - B09 dtype: uint16 name: nir09 nodata: 0 units: '1' - aliases: - B11 dtype: uint16 name: swir16 nodata: 0 units: '1' - aliases: - B12 dtype: uint16 name: swir22 nodata: 0 units: '1' - aliases: - SCL dtype: uint8 flags_definition: sca: bits: - 0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 description: Sen2Cor Scene Classification values: '1': saturated or defective '10': thin cirrus '11': snow or ice '2': dark area pixels '3': cloud shadows '4': vegetation '5': bare soils '6': water '7': unclassified '8': cloud medium probability '9': cloud high probability name: scl nodata: 0 units: '1' - aliases: - aerosol_optical_thickness - AOT dtype: uint16 name: aot nodata: 0 units: '1' - aliases: - scene_average_water_vapour - WVP dtype: uint16 name: wvp nodata: 0 units: '1' metadata: product: name: s2_c1_l2a metadata_type: eo3 name: s2_c1_l2a ```
SpacemanPaul commented 4 months ago

Thanks Christoph. As it happens we've had some very similar conversations around WMS/WCS interoperation internally at DEA recently.

The problem is that WMS and WCS are very different protocols with very different data models and intended use cases. OWS does blur those distinctions somewhat in an effort to promote consistency between them, but there's only so far we can go while still being standards compliant for both.

The main issues complicating consistency are:

1) WMS has the concept of "styles" within a layer and leaves implementations fairly unconstrained in how they interpret those concepts. The original intent appears to be supplying different ways of rendering underlying vector data into map tiles, and the standard allows for selecting multiple styles simultaneously, which doesn't make much sense for OWS's pure raster approach. 2) WCS has no concept of "styles", but rather thinks in terms of ranges (i.e. bands/measurements). 3) The WCS spec is much more proscriptive across the board, and doesn't really have any conceptual areas where the implementation is unconstrained in re-interpreting them for particular use cases, as is the case in WMS. 4) The fact that WCSv1 and WCSv2 are almost completely different protocols further complicates matters. 5) When working with multiple dates, WMS always returns a 2D map tile where WCS can return a 3D (x,y,t) raster file.

With that mind however, it should be possible to collect the index styles offered by a given layer, and advertise those indexes as additional "raw bands" on the corresponding coverage in WCS. That way your WCS query above (which doesn't specify measurements and therefore gets all available measurements) would get the raw bands AND the calculated style indexes.

But multi-date handler indexes in WCS (as previously requested by @robbibt ) are probably unachievable without breaking WCS's request model.

christophfriedrich commented 3 months ago

I haven't really used WCS before, so your considerations are probably a lot more profound, but they are also going a lot further and are a lot more fundamental than what I actually meant. I don't need multi-date stuff or more than one style per layer, I just want a 2D raster of the values that I see colour-coded in the WMS.

And that's my main point: Currently, the output of the two services is fundamentally different when an index function is involved. When I see an NDVI map in the WMS, and specify COVERAGE=s2_vi_ndvi in the WCS, I expect a raster with NDVI values between -1 and 1, not two rasters with raw Sentinel-2 values in the four-digit range plus the categorical layer used to mask the whole thing. Because semantically that's not what I requested.

The problems you mentioned are valid if one wanted to fully map every feature of each standard to the other. But I don't see how they would block fixing the problem I described?


it should be possible to collect the index styles offered by a given layer [... to get] the calculated style indexes.

Because I needed this now, I plunged forward and constructed a hacky tweak that works for our use case. For this I replaced the standard WCS TIFF handler in

    "wcs": {
        "formats": {
            "GeoTIFF": {
                "renderers": {
                    "1": "datacube_ows.wcs1_utils.get_tiff"

with a self-written function

                    "1": "ows_refactored.s2_vi.wcs.as_tiff_wcs1",

that does essentially this:

from datacube_ows.wcs1_utils import get_tiff

class FakeBandIndex:
    def band_label(self, band):
        return "index_function"
    def nodata_val(self, band):
        return 0

def as_tiff_wcs1(req, data):
    dataarray = req.product.styles[0].apply_index(data)
    dataset = dataarray.to_dataset()
    req.product.band_idx = FakeBandIndex()  # very hacky...
    return get_tiff(req, dataset)

See https://github.com/eo2cube/ows_config/commit/47d122fac54552789c5e92c7c7d73b6c26b12ea2 (the code there has comments)

So I just apply the index function, which of course fundamentally changes the data structure, createing several issues further down, which I just silence by injecting that FakeBandIndex that simply returns the hard-coded values I need in this case.

It works beautifully, you can try it by requesting the WCS link from my initial post.

But of course it's absolutely not generic...


The only current problem for me is that masking is not applied. I thought I could sneak that functionality too, but it turns out that the mask (1) is not built encapsulatedly, but in the middle of get_map, and (2) is not applied at data level, but directly in the visualisation: _write_png calls transform_data, which first does apply_index but then also immediately color_ramp.apply, before apply_mask_to_image is called, which already expects an RGB(A) array (into which it burns the mask by setting the alpha channel to transparent where needed).

Do you have an idea how this could be solved currently (without replicating a lot of code)?

And what do you think about this approach? Would a more generic version be of interest to be included into datacube-ows?

christophfriedrich commented 3 months ago

Okay, so it turns out that I don't even need all the logic inside get_map to construct an extent_mask, and that the only thing apply_mask_to_image actually does is assuming that there is a red band and adding an alpha one, so I did this:

fake_rgb = dataset.rename({"index_function": "red"})
masked = req.product.styles[0].apply_mask_to_image(fake_rgb, mask, 1, 1)
result = masked["red"].where(masked["alpha"]).to_dataset().rename({"red": "index_function"})

Until I realised that I don't need the logic in apply_mask_to_image either and can literally just do:

mask = req.product.styles[0].to_mask(data)
result = dataset.where(mask)

See https://github.com/eo2cube/ows_config/commit/bc4b795a93f6478a40ba37cc782bd1d2137ffdca


But it appears that the original get_tiff in wcs1_utils.py doesn't expect nans to be present, because as soon as I had some in there, the TIFF statistics were all nan, causing the automatic scaling to fail when one drops the resulting TIFF into QGIS. So in the with memfile.open(driver="GTiff", ... section, I changed these lines:

https://github.com/opendatacube/datacube-ows/blob/a1ab57c11ccafae4bb27a054b75e99f8a441b774/datacube_ows/wcs1_utils.py#L452-L456

to the nan-aware variants:

if cfg.wcs_tiff_statistics:
    dst.update_tags(idx, STATISTICS_MINIMUM=numpy.nanmin(data[band].values))
    dst.update_tags(idx, STATISTICS_MAXIMUM=numpy.nanmax(data[band].values))
    dst.update_tags(idx, STATISTICS_MEAN=numpy.nanmean(data[band].values))
    dst.update_tags(idx, STATISTICS_STDDEV=numpy.nanstd(data[band].values))

That's the only modification in datacube-ows code I needed so far.

SpacemanPaul commented 3 months ago

Nice - yes that's one (slightly hacky) way of doing it! I would like to handle this more generically at some point by treating the index functions of all defined colour ramp styles as pseudo-measurements within WCS.

The stats fix is a bug that could affect all floating-point bands though - if you want to do a quick PR just for that patch, I'll happily merge it.

christophfriedrich commented 3 months ago

I would still argue that the index function(s) are the only 'proper' measurement in this case, but I get your point and if we just add more layers (instead of replacing), everyone gets what they want :)

I would highly appreciate if you were to implement this more generically! With my understanding of the architecture I don't feel like I can do it at this point, but experimenting with a few more requests against my current solution revealed several cases that don't seem to work just yet, so it really is a hacky workaround that is somewhat okay for now but shouldn't stay around for too long, desiring to be replaced by something proper.

PR's there :) I applied the same fix in WCS2 too.

robbibt commented 3 months ago

Hi all, have been listening in to this in the background - just want to confirm that this below very much aligns with what I would think of as "expected functionality" too. Getting back the raw spectral band layers when I select "export" from a specific index function is pretty suprising, and I suspect pretty confusing to our users who just want to export the data that's actually being shown/styled on the map.

And that's my main point: Currently, the output of the two services is fundamentally different when an index function is involved. When I see an NDVI map in the WMS, and specify COVERAGE=s2_vi_ndvi in the WCS, I expect a raster with NDVI values between -1 and 1, not two rasters with raw Sentinel-2 values in the four-digit range plus the categorical layer used to mask the whole thing. Because semantically that's not what I requested.

(totally recognise the complexities to actually achieving this, but just emphasising that the current functionality isn't ideal from a usability perspective).