opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0

Ingestor migration from 1.5.5 to 1.6rc1 causes product incompatibility warnings/errors #423

Closed Kirill888 closed 6 years ago

Kirill888 commented 6 years ago

Expected behaviour

It should be possible to update a previously ingested product with the new datacube version.

Actual behaviour

Reported by @loicdtx on Slack: the ingestor aborts with this error:

ValueError: Ingest config differs from the existing output product, but allow_product_changes=False

Steps to reproduce the behaviour

Config used for ingestion is available here

https://github.com/CONABIO/antares3/blob/3edfecb46ea425ce67f518d3ab13cf08ca36718d/madmex/conf/ingestion/s2_l2a_20m_mexico.yaml

EDIT: link above

Environment information

Kirill888 commented 6 years ago

Rolling back to 1.5.5 allows ingestion to proceed, so it's definitely not a config file change problem.

loicdtx commented 6 years ago

Here's the output of datacube product show s2_l2a_20m_granule

{
    "metadata_type": "eo",
    "name": "s2_l2a_20m_granule",
    "description": "Sentinel 2 data processed with sen2cor",
    "metadata": {
        "instrument": {
            "name": "MSI"
        },
        "platform": {
            "code": "sentinel2"
        },
        "product_type": "sen2cor",
        "format": {
            "name": "JPEG2000"
        }
    },
    "measurements": [
        {
            "nodata": 0,
            "name": "blue",
            "dtype": "uint16",
            "aliases": [
                "band_2",
                "blue"
            ],
            "units": "reflectance"
        },
        {
            "nodata": 0,
            "name": "green",
            "dtype": "uint16",
            "aliases": [
                "band_3",
                "green"
            ],
            "units": "reflectance"
        },
        {
            "nodata": 0,
            "name": "red",
            "dtype": "uint16",
            "aliases": [
                "band_4",
                "red"
            ],
            "units": "reflectance"
        },
        {
            "nodata": 0,
            "name": "re1",
            "dtype": "uint16",
            "aliases": [
                "band_5",
                "red_edge_1",
                "re1"
            ],
            "units": "reflectance"
        },
        {
            "nodata": 0,
            "name": "re2",
            "dtype": "uint16",
            "aliases": [
                "band_6",
                "red_edge_2",
                "re2"
            ],
            "units": "reflectance"
        },
        {
            "nodata": 0,
            "name": "re3",
            "dtype": "uint16",
            "aliases": [
                "band_7",
                "red_edge_3",
                "re3"
            ],
            "units": "reflectance"
        },
        {
            "nodata": 0,
            "name": "nir",
            "dtype": "uint16",
            "aliases": [
                "band_8A",
                "nir"
            ],
            "units": "reflectance"
        },
        {
            "nodata": 0,
            "name": "swir1",
            "dtype": "uint16",
            "aliases": [
                "band_11",
                "swir1"
            ],
            "units": "reflectance"
        },
        {
            "nodata": 0,
            "name": "swir2",
            "dtype": "uint16",
            "aliases": [
                "band_12",
                "swir2"
            ],
            "units": "reflectance"
        },
        {
            "name": "pixel_qa",
            "dtype": "uint16",
            "flags_definition": {
                "sca": {
                    "bits": [
                        0,
                        1,
                        2,
                        3,
                        4,
                        5,
                        6,
                        7,
                        8,
                        9,
                        10,
                        11,
                        12,
                        13,
                        14,
                        15
                    ],
                    "description": "Sen2Cor Scene Classification",
                    "values": {
                        "4": "Vegetation",
                        "5": "Not-vegetated",
                        "9": "Cloud high probability",
                        "0": "No Data",
                        "3": "Cloud shadows",
                        "11": "Snow or ice",
                        "10": "Thin cirrus",
                        "8": "Cloud medium probability",
                        "6": "Water",
                        "7": "Unclassified",
                        "2": "Dark features / Shadows",
                        "1": "Saturated or defective pixel"
                    }
                }
            },
            "nodata": 0,
            "aliases": [
                "slc",
                "qa"
            ],
            "units": "1"
        }
    ]
}

datacube --version
Open Data Cube core, version 1.5.5+5.gdd0aaa3

loicdtx commented 6 years ago

And here's the output of datacube product show s2_l2a_20m_mexico

{
    "storage": {
        "crs": "PROJCS[\"unnamed\",GEOGCS[\"WGS 84\",DATUM[\"unknown\",SPHEROID[\"WGS84\",6378137,6556752.3141]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433]],PROJECTION[\"Lambert_Conformal_Conic_2SP\"],PARAMETER[\"standard_parallel_1\",17.5],PARAMETER[\"standard_parallel_2\",29.5],PARAMETER[\"latitude_of_origin\",12],PARAMETER[\"central_meridian\",-102],PARAMETER[\"false_easting\",2500000],PARAMETER[\"false_northing\",0]]",
        "origin": {
            "y": 2426720,
            "x": 977160
        },
        "resolution": {
            "y": -20,
            "x": 20
        },
        "tile_size": {
            "y": 100020,
            "x": 100020
        }
    },
    "name": "s2_l2a_20m_mexico",
    "managed": true,
    "metadata": {
        "product_type": "sen2cor",
        "instrument": {
            "name": "MSI"
        },
        "platform": {
            "code": "sentinel2"
        },
        "format": {
            "name": "NetCDF"
        }
    },
    "measurements": [
        {
            "name": "blue",
            "aliases": [
                "band_2",
                "blue"
            ],
            "nodata": 0,
            "dtype": "uint16",
            "units": "reflectance"
        },
        {
            "name": "green",
            "aliases": [
                "band_3",
                "green"
            ],
            "nodata": 0,
            "dtype": "uint16",
            "units": "reflectance"
        },
        {
            "name": "red",
            "aliases": [
                "band_4",
                "red"
            ],
            "nodata": 0,
            "dtype": "uint16",
            "units": "reflectance"
        },
        {
            "name": "re1",
            "aliases": [
                "band_5",
                "red_edge_1",
                "re1"
            ],
            "nodata": 0,
            "dtype": "uint16",
            "units": "reflectance"
        },
        {
            "name": "re2",
            "aliases": [
                "band_6",
                "red_edge_2",
                "re2"
            ],
            "nodata": 0,
            "dtype": "uint16",
            "units": "reflectance"
        },
        {
            "name": "re3",
            "aliases": [
                "band_7",
                "red_edge_3",
                "re3"
            ],
            "nodata": 0,
            "dtype": "uint16",
            "units": "reflectance"
        },
        {
            "name": "nir",
            "aliases": [
                "band_8A",
                "nir"
            ],
            "nodata": 0,
            "dtype": "uint16",
            "units": "reflectance"
        },
        {
            "name": "swir1",
            "aliases": [
                "band_11",
                "swir1"
            ],
            "nodata": 0,
            "dtype": "uint16",
            "units": "reflectance"
        },
        {
            "name": "swir2",
            "aliases": [
                "band_12",
                "swir2"
            ],
            "nodata": 0,
            "dtype": "uint16",
            "units": "reflectance"
        },
        {
            "name": "pixel_qa",
            "flags_definition": {
                "sca": {
                    "bits": [
                        0,
                        1,
                        2,
                        3,
                        4,
                        5,
                        6,
                        7,
                        8,
                        9,
                        10,
                        11,
                        12,
                        13,
                        14,
                        15
                    ],
                    "values": {
                        "9": "Cloud high probability",
                        "5": "Not-vegetated",
                        "6": "Water",
                        "2": "Dark features / Shadows",
                        "3": "Cloud shadows",
                        "1": "Saturated or defective pixel",
                        "10": "Thin cirrus",
                        "4": "Vegetation",
                        "8": "Cloud medium probability",
                        "0": "No Data",
                        "7": "Unclassified",
                        "11": "Snow or ice"
                    },
                    "description": "Sen2Cor Scene Classification"
                }
            },
            "dtype": "uint16",
            "units": "1",
            "aliases": [
                "slc",
                "qa"
            ],
            "nodata": 0
        }
    ],
    "metadata_type": "eo",
    "description": "Sentinel 2 bottom of atmosphere processed with sen2cor. Resampled to 20m Mexico INEGI Lambert Conformal Conic projection with a 100 km tile size."
}

Kirill888 commented 6 years ago

My guess so far is that format.name is to blame in this case

https://github.com/opendatacube/datacube-core/blob/b987f362ed693408fd746fe8dcf5b0995a44c8fc/datacube/scripts/ingest.py#L54

In the past we copied the format name as is; now it comes from the "normalised" driver name. So my guess is that the difference is "NetCDF" != "NetCDF CF".

Ingest should report more about the differences; it has access to that info.
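
As a rough illustration of the kind of report meant here, the following is a minimal sketch in plain Python, not datacube's own code. It walks two nested documents and lists every leaf that differs, using the guessed "NetCDF" vs "NetCDF CF" mismatch as input. The names `diff_documents` and `MISSING` are hypothetical and exist only for this example.

```python
MISSING = object()  # hypothetical sentinel meaning "key absent on this side"


def diff_documents(old, new, path=()):
    """Yield (path, old_value, new_value) for every leaf that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Visit the union of keys so additions and removals are both reported.
        for key in sorted(set(old) | set(new)):
            yield from diff_documents(old.get(key, MISSING),
                                      new.get(key, MISSING),
                                      path + (key,))
    elif old != new:
        yield path, old, new


# Assumed inputs: only the fragment of each product relevant to the guess.
existing = {"metadata": {"format": {"name": "NetCDF"}}}
from_config = {"metadata": {"format": {"name": "NetCDF CF"}}}

changes = list(diff_documents(existing, from_config))
for where, before, after in changes:
    print(".".join(where), repr(before), "->", repr(after))
```

Printing the path alongside both values would tell the user which field blocks the update without having to compare two `datacube product show` dumps by hand.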

uchchwhash commented 6 years ago

The normalised driver name issue has come up before, see #411 for example. Perhaps include this in the 'What's New' section of the release notes?

edit: apologies, #411 seems to be a separate issue altogether.

omad commented 6 years ago

The change that has broken things here is that storage.driver is now being stored in the Product in the DB. It was made in commit 63d721be6972c44c2cbfa9eb3804fe1c6e6b491f when DriverManager was still a thing.

I think it was an oversight not to revert this, and the best option is to not store the storage.driver name in the database product. As far as I can tell, that would be more in line with the fields that are kept when morphing from an ingestion configuration to a product definition.

@Kirill888 Would this have any impact on how drivers are selected?

@jeremyh @andrewdhicks Does this sound okay to you two?
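
To make the proposal concrete, here is a minimal sketch, assuming the product definition is handled as a plain nested dict. It is not the actual ingest code; `without_storage_driver` is a hypothetical helper, and the two example documents are trimmed to the fields that matter.

```python
import copy


def without_storage_driver(product_definition):
    """Return a copy of the definition with storage.driver removed, if present."""
    cleaned = copy.deepcopy(product_definition)  # leave the caller's dict untouched
    cleaned.get("storage", {}).pop("driver", None)
    return cleaned


# Assumed shape of a definition derived from the ingestion config under 1.6rc1.
ingest_config_product = {
    "name": "s2_l2a_20m_mexico",
    "storage": {"driver": "NetCDF CF", "resolution": {"y": -20, "x": 20}},
}
# Assumed shape of what 1.5.5 stored in the database (no storage.driver).
stored_product = {
    "name": "s2_l2a_20m_mexico",
    "storage": {"resolution": {"y": -20, "x": 20}},
}

assert without_storage_driver(ingest_config_product) == stored_product
```

With the driver name dropped before the comparison, the two definitions match and no product change is needed.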

omad commented 6 years ago

@loicdtx If you run the ingester in verbose mode, it will log which changes, if any, need to be made to the target product. I've run a test with your configuration files between 1.5.5 and 1.6rc1 and get the following output:

$ datacube -v ingest -c s2_l2a_20m_mexico.yaml
2018-05-01 10:30:28,396 68994 datacube INFO Running datacube command: /Users/omad/miniconda3/envs/py36/bin/datacube -v ingest -c s2_l2a_20m_mexico.yaml
2018-05-01 10:30:28,563 68994 datacube-ingest INFO Created DatasetType s2_l2a_20m_mexico
2018-05-01 10:30:28,575 68994 datacube.index._products INFO Unsafe change in storage.driver from missing to 'NetCDF CF'
2018-05-01 10:30:28,575 68994 datacube-ingest INFO Cannot update "s2_l2a_20m_mexico": 1 unsafe changes, 0 safe changes
2018-05-01 10:30:28,575 68994 datacube-ingest INFO Safe changes: []
2018-05-01 10:30:28,575 68994 datacube-ingest INFO Unsafe changes: [(('storage', 'driver'), missing, 'NetCDF CF')]

This shows how we're now attempting to store storage.driver whereas before we weren't. I think this is a bug and we'll fix it in the next release.

In the meantime, after reviewing the log output, it's possible to run:

datacube -v ingest --allow-product-changes -c s2_l2a_20m_mexico.yaml

This will update the product in the database. After we fix the bug, you would need to run it again with the same option to convert back to the old definition.

Kirill888 commented 6 years ago

@omad only format and protocol are used to select the driver; nothing under storage is consulted. Not sure what the expectations in the s3aio driver are.
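
A toy sketch of that selection rule, for illustration only: the registry, its keys, the reader labels and `choose_reader` are all hypothetical and do not reflect datacube's driver API. The point is just that the lookup key is the (protocol, format) pair, so a storage.driver entry on the product cannot influence the result.

```python
# Hypothetical registry: (protocol, format name) -> reader label.
READERS = {
    ("file", "NetCDF"): "netcdf-reader",
    ("file", "GeoTIFF"): "rasterio-reader",
}


def choose_reader(protocol, fmt, registry=READERS):
    """Pick a reader from protocol and format alone."""
    try:
        return registry[(protocol, fmt)]
    except KeyError:
        raise LookupError(
            f"no reader for protocol={protocol!r}, format={fmt!r}"
        ) from None


# The product's storage section is never passed in, so it has no effect.
product = {"storage": {"driver": "NetCDF CF"}}
reader = choose_reader("file", "NetCDF")
print(reader)
```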