opendatacube / eo-datasets

Easily write, validate and convert EO datasets and metadata.
Apache License 2.0

What even IS eo3? Fork proposal #301

Open SpacemanPaul opened 1 year ago

SpacemanPaul commented 1 year ago

An EO3 document is a document that:

a) conforms with the (undocumented) metadata conventions established by eo-datasets; and
b) conforms to datacube-core's (undocumented) assumptions about the structure of eo3 dataset docs.

These are not always in agreement (e.g. datacube-core stores lineage internally in a different format to that output by eo-datasets).

I propose splitting eo-datasets into two repositories:

  1. A new opendatacube/eo3 repository which defines, documents, validates, serialises and deserialises the attributes and properties of an EO3 document that are assumed internally by core and therefore need formal and strict definition;
  2. Leaving eo-datasets to define, document, and validate the metadata catalog for various collections and packaging conventions, and handle normalising and writing out according to various packaging conventions. These collections and packaging conventions can vary and diverge as required.

This split will:

a. Allow better sharing of code between (what is now) eo-datasets and core, e.g. as requested in #294.
b. Facilitate future extensions and updates to what core uses, e.g. CSIRO are looking into contributing ODC support for loading into multidimensional xarrays (for hyperspectral or climate modelling use cases, for example).

woodcockr commented 1 year ago

Whilst we are doing this suggest we look at some aspects of consistency with STAC

SpacemanPaul commented 1 year ago

> Whilst we are doing this suggest we look at some aspects of consistency with STAC

> Leaving eo-datasets to define, document, and validate the metadata catalog for various collections and packaging conventions, and handle normalising and writing out according to various packaging conventions.

Kirill888 commented 1 year ago

https://odc-stac.readthedocs.io/en/latest/stac-vs-odc.html

About stac

SpacemanPaul commented 1 year ago

There are also some metadata differences, which I believe Rob encountered recently: e.g. STAC allows a list of instruments, whereas ODC flattens this list into a single comma-separated instrument value.
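As a minimal sketch of that difference (not odc-stac's or core's actual converter): STAC's `instruments` field is a list of strings, and the flattening amounts to a join. The function name `flatten_instruments` is hypothetical, the `eo:instrument` target key is an assumption, and all other properties are passed through untouched here.

```python
def flatten_instruments(stac_properties: dict) -> dict:
    """Return a copy of the properties with the instrument list flattened."""
    props = dict(stac_properties)  # shallow copy, so the input is not mutated
    instruments = props.pop("instruments", None)
    if instruments:
        # Assumed target key. Joining discards the list structure, so a name
        # that itself contains a comma could not be split back out reliably.
        props["eo:instrument"] = ",".join(instruments)
    return props


stac_props = {"platform": "landsat-8", "instruments": ["oli", "tirs"]}
odc_props = flatten_instruments(stac_props)
print(odc_props)
```

The conversion is lossy in one direction only if instrument names can contain commas, which is part of why a formal definition of the expected value would help.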

woodcockr commented 1 year ago

Also, ODC needs the product id and metadata id to do its references internally. There are some other, mostly minor but prohibitive, tweaks needed as well. @Kirill888 I was looking at doing a PR into odc-stac eo3 but became uncertain after I found more minor differences; I wasn't sure what "correct" was. I think the piece of work @SpacemanPaul is proposing in this issue will sort out my end, and I can then work on a PR for odc-stac eo3 for ODC conversion to tidy this up. FYI, I used odc-stac in this context because it handled STAC extensions for projection nicely, which resolved my metadata issue, and because I think it's a good path forward in this space.

Kirill888 commented 1 year ago

@woodcockr, my understanding is that eo-datasets is all about data generation: both rasters and the accompanying metadata in "eo3 convention". There is actually very little overlap with odc-stac; I just linked that piece of documentation in response to your comment about STAC vs ODC.

As for the "what eo3 is" question: it would be good to have that properly defined, as I'm sure it has changed a lot over time. From a "historical" context, "eo3" was all about capturing the following information about the underlying rasters:

  1. Precise pixel shape and geo-referencing for all bands of a dataset
  2. Raster properties: dtype, nodata
  3. De-duplication of the geo-referencing information that is repeated in "eo"

This is information that was missing in "eo" and that was required for more "automatic" data loading behaviours in dc.load.

The equivalent STAC extensions are Projection (proposed by GA based on eo3) and Raster.
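To make the three points above concrete, here is a rough Python sketch of the kind of information involved. All class and field names are hypothetical, chosen for illustration; this is not the eo3 schema itself. Each named grid stores pixel shape and an affine transform once, and bands refer to a grid by name, which is where the de-duplication comes from.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Grid:
    shape: Tuple[int, int]  # (rows, columns) in pixels
    # Affine coefficients (a, b, c, d, e, f):
    #   x = a*col + b*row + c,  y = d*col + e*row + f
    transform: Tuple[float, float, float, float, float, float]


@dataclass(frozen=True)
class Band:
    dtype: str
    nodata: Optional[float]
    grid: str = "default"  # name of the shared grid this band uses


@dataclass
class DatasetSketch:
    crs: str
    grids: Dict[str, Grid]
    bands: Dict[str, Band]

    def grid_for(self, band_name: str) -> Grid:
        """Look up the shared grid a band points at."""
        return self.grids[self.bands[band_name].grid]

    def bounds(self, band_name: str) -> Tuple[float, float, float, float]:
        """(left, bottom, right, top); assumes a north-up grid with no rotation."""
        rows, cols = self.grid_for(band_name).shape
        a, _, c, _, e, f = self.grid_for(band_name).transform
        return (c, f + e * rows, c + a * cols, f)


ds = DatasetSketch(
    crs="EPSG:32655",
    grids={
        "default": Grid((100, 200), (30.0, 0.0, 500000.0, 0.0, -30.0, 6000000.0)),
        "pan": Grid((200, 400), (15.0, 0.0, 500000.0, 0.0, -15.0, 6000000.0)),
    },
    bands={
        "red": Band("uint16", 0),
        "nir": Band("uint16", 0),
        "pan": Band("uint16", 0, grid="pan"),
    },
)
print(ds.bounds("red"), ds.bounds("pan"))
```

With shape, transform, dtype and nodata all known up front, a loader can work out output geometry and allocate arrays without opening any raster file, which is the "automatic" behaviour referred to above.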

SpacemanPaul commented 1 year ago

Work is underway: https://github.com/opendatacube/eo3