Open SpacemanPaul opened 1 year ago
Whilst we are doing this suggest we look at some aspects of consistency with STAC
Leaving eo-datasets to define, document, and validate the metadata catalog for various collections and packaging conventions, and handle normalising and writing out according to various packaging conventions.
There are also some metadata differences, which I believe Rob encountered recently, e.g. STAC allows a list of instruments, whereas ODC flattens this list into a single comma-separated instrument value.
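To make the instrument difference concrete, here is a minimal sketch of the flattening and its inverse. The helper names `flatten_instruments` and `split_instruments` are hypothetical and do not come from odc-stac or datacube-core, and the plain `","` separator is an assumption about how the comma-separated value is written.

```python
from typing import List


def flatten_instruments(instruments: List[str]) -> str:
    """Collapse a STAC-style list of instruments into one comma-separated string.

    Hypothetical helper: the "," separator (no surrounding spaces) is an assumption.
    """
    return ",".join(instruments)


def split_instruments(value: str) -> List[str]:
    """Recover a list of instruments from a comma-separated string.

    Whitespace around each name is stripped and empty entries are dropped.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


stac_instruments = ["oli", "tirs"]
odc_value = flatten_instruments(stac_instruments)
round_tripped = split_instruments(odc_value)
```

Note that the round trip is only lossless if no instrument name itself contains a comma, which is one reason a list is the safer representation.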
Also, ODC needs the product id and metadata id to do its references internally. There are some other mostly minor but prohibitive tweaks as well. @Kirill888 I was looking at doing a PR into odc-stac eo3 but became uncertain after I found more minor differences, as I wasn't sure what "correct" was. I think the piece of work @SpacemanPaul is proposing with this issue will sort my end, and I can then work on a PR for odc-stac eo3 for ODC conversion to tidy this up. FYI, I used odc-stac in this context because it handled the STAC extensions for projection nicely, which resolved my metadata issue, and because I think it's a good path forward in this space.
@woodcockr, my understanding is that eo-datasets is all about data generation, both rasters and the accompanying metadata in "eo3 convention". There is actually very little overlap with odc-stac; I just linked that piece of documentation in response to your comment about STAC vs ODC.
As for the "what eo3 is" question: it would be good to have that properly defined, as I'm sure it has changed a lot over time. From a "historical" context, "eo3" was all about capturing the following information about the underlying rasters:
dtype, nodata
eo
Information that was missing in "eo" and that was required for more "automatic" data loading behaviours in dc.load.
The equivalent STAC extensions are Projection (proposed by GA based on eo3) and Raster.
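As a rough illustration of what this per-raster information looks like on the STAC side, the sketch below pulls it out of a single asset dictionary. It assumes the v1.x field names of the Raster extension (`raster:bands` with `data_type` and `nodata`) and the Projection extension (`proj:epsg`, `proj:shape`, `proj:transform`); the `RasterInfo` class and `raster_info_from_asset` function are hypothetical and are not part of odc-stac or datacube-core.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class RasterInfo:
    """Hypothetical container for the per-raster details needed to load data."""

    dtype: Optional[str]
    nodata: Optional[float]
    epsg: Optional[int]
    shape: Optional[Tuple[int, ...]]
    transform: Optional[Tuple[float, ...]]


def raster_info_from_asset(asset: Dict[str, Any]) -> RasterInfo:
    """Read dtype/nodata and projection fields from a STAC asset dict.

    Assumes v1.x extension field names; missing fields come back as None.
    Only the first entry of "raster:bands" is inspected.
    """
    bands = asset.get("raster:bands") or [{}]
    first_band = bands[0]
    shape = asset.get("proj:shape")
    transform = asset.get("proj:transform")
    return RasterInfo(
        dtype=first_band.get("data_type"),
        nodata=first_band.get("nodata"),
        epsg=asset.get("proj:epsg"),
        shape=tuple(shape) if shape is not None else None,
        transform=tuple(transform) if transform is not None else None,
    )


# Made-up asset for illustration only.
example_asset = {
    "href": "https://example.com/red.tif",
    "raster:bands": [{"data_type": "uint16", "nodata": 0}],
    "proj:epsg": 32655,
    "proj:shape": [100, 200],
    "proj:transform": [30.0, 0.0, 500000.0, 0.0, -30.0, 6000000.0],
}
info = raster_info_from_asset(example_asset)
```

When these fields are absent from the metadata, a loader generally has to open the raster itself to discover them, which is the kind of work the extra information is meant to avoid.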
Work is underway: https://github.com/opendatacube/eo3
An EO3 document is a document that:
a) conforms with the (undocumented) metadata conventions established by eo-datasets; and
b) conforms to datacube-core's (undocumented) assumptions about the structure of eo3 dataset docs.
These are not always in agreement (e.g. datacube-core stores lineage internally in a different format to that output by eo-datasets).
I propose splitting eo-datasets into two repositories:
An opendatacube/eo3 repository which defines, documents, validates, serialises and deserialises the attributes and properties of an EO3 document that are assumed internally by core and therefore need formal and strict definition;
Leaving eo-datasets to define, document, and validate the metadata catalog for various collections and packaging conventions, and handle normalising and writing out according to various packaging conventions. These collections and packaging conventions can vary and diverge as required.
This split will:
a. Allow better sharing of code between (what is now) eo-datasets and core, e.g. as requested in #294.
b. Facilitate future extensions and updates to what core uses, e.g. CSIRO are looking into contributing ODC support for loading into multidimensional xarrays (e.g. for hyperspectral or climate modelling use cases).