stcorp / harp

Data harmonization toolset for scientific earth observation data
http://stcorp.github.io/harp/doc/html/index.html
BSD 3-Clause "New" or "Revised" License

Develop HARP ingestion standard #306

Open StevenCompernolle opened 5 months ago

StevenCompernolle commented 5 months ago

HARP conventions are increasingly being used not only for in-memory representation but also for archiving data. However, storage encoding/decoding optimization techniques are not meant to be included in the HARP conventions.

In the more specific issue #305 (about encoding the pressure profile with hybrid sigma-pressure coefficients), Sander put it as follows:

I do realise that the HARP conventions are now getting considered more and more as an actual storage/archive format. And for archival/distribution purposes, I understand the need for this kind of storage optimisation. However, this would not be something for the actual interface conventions themselves. If we would introduce some kind of format that would combine HARP conventions and specific compression/encoding techniques, then this would have to become its own kind of sub-standard, with a special importer in the HARP software (similar to all the other foreign-format importers) that would decode and remove all these compressed elements automatically as soon as they are read. Defining such a standard would take some consideration though. We should then look into further cases that should be supported here. Introducing such a standard purely for the pressure profile compression would be a bit overkill.

The current issue is meant to cover this more general question of a HARP ingestion standard.
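To make the pressure profile case from #305 concrete, here is a minimal sketch of the decoding step such an importer would perform. It assumes the usual hybrid sigma-pressure relation p = a + b * p_surface; the function name `decode_pressure` and the coefficient values are made up for illustration and are not part of HARP.

```python
def decode_pressure(a, b, surface_pressure):
    """Expand hybrid sigma-pressure coefficients into pressures in Pa.

    a: per-level 'a' coefficients in Pa
    b: per-level dimensionless 'b' coefficients
    surface_pressure: surface pressure in Pa for a single profile
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # p_k = a_k + b_k * p_surface for every level k
    return [a_k + b_k * surface_pressure for a_k, b_k in zip(a, b)]


# Made-up coefficients for three levels, top of atmosphere first.
a = [0.0, 5000.0, 0.0]
b = [0.0, 0.5, 1.0]
pressure = decode_pressure(a, b, 100000.0)
```

An importer along these lines would store only the decoded `pressure` values and drop the coefficients once the file is read.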

StevenCompernolle commented 4 months ago

Another example of a storage optimization technique is the reduced Gaussian grid used by ECMWF. I could only harp import CAMS REA files after converting them to a regular grid using CDO. It would be useful to avoid needing CDO as a separate tool, and therefore to have the reduced-to-regular grid conversion within the HARP ingestion.
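A rough sketch of what a reduced-to-regular conversion could look like, assuming each latitude row holds equally spaced longitudes starting at 0 degrees and using periodic linear interpolation along the latitude circle. The interpolation method is an assumption chosen for simplicity, not necessarily what CDO does, and the function names are hypothetical.

```python
def row_to_regular(values, n_out):
    """Interpolate one latitude row onto n_out equally spaced longitudes.

    The input values are assumed to sit at longitudes
    360 * i / len(values) degrees, for i = 0 .. len(values) - 1.
    """
    n_in = len(values)
    out = []
    for j in range(n_out):
        pos = j * n_in / n_out   # fractional index into the input row
        i0 = int(pos) % n_in
        i1 = (i0 + 1) % n_in     # wrap around at 360 degrees
        w = pos - int(pos)       # weight of the right-hand neighbour
        out.append((1.0 - w) * values[i0] + w * values[i1])
    return out


def reduced_to_regular(rows, n_out):
    """Convert every row of a reduced grid to the same longitude count."""
    return [row_to_regular(row, n_out) for row in rows]


# Made-up reduced grid: 2 points near the pole, 4 points further away.
reduced = [
    [1.0, 3.0],            # longitudes 0 and 180
    [0.0, 1.0, 2.0, 3.0],  # longitudes 0, 90, 180 and 270
]
regular = reduced_to_regular(reduced, 4)
```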

svniemeijer commented 4 months ago

Reduced Gaussian grid representation is actually something that is already possible in HARP. You would have to use a latitude-dependent longitude grid, longitude {latitude, longitude} (similar to how you can have a time-dependent altitude grid, altitude {time, vertical}). This does not require any changes to the standard.
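As an illustration of such a latitude-dependent longitude axis, the sketch below builds a two-dimensional longitude array with one row per latitude. Padding the shorter rows with NaN is an assumption made here to keep the array rectangular, and `reduced_longitude_axis` is a hypothetical helper, not a HARP function.

```python
import math


def reduced_longitude_axis(points_per_latitude):
    """Build a longitude {latitude, longitude} array for a reduced grid.

    Each row holds the longitudes (degrees east) for one latitude.
    Rows with fewer points than the widest row are padded with NaN.
    """
    width = max(points_per_latitude)
    axis = []
    for n in points_per_latitude:
        row = [360.0 * i / n for i in range(n)]
        row += [math.nan] * (width - n)  # pad unused trailing elements
        axis.append(row)
    return axis


# Made-up grid with fewer longitudes near the poles than at the equator.
longitude = reduced_longitude_axis([2, 4, 2])
```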

What is not well supported, though, is the ability for HARP to perform operations on data with a longitude {latitude, longitude} axis variable. But that would be something for a different ticket.

svniemeijer commented 4 months ago

Also be aware that the built-in ECMWF GRIB ingestion in HARP has several limitations. It uses CODA for the GRIB reading, which doesn't support the new CCSDS compression method, for instance. And it currently only supports regular grids. Modifying this built-in ingestion would also be a different ticket from this one.