ioos / erddap-gold-standard

Contains the 'gold standard' ERDDAP configuration, with datasets compliant with IOOS Metadata Profile 1.2
https://standards.sensors.ioos.us/erddap/index.html

Include nco-yaml representations of datasets (and script) #65

Closed: srstsavage closed this 5 months ago

srstsavage commented 9 months ago

As an alternative to the nccsv source-of-truth datasets in #51, this PR stores text yaml representations of the gold standard datasets based on nco-json output ("nco-yaml", if you will, which is more readable than json because it has no brackets).

Also included is a script to (re-)generate the yaml representations from the source-of-truth NetCDF files and to check whether the committed yaml representations are up to date (possibly for use in a future pre-commit hook, GitHub Action, etc.).
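A minimal sketch of what such a regenerate-or-check script could look like, in Python. The PR's actual script isn't shown here, so everything below is an assumption: the function names `sync` and `main` are hypothetical, each `foo.nc` is assumed to have a committed text sibling with the same stem (e.g. `foo.yaml`), and `render` is any caller-supplied function that turns a NetCDF path into the expected text (for example, one that wraps nco-json output and converts it to yaml).

```python
import sys
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical type: maps a NetCDF file to its expected text representation.
Render = Callable[[Path], str]


def sync(nc_paths: Sequence[Path], render: Render, suffix: str, check: bool) -> List[Path]:
    """Return the text representations that are missing or out of date.

    In check mode nothing is written; otherwise stale files are (re-)generated.
    Assumes each text file sits next to its source, e.g. foo.nc -> foo.yaml.
    """
    stale = []
    for nc_path in nc_paths:
        target = nc_path.with_suffix(suffix)
        expected = render(nc_path)
        current = target.read_text() if target.exists() else None
        if current != expected:
            stale.append(target)
            if not check:
                target.write_text(expected)
    return stale


def main(argv: Sequence[str], render: Render, suffix: str = ".yaml") -> int:
    """Hypothetical CLI entry point: `[--check] [DIR ...]`."""
    check = "--check" in argv
    roots = [Path(arg) for arg in argv if arg != "--check"] or [Path(".")]
    nc_paths = sorted(p for root in roots for p in root.rglob("*.nc"))
    stale = sync(nc_paths, render, suffix, check)
    for path in stale:
        print(("out of date: " if check else "regenerated: ") + str(path), file=sys.stderr)
    # A non-zero status in check mode lets a pre-commit hook or CI job fail.
    return 1 if check and stale else 0
```

In a hook or CI job this would be wired up as `sys.exit(main(sys.argv[1:], render=...))`; keeping `render` as a parameter means the same check works whatever text format is chosen.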

ocefpaf commented 9 months ago

@srstsavage I love the idea of having the text representation for these datasets but I want to ask you:

  1. Why not cdl → nc → cdl? That would save folks from having to install nco and deal with yet another markup language.
  2. Are you planning on creating the binary on the fly or keeping both side by side in the repo? While I'd love to create them on the fly to keep the repo lightweight, I believe we have to keep the .nc files around because we will be linking to them in other docs pages.
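For reference, the cdl → nc → cdl round trip in question 1 needs only the netCDF-C command-line tools: `ncgen` compiles CDL into a binary file and `ncdump` prints a binary file back as CDL. The Python sketch below chains them; the function names are hypothetical, and `run` is injectable so the command sequence can be inspected without the tools installed. Files that use netCDF-4 features may also need an `ncgen` format option such as `-k`, which the sketch omits.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Runs one command and returns its stdout; swappable for testing.
Runner = Callable[[Sequence[str]], str]


def run_tool(cmd: Sequence[str]) -> str:
    """Run a netCDF command-line tool, raising if it exits non-zero."""
    return subprocess.run(list(cmd), check=True, capture_output=True, text=True).stdout


def roundtrip_cdl(cdl_path: Path, out_dir: Path, run: Runner = run_tool) -> str:
    """CDL -> netCDF (ncgen) -> CDL (ncdump); returns the regenerated CDL.

    The .nc keeps the CDL file's stem because ncdump names the dataset in its
    `netcdf <name> {` header after the file it reads.
    """
    nc_path = out_dir / (cdl_path.stem + ".nc")
    run(["ncgen", "-o", str(nc_path), str(cdl_path)])  # CDL text -> binary
    return run(["ncdump", str(nc_path)])  # binary -> CDL text
```

Diffing the returned text against the committed .cdl shows whether the pair is in sync. Since ncdump's layout can differ from hand-written CDL, the comparison is simplest when the committed .cdl was itself produced by ncdump.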
mwengren commented 5 months ago

@srstsavage I think we're leaning toward the GHA-based .nc-to-.cdl (and vice versa) approach in @ocefpaf's PR #70.

Do you see any issues with going that route rather than the nco-yaml and nccsv approaches you took here and in #51?

I think it's a great idea to include a text-based version of the gold standard files in the repo, so thanks for the suggestion!

After discussing the options recently with the IOOS DMAC folks, we felt that CDL offered more utility and readability, and it is already used in other community data format repos, such as OceanGliders/OG-format-user-manual, as a diff-able representation of a data format.

If you have thoughts on the approach in #70 please add them there.

srstsavage commented 5 months ago

@ocefpaf @mwengren Yes! Sorry, I thought I had already responded here :sweat_smile:. Agreed that CDL makes the most sense for a text representation. I was originally leaning toward yaml because it's much easier to query, transform to other formats, etc., with a wide range of tools, but you're right that CDL is the more natural and better-supported text format in the netCDF ecosystem. Even with CDL, it's easy enough to get to standard formats like json or yaml with a few extra hops through netCDF/NCO commands, if those tools are available.
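As a sketch of those extra hops, assuming NCO's `ncks` is installed and supports JSON output via its `--json` option (present in recent NCO releases): CDL becomes a binary file with `ncgen`, the binary becomes nco-json with `ncks`, and the JSON is then ordinary data for any tool. The function name `cdl_to_json` is hypothetical, and `run` is injectable so the pipeline can be exercised without the tools.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, Callable, Dict, Sequence

# Runs one command and returns its stdout; swappable for testing.
Runner = Callable[[Sequence[str]], str]


def run_tool(cmd: Sequence[str]) -> str:
    """Run a command-line tool, raising if it exits non-zero."""
    return subprocess.run(list(cmd), check=True, capture_output=True, text=True).stdout


def cdl_to_json(cdl_path: Path, out_dir: Path, run: Runner = run_tool) -> Dict[str, Any]:
    """CDL -> netCDF (ncgen) -> nco-json (ncks --json) -> Python dict."""
    nc_path = out_dir / (cdl_path.stem + ".nc")
    run(["ncgen", "-o", str(nc_path), str(cdl_path)])  # hop 1: CDL -> binary
    text = run(["ncks", "--json", str(nc_path)])  # hop 2: binary -> nco-json
    return json.loads(text)  # hop 3: JSON text -> dict
```

From the dict, yaml is one more step with a YAML library (for example PyYAML's `yaml.safe_dump`), and tools like jq can query the nco-json text directly.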

Closing in favor of #70, thanks!