magda-io / magda

A federated, open-source data catalog for all your big data and small data
https://magda.io
Apache License 2.0

What is our preferred metadata format? #755

Closed: AlexGilleran closed this issue 4 years ago

AlexGilleran commented 6 years ago

Following up on something Ellen Broad said at the Data61 E&D retreat - we're focused on supporting all the metadata formats we can, sure, but what's our actual preferred format? Say someone wants to publish a metadata catalogue and ensure future compatibility with MAGDA, what do we tell them?

kring commented 6 years ago

Well, our "preferred" format is the one we use natively: our system of aspects. No other metadata format will capture the full range of information that can be captured by Magda. So that's kind of like making our own, but we try hard to maintain commonality / compatibility with more widely used ones, which is why our aspects are closely aligned with DCAT.

ellenbroad commented 6 years ago

Hello yes that was me :). I know that data.gov.au used to use AGLS, with components from DCAT and Dublin Core. Reading back through old data.gov.au wiki pages, it looks like at some point Finance was conducting an analysis of metadata standards: https://toolkit.data.gov.au/index.php?title=Discovering_Metadata. Maintaining compatibility with DCAT/AGLS is good. The conversation Alex and I were having was about other departments commissioning their own data catalogues and choosing metadata standards, and whether problems of incompatibility with MAGDA might arise.

AlexGilleran commented 6 years ago

Thanks @ellenbroad :).

@kring yeah naturally our aspects are best, but they're not a standard that we can tell organisations to work towards using.

In the end, though, we absolutely should be compatible with metadata described using any of the standards in the bullet points, so I'm probably thinking about this in the wrong way... we don't need to pick a favourite so much as ensure that we remain compatible with DCAT, ANZLIC, AGLS, etc., which shouldn't be that hard.

dr-shorthair commented 6 years ago

FWIW - I've been chairing the W3C team working on a revision of DCAT. See editor's draft here: https://w3c.github.io/dxwg/dcat/ The change log is https://w3c.github.io/dxwg/dcat/#changes

Key enhancements adopted so far include:

The main issue tracker is https://github.com/w3c/dxwg/issues so you can just join in. Note, however, that as this is a revision of an existing W3C recommendation, there is a presumption that DCAT-2014 implementations will not be made invalid. So the scope for revision is primarily additive.

I would be very happy to discuss Magda requirements, and if appropriate shepherd them through the revision process. Note that IM&T have already had some impact (coming from CSIRO DAP work).

AlexGilleran commented 6 years ago

"use of DQV for data quality description"

Interesting :O

dr-shorthair commented 6 years ago

Key message here is that DCAT is expected to be mixed with other RDF vocabularies as necessary. That is a strength of the RDF ecosystem: features can be added unobtrusively. It also presents some challenges, since too much flexibility can be overwhelming and hard to manage. So rules around this and good documentation are helpful.
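
To make the point about mixing vocabularies concrete, here is a minimal sketch (not taken from Magda or the DCAT draft) of a JSON-LD-style dataset record that combines DCAT, Dublin Core terms and DQV, plus one simple "rule": every prefix used must be declared in the context. The `example.org` identifiers, the completeness metric, and the helper name `vocabularies_used` are all hypothetical, chosen only for illustration.

```python
import json

# Namespace prefixes for the vocabularies being mixed (real namespace URIs).
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "dqv": "http://www.w3.org/ns/dqv#",
}

# Hypothetical dataset record: DCAT core, a Dublin Core title,
# and a DQV quality measurement added alongside without changing the rest.
dataset = {
    "@context": CONTEXT,
    "@id": "https://example.org/dataset/1",
    "@type": "dcat:Dataset",
    "dct:title": "Example dataset",
    "dcat:distribution": [{
        "@type": "dcat:Distribution",
        "dcat:downloadURL": {"@id": "https://example.org/dataset/1.csv"},
    }],
    "dqv:hasQualityMeasurement": [{
        "@type": "dqv:QualityMeasurement",
        "dqv:isMeasurementOf": {"@id": "https://example.org/metrics/completeness"},
        "dqv:value": 0.95,
    }],
}


def vocabularies_used(node):
    """Collect the prefixes used by property keys and @type values in a node."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # the context declares prefixes, it does not use them
            if key == "@type" and isinstance(value, str):
                found.add(value.split(":", 1)[0])
            elif ":" in key and not key.startswith("@"):
                found.add(key.split(":", 1)[0])
            found |= vocabularies_used(value)
    elif isinstance(node, list):
        for item in node:
            found |= vocabularies_used(item)
    return found


# A simple rule: no prefix may be used without being declared.
undeclared = vocabularies_used(dataset) - set(CONTEXT)

if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
    print("vocabularies used:", sorted(vocabularies_used(dataset)))
    print("undeclared prefixes:", sorted(undeclared))
```

A check like this is only a small part of what "rules" could mean here; a real deployment would more likely publish an application profile and validate against it.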

AlexGilleran commented 4 years ago

Closing this because I don't think we're really ever going to have a preferred metadata format. We like metadata in all shapes and sizes.