What is our preferred metadata format?

AlexGilleran commented 6 years ago

Following up on something Ellen Broad said at the Data61 E&D retreat - we're focused on supporting all the metadata formats we can, sure, but what's our actual preferred format? Say someone wants to publish a metadata catalogue and ensure future compatibility with MAGDA, what do we tell them?

- DCAT is obviously pretty natural, but we settled on that for convenience, not because we like it
- ISO 19115? (geospatial only)?
- AGLS?
- Do we make our own? (this seems like a mistake)

kring commented 6 years ago

Well, our "preferred" format is the one we use natively: our system of aspects. No other metadata format will capture the full range of information that can be captured by Magda. So that's kind of like making our own, but we try hard to maintain commonality and compatibility with more widely used ones, which is why our aspects are closely aligned with DCAT.
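
To make the trade-off concrete, here is a rough sketch of a record carrying several aspects, one of which mirrors DCAT. The aspect names (`dcat-like-dataset`, `extra-quality-info`), the field names and the `to_dcat_like` helper are all hypothetical and not taken from the Magda codebase; the point is only that exporting to a DCAT-shaped view keeps the aligned fields and loses everything else.

```python
# Hypothetical record: aspect names and fields are illustrative only,
# not the actual Magda registry schema.
record = {
    "id": "example-dataset-1",
    "name": "Example dataset",
    "aspects": {
        # An aspect whose fields line up with DCAT / Dublin Core terms.
        "dcat-like-dataset": {
            "title": "Example dataset",
            "keywords": ["water", "quality"],
        },
        # An aspect with no DCAT equivalent in this sketch.
        "extra-quality-info": {"score": 0.8},
    },
}

# Assumed mapping from aspect field names to DCAT / Dublin Core terms.
FIELD_TO_TERM = {
    "title": "dct:title",
    "description": "dct:description",
    "keywords": "dcat:keyword",
}


def to_dcat_like(record, aspect_name="dcat-like-dataset"):
    """Project one aspect onto DCAT-style term names, dropping unmapped fields."""
    aspect = record["aspects"].get(aspect_name, {})
    return {FIELD_TO_TERM[k]: v for k, v in aspect.items() if k in FIELD_TO_TERM}


exported = to_dcat_like(record)
```

In this sketch `exported` contains only `dct:title` and `dcat:keyword`; the `extra-quality-info` aspect has nowhere to go, which is the sense in which no single external format captures everything.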

ellenbroad commented 6 years ago

Hello yes that was me :). I know that data.gov.au used to use AGLS, with components from DCAT and Dublin Core. Reading back through old data.gov.au wiki pages, it looks like at some point Finance was conducting an analysis of metadata standards: https://toolkit.data.gov.au/index.php?title=Discovering_Metadata. Maintaining compatibility with DCAT/AGLS is good. The conversation Alex and I were having was about other departments commissioning their own data catalogues and choosing metadata standards, and whether problems of incompatibility with MAGDA might arise.

AlexGilleran commented 6 years ago

Thanks @ellenbroad :).

@kring yeah naturally our aspects are best, but they're not a standard that we can tell organisations to work towards using.

In the end though we absolutely should be compatible with metadata described using any of the standards in the bullet points, so I'm probably thinking about this in the wrong way... we don't need to pick a favourite so much as ensure that we remain compatible with DCAT, ANZLIC, AGLS, etc., which shouldn't be that hard.

dr-shorthair commented 6 years ago

FWIW - I've been chairing the W3C team working on a revision of DCAT. The editor's draft is at https://w3c.github.io/dxwg/dcat/ and the change log is at https://w3c.github.io/dxwg/dcat/#changes

Key enhancements adopted so far include:

- explicit support for cataloguing services, and potentially other things
- clarify links between datasets, representations and data-services
- adopt elements and patterns from PROV-O to support better data lineage
- use of DQV for data quality description
- use of ODRL for permissions/prohibitions/obligations

The main issue tracker is https://github.com/w3c/dxwg/issues so you can just join in. Note, however, that as this is a revision of an existing W3C recommendation, there is a presumption that DCAT-2014 implementations will not be made invalid. So the scope for revision is primarily additive.

I would be very happy to discuss Magda requirements, and if appropriate shepherd them through the revision process. Note that IM&T have already had some impact (coming from CSIRO DAP work).

AlexGilleran commented 6 years ago

> use of DQV for data quality description

Interesting :O

dr-shorthair commented 6 years ago

Key message here is that DCAT is expected to be mixed with other RDF vocabularies as necessary. That is a strength of the RDF ecosystem: features can be added unobtrusively. It also presents some challenges, since too much flexibility can be overwhelming and hard to manage. So rules around this and good documentation are helpful.
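
As a minimal sketch of what that mixing can look like, the snippet below builds a JSON-LD-style description that combines DCAT with Dublin Core, PROV-O and DQV terms, then applies one simple "rule": every prefix used must be declared in the context. The dataset, its identifiers and values, and both helper functions are made up for illustration; only the namespace IRIs and term names come from the published vocabularies.

```python
# Namespace IRIs of the vocabularies being mixed.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "dqv": "http://www.w3.org/ns/dqv#",
    "prov": "http://www.w3.org/ns/prov#",
}

# Hypothetical dataset description; identifiers and values are invented.
dataset = {
    "@context": CONTEXT,
    "@id": "https://example.org/dataset/1",
    "@type": "dcat:Dataset",
    "dct:title": "Example dataset",
    "prov:wasGeneratedBy": {"@id": "https://example.org/activity/survey"},
    "dqv:hasQualityMeasurement": {
        "@type": "dqv:QualityMeasurement",
        "dqv:value": 0.95,
    },
}


def vocabularies_used(node):
    """Collect the prefixes of prefixed terms found in keys and @type values."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # declarations, not usage
            if key == "@type":
                types = value if isinstance(value, list) else [value]
                found.update(t.split(":", 1)[0] for t in types if ":" in t)
            elif ":" in key:
                found.add(key.split(":", 1)[0])
            found |= vocabularies_used(value)
    elif isinstance(node, list):
        for item in node:
            found |= vocabularies_used(item)
    return found


def undeclared_prefixes(doc):
    """Return prefixes that are used in doc but missing from its @context."""
    return vocabularies_used(doc) - set(doc.get("@context", {}))


used = sorted(vocabularies_used(dataset))
missing = undeclared_prefixes(dataset)
```

Here `used` lists four vocabularies for a single dataset and `missing` is empty. A check like this is one small example of the kind of rule that keeps the flexibility manageable; a real deployment would more likely validate against a published profile.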

AlexGilleran commented 4 years ago

Closing this because I don't think we're really ever going to have a preferred metadata format. We like metadata in all shapes and sizes.

magda-io / magda

What is our preferred metadata format? #755