project-open-data / project-open-data.github.io

Open Data Policy — Managing Information as an Asset
https://project-open-data.cio.gov/

Use conformsTo to identify data standards a dataset conforms to #362

Closed philipashlock closed 9 years ago

philipashlock commented 9 years ago

To help surface where data standards have already been implemented, and where they should be but haven't been yet, the conformsTo field can hold a URI that serves as a unique identifier for the relevant data standard. It can be used at both the dataset level and the distribution level. It has already been proposed for use at the Catalog level to identify the version of the Project Open Data standard being used; see https://github.com/project-open-data/project-open-data.github.io/issues/309#issuecomment-55138529

conformsTo is a Dublin Core term

If a publisher were listing their data.json file within a dataset's distribution, it might look something like this:

"distribution": [
    {
        "description": "Data.json file for Project Open Data",
        "conformsTo": "http://project-open-data.github.io/schema",
        "downloadURL": "https://agency.gov/data.json",
        "format": "JSON",
        "mediaType": "application/json",
        "title": "data.json"
    }
    }
]
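
As a rough illustration of how conformsTo could help surface where standards are in use, here is a minimal Python sketch that walks a catalog and lists every conformsTo URI it finds. It assumes a catalog object with a `dataset` array; the `collect_conforms_to` helper and the sample catalog are hypothetical and not part of the schema or any Project Open Data tooling.

```python
import json


def collect_conforms_to(catalog):
    """Return (level, title, uri) tuples for every conformsTo value found."""
    found = []
    if "conformsTo" in catalog:
        found.append(("catalog", catalog.get("title", ""), catalog["conformsTo"]))
    for dataset in catalog.get("dataset", []):
        if "conformsTo" in dataset:
            found.append(("dataset", dataset.get("title", ""), dataset["conformsTo"]))
        for dist in dataset.get("distribution", []):
            if "conformsTo" in dist:
                found.append(("distribution", dist.get("title", ""), dist["conformsTo"]))
    return found


# Hypothetical catalog wrapping the distribution shown above.
sample_catalog = json.loads("""
{
    "dataset": [
        {
            "title": "Agency data inventory",
            "distribution": [
                {
                    "description": "Data.json file for Project Open Data",
                    "conformsTo": "http://project-open-data.github.io/schema",
                    "downloadURL": "https://agency.gov/data.json",
                    "format": "JSON",
                    "mediaType": "application/json",
                    "title": "data.json"
                }
            ]
        }
    ]
}
""")

for level, title, uri in collect_conforms_to(sample_catalog):
    print(f"{level}: {title} conforms to {uri}")
```

Datasets or distributions that do not appear in the result are the ones where a standard has not been declared, which is the other half of what the field is meant to surface.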

gbinal commented 9 years ago

This will help address #291 and #332.

BernHyland commented 9 years ago

I'd like to propose some simple, concrete suggestions to improve the usefulness of the proposed Project Open Data Metadata Schema v1.1 prior to its update. I understand from the recent "Metadata Schema v1.1 updates" webinar (10/15/2014) that this proposed metadata standard is informed by the Data Catalog Vocabulary (DCAT). [1] As you probably know, DCAT was standardized by the W3C [2] earlier this year.

Specifically, DCAT is intended to be "an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web". However, much of the utility of DCAT has been lost in translation due to specific technology choices made to date. If I understand the POD Metadata Schema v1.1 documentation, it loses both the RDF (Linked Data) aspect of DCAT and its grounding in the Web. Without them, the POD Metadata Schema cannot facilitate interoperability. It would be both simple and useful to repair both aspects.

The current POD Metadata Schema v1.1 is serialized using JSON.[3] No DCAT namespaces are shown in the examples. The removal of namespaces from the schema is the core issue here because namespaces allow schema terms to be discovered and described on the Web. Without namespaces, users of POD Metadata will have no way to discover the meanings and definitions of terms without a priori knowledge of your intent. Briefly, you have inadvertently removed the “linked” aspect from your data. I suggest making the following minor extensions to the Common Core Metadata Schema:

1) Serialize the Common Core Metadata Schema using JSON-LD [4], a fully JSON-compliant syntax that will ground your terms to the Web.

2) Provide a machine-readable version of the Common Core Metadata Schema on the Web in JSON-LD format so that Web users can acquire it readily.

3) Describe the process for using the machine-readable versions of the Common Core Metadata Schema in your FAQ [5].
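
To make the first two suggestions concrete, here is a minimal Python sketch of how a JSON-LD @context could ground the plain keys in DCAT and Dublin Core term URIs. The namespace URIs are the published ones for DCAT and Dublin Core terms, but the particular key-to-term mapping, and the `to_json_ld` and `expand_term` helpers, are assumptions made for illustration. This is not an official Project Open Data context, and `expand_term` is a simplified prefix lookup rather than a full JSON-LD processor.

```python
import json

# Assumed mapping of plain keys to DCAT / Dublin Core terms (illustrative only).
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "distribution": "dcat:distribution",
    "downloadURL": {"@id": "dcat:downloadURL", "@type": "@id"},
    "mediaType": "dcat:mediaType",
    "conformsTo": {"@id": "dct:conformsTo", "@type": "@id"},
    "description": "dct:description",
    "format": "dct:format",
    "title": "dct:title",
}


def to_json_ld(record, context=CONTEXT):
    """Return a copy of a plain JSON record with an @context attached."""
    return {"@context": context, **record}


def expand_term(key, context=CONTEXT):
    """Resolve a plain key to a full term URI via the context's prefixes."""
    mapping = context.get(key)
    if mapping is None:
        return None
    if isinstance(mapping, dict):
        mapping = mapping["@id"]
    prefix, _, local = mapping.partition(":")
    base = context.get(prefix)
    if isinstance(base, str):
        return base + local
    return mapping


distribution = {
    "conformsTo": "http://project-open-data.github.io/schema",
    "downloadURL": "https://agency.gov/data.json",
    "title": "data.json",
}

print(json.dumps(to_json_ld(distribution), indent=2))
print(expand_term("conformsTo"))
```

Because JSON-LD is plain JSON with a few reserved keys, existing consumers that ignore @context would still read the record unchanged.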

I'd be happy to assist in this work if asked. The W3C's standardization of DCAT was reviewed with international participation from the world's leading researchers, practitioners and platform vendors. I feel strongly that we need to update the proposed Project Open Data Metadata Schema v1.1 to leverage the combined wisdom & experience of the national & international community of experts.

NB: On the 10/15 call I heard someone ask "was it a recommendation or a standard?" Please note that the W3C, as the international standards body for ensuring the 'Web continues to work', has a transparent, rigorous public peer review process to produce published "Recommendations". Many people refer to these as open standards, but technically they are called "Recommendations".

[1] http://www.w3.org/TR/vocab-dcat/
[2] http://w3.org/
[3] http://json.org/
[4] http://www.w3.org/TR/json-ld/
[5] http://project-open-data.github.io/faq/

philipashlock commented 9 years ago

Thanks for the feedback. I started to respond with a sketch of a JSON-LD serialization in #309 since that was the issue which explained the current approach to addressing most of the goals you expressed here, e.g. using conformsTo to provide a URI that will "allow schema terms to be discovered" and using describedBy to link to "machine-readable versions of the Common Core Metadata Schema". For use within the context of the Federal Government, I think these provide simpler and more direct ways to accomplish the goals than using JSON-LD, but I understand the benefits to those interested in a broader linked data ecosystem and the importance of us aligning with existing standards. With that in mind, I don't think it's unreasonable to allow both approaches to coexist in parallel.
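
As a rough sketch of what coexistence could look like, the Python below checks whether a catalog carries the plain conformsTo/describedBy fields, a JSON-LD @context, or both. The `describe_approach` helper is hypothetical, and the example.gov URLs are placeholders rather than real schema locations.

```python
def describe_approach(catalog):
    """Report which self-description approach a catalog object uses."""
    uses_plain = "conformsTo" in catalog or "describedBy" in catalog
    uses_json_ld = "@context" in catalog
    if uses_plain and uses_json_ld:
        return "both"
    if uses_plain:
        return "plain"
    if uses_json_ld:
        return "json-ld"
    return "neither"


# Hypothetical catalog header carrying both kinds of pointers side by side.
catalog = {
    "@context": "https://example.gov/schema/catalog.jsonld",  # placeholder URL
    "conformsTo": "http://project-open-data.github.io/schema",
    "describedBy": "https://example.gov/schema/catalog.json",  # placeholder URL
    "dataset": [],
}

print(describe_approach(catalog))
```

A consumer that only understands the plain fields can ignore @context, and a linked data consumer can ignore the plain fields, so neither approach gets in the other's way.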

Also, we've had a machine-readable version of the schema for a long time, but it has not yet been updated for v1.1, so I created #385 to track that along with everything else noted in #357.

gbinal commented 9 years ago

It looks like we can go ahead and close #362.