project-open-data / project-open-data.github.io

Open Data Policy — Managing Information as an Asset
https://project-open-data.cio.gov/

Define JSON-LD Context and use of associated JSON-LD keywords #388

Open philipashlock opened 9 years ago

philipashlock commented 9 years ago

This issue was originally discussed as part of #309, which was about defining the version of the schema being used. Since Project Open Data has relied on JSON Schema to define the schema in a machine-readable way, it made sense to use URLs and resources associated with our JSON Schema definition to address this issue. While I think that is the simplest and most direct way to specify the schema for those involved with Project Open Data, it's quite specific to our use and doesn't leverage existing definitions used by others, like those working with DCAT. Furthermore, it doesn't make it explicit that we are actually using the same properties and definitions used in DCAT. With JSON-LD, we can not only make it explicit that the schema is the JSON serialization of DCAT, but should also be able to make interoperability with DCAT and associated vocabularies functionally possible rather than just notional.

In other words, it seems like there's value in both approaches, so I've broken this off as a separate issue from #309.

Even while recognizing the long discussions about linked data that have occurred here as well as many ongoing discussions about the state of linked data in the broader community, it should be easy to address this in a straightforward manner. Because of the work that's already been done to align the schema with DCAT and the fact that JSON-LD requires a minimal amount of overhead with something modeled on established vocabularies, this is actually pretty trivial and shouldn't add much complexity to the existing schema and documentation.

That said, it is one more thing to add. Since the additional fields to support JSON-LD can simply be appended to the existing schema, and since we don't yet see a huge demand or critical need to support JSON-LD for current use cases, I don't think JSON-LD support should be a hard requirement as part of this revision to the schema. We can certainly still document all the necessary JSON-LD keywords here as optional fields and explain why they're useful. Tools provided by Data.gov can also automatically generate JSON-LD versions of the metadata, which will guarantee that many agencies will be serving JSON-LD versions of the data.json file. Every six months we reassess these requirements, so if we see more interest and more value demonstrated by requiring JSON-LD, we can consider making it a strict requirement in the future. Please also note that Data.gov already aggregates all these data.json files and makes the metadata available using DCAT as RDF/XML as well as Schema.org microdata.

There were already many comments provided in #309 to draft what was needed to make the schema work as JSON-LD and we can continue to revise those drafts here. I'll start with the last example provided by @philarcher1 as well as the more compact version of that. @amercader is also tracking this for more general CKAN implementations with https://github.com/ckan/ckanext-dcat/issues/20

From @philarcher1

{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "org": "http://www.w3.org/ns/org#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "@vocab": "http://www.w3.org/ns/dcat#",
    "dc": "http://purl.org/dc/terms/",
    "pod": "https://project-open-data.cio.gov/v1.1/schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "describedBy": {
      "@id": "http://www.w3.org/2007/05/powder#describedby",
      "@type": "@id"
    },
    "downloadURL": {
      "@id": "dcat:downloadURL",
      "@type": "@id"
    },
    "accessURL": {
      "@id": "dcat:accessURL",
      "@type": "@id"
    },
    "title": "dc:title",
    "description": "dc:description",
    "issued": {
      "@id": "dc:issued",
      "@type": "http://www.w3.org/2001/XMLSchema#date"
    },
    "modified": {
      "@id": "dc:modified",
      "@type": "http://www.w3.org/2001/XMLSchema#date"
    },
    "language": "dc:language",
    "license": "dc:license",
    "rights": "dc:rights",
    "spatial": "dc:spatial",
    "conformsTo": {
      "@id": "dc:conformsTo",
      "@type": "@id"
    },
    "publisher": "dc:publisher",
    "identifier": "dc:identifier",
    "temporal": "dc:temporal",
    "format": "dc:format",
    "accrualPeriodicity": "dc:accrualPeriodicity",
    "homepage": "foaf:homepage",
    "accessLevel": "pod:accessLevel",
    "bureauCode": "pod:bureauCode",
    "dataQuality": "pod:dataQuality",
    "describedByType": "pod:describedByType",
    "primaryITInvestmentUII": "pod:primaryITInvestmentUII",
    "programCode": "pod:programCode",
    "fn": "vcard:fn",
    "hasEmail": "vcard:email",
    "name": "skos:prefLabel",
    "subOrganizationOf": "org:subOrganizationOf"
  },
  "@id": "http://www.agency.gov/data.json",
  "@type": "dcat:Catalog",
  "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
  "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
  "dataset": [
    {
      "@type": "dcat:Dataset",
      "accessLevel": "public",
      "accrualPeriodicity": "R/P1Y",
      "bureauCode": [
        "018:10"
      ],
      "conformsTo": "http://www.agency.gov/widget-taxonomy/",
      "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Jane Doe",
        "hasEmail": "mailto:jane.doe@agency.gov"
      },
      "describedBy": "http://www.agency.gov/datasets/widgets-dictionary.html",
      "dataQuality": true,
      "description": "This dataset provides national statistics on the production of widgets",
      "distribution": [
        {
          "@type": "dcat:Distribution",
          "description": "Widgets data as a CSV file",
          "downloadURL": "https://data.agency.gov/datasets/widgets-statistics/widgets.csv",
          "format": "CSV",
          "mediaType": "text/csv",
          "title": "widgets.csv"
        },
        {
          "@type": "dcat:Distribution",
          "description": "Widgets data as a zipped CSV file with attached data dictionary",
          "downloadURL": "https://data.agency.gov/datasets/widgets-statistics/widgets-all.zip",
          "format": "Zipped CSV",
          "mediaType": "application/zip",
          "title": "widgets-all.zip"
        },
        {
          "@type": "dcat:Distribution",
          "conformsTo": "http://www.agency.gov/widget-data-standard/",
          "describedBy": "http://www.agency.gov/widgets/schema.json",
          "describedByType": "application/schema+json",
          "description": "Widget data as a JSON feed",
          "downloadURL": "http://www.agency.gov/feeds/widgets-all.json",
          "format": "JSON",
          "mediaType": "application/json",
          "title": "widgets-all.json"
        },
        {
          "@type": "dcat:Distribution",
          "accessURL": "https://data.agency.gov/api/widgets-statistics/",
          "description": "A fully queryable REST API with JSON and XML output",
          "format": "API",
          "title": "Widgets REST API"
        }
      ],
      "identifier": "widgets-0001",
      "issued": "2011-11-22",
      "keyword": [
        "widget",
        "manufacturing",
        "factory"
      ],
      "landingPage": "http://agency.gov/widgets/data",
      "language": [
        "en-US"
      ],
      "license": "http://creativecommons.org/publicdomain/zero/1.0/",
      "modified": "2011-11-19T12:00:00Z",
      "primaryITInvestmentUII": "021-006227212",
      "programCode": [
        "018:001"
      ],
      "publisher": {
        "@type": "org:Organization",
        "name": "Widget Services",
        "subOrganizationOf": {
          "@type": "org:Organization",
          "name": "Office of Citizen Services and Innovative Technologies",
          "subOrganizationOf": {
            "@type": "org:Organization",
            "name": "General Services Administration",
            "subOrganizationOf": {
              "@type": "org:Organization",
              "name": "U.S. Government"
            }
          }
        }
      },
      "references": [
        "http://agency.gov/docs/widgets-1.html",
        "http://agency.gov/docs/widgets-2.html"
      ],
      "rights": "This dataset has been given an international public domain dedication for worldwide reuse",
      "spatial": "United States",
      "systemOfRecords": "http://www.agency.gov/widgets/sorn/",
      "temporal": "2009-09-01T12:00:00Z/2010-05-31T12:00:00Z",
      "theme": [
        "manufacturing"
      ],
      "title": "U.S. Widget Manufacturing Statistics"
    }
  ]
}
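
To make the role of @context a bit more concrete, here is a minimal Python sketch of how a key in the example above resolves to a full IRI. It is only an illustration under stated assumptions: `CONTEXT` is a small subset of the draft context, `expand_term` is a hypothetical helper, and the logic is a simplification of term resolution, not the full JSON-LD expansion algorithm. A real deployment would rely on a conforming JSON-LD processor instead.

```python
# Subset of the draft @context shown above (assumption: trimmed for brevity).
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dc": "http://purl.org/dc/terms/",
    "pod": "https://project-open-data.cio.gov/v1.1/schema#",
    "@vocab": "http://www.w3.org/ns/dcat#",
    "title": "dc:title",
    "accessLevel": "pod:accessLevel",
    "downloadURL": {"@id": "dcat:downloadURL", "@type": "@id"},
}


def expand_term(term, context):
    """Return the IRI a key resolves to (hypothetical, simplified helper)."""
    definition = context.get(term)
    if isinstance(definition, dict):
        # Expanded term definitions carry the mapping in "@id".
        definition = definition["@id"]
    if definition is None:
        if ":" not in term:
            # Keys without a definition fall back to the @vocab namespace.
            return context["@vocab"] + term
        # Otherwise treat the key itself as a compact IRI, e.g. "dcat:Catalog".
        definition = term
    prefix, sep, suffix = definition.partition(":")
    if sep and not suffix.startswith("//") and isinstance(context.get(prefix), str):
        # Compact IRI: replace the prefix with its namespace.
        return context[prefix] + suffix
    # Already an absolute IRI.
    return definition


for key in ("title", "accessLevel", "downloadURL", "dataset", "dcat:Catalog"):
    print(key, "->", expand_term(key, CONTEXT))
```

One thing this makes visible: any key without an explicit mapping in the draft context (for example `references` or `systemOfRecords`) would resolve into the DCAT namespace through `@vocab`, which may be worth reviewing as the context is refined.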

And then the more compact version, where we put the body of @context in a .jsonld file hosted on the Project Open Data site. This is to demonstrate that all of @context can be condensed into a URL, but the use of the @type keywords would still be the same throughout each dataset.

{
  "@context": "https://project-open-data.cio.gov/v1.1/schema/data.jsonld",
  "@id": "http://www.agency.gov/data.json",
  "@type": "dcat:Catalog",
  "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
  "describedBy": "https://project-open-data.cio.gov/v1.1/schema/catalog.json",
  "dataset": [
    {...}
  ]
}
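
Since the JSON-LD keywords are simply appended to an existing data.json, a tool could add them mechanically. The following is a rough Python sketch of that idea, not a description of how any Data.gov tool actually works: `add_jsonld_keywords` is a hypothetical function, and the @type values and the context URL are taken from the draft examples above, so they may change as the context is refined.

```python
import copy
import json

# Draft context location from the compact example above (still a draft).
CONTEXT_URL = "https://project-open-data.cio.gov/v1.1/schema/data.jsonld"


def add_jsonld_keywords(catalog, catalog_url):
    """Return a copy of a data.json catalog with JSON-LD keywords appended."""
    result = copy.deepcopy(catalog)
    result["@context"] = CONTEXT_URL
    result["@id"] = catalog_url
    result["@type"] = "dcat:Catalog"
    for dataset in result.get("dataset", []):
        dataset.setdefault("@type", "dcat:Dataset")
        for distribution in dataset.get("distribution", []):
            distribution.setdefault("@type", "dcat:Distribution")
        if isinstance(dataset.get("contactPoint"), dict):
            dataset["contactPoint"].setdefault("@type", "vcard:Contact")
        # Walk up the publisher hierarchy, typing each organization.
        publisher = dataset.get("publisher")
        while isinstance(publisher, dict):
            publisher.setdefault("@type", "org:Organization")
            publisher = publisher.get("subOrganizationOf")
    return result


plain = {
    "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
    "dataset": [
        {
            "title": "U.S. Widget Manufacturing Statistics",
            "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane.doe@agency.gov"},
            "publisher": {
                "name": "Widget Services",
                "subOrganizationOf": {"name": "General Services Administration"},
            },
            "distribution": [{"format": "CSV", "title": "widgets.csv"}],
        }
    ],
}

linked = add_jsonld_keywords(plain, "http://www.agency.gov/data.json")
print(json.dumps(linked, indent=2))
```

The input is left untouched and existing @type values are preserved, so the same data.json could keep being served as plain JSON alongside the JSON-LD version.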

azaroth42 commented 9 years ago

+1 to making progress in this area

philipashlock commented 9 years ago

Thanks! All the JSON-LD keywords have been incorporated into the v1.1 branch and draft documentation, but I've indicated that the contents of @context are still a draft. We can continue to work on refining it here.

gbinal commented 9 years ago

:+1:

Having the option for agencies to also offer a json-ld version is great.

Thanks for getting this up, Phil.

akuckartz commented 9 years ago

Although this is not (and should not be) a popularity contest: :+1: for JSON-LD.