openownership / data-standard

The Beneficial Ownership Data Standard (BODS) is an open standard providing a specification for modelling and publishing information on the beneficial ownership and control of corporate vehicles
http://standard.openownership.org

Consider whether to address JSON-LD Support, RDF Vocabulary and SPARQL #121

Open timgdavies opened 5 years ago

timgdavies commented 5 years ago

I've been exploring how we can work with BODS data as a Linked Data graph.

At least at a basic level it is possible to convert BODS JSON -> JSON-LD -> RDF with the addition of a short '@context' element.

A worked example of that is available here using the @context:

{
  "@context": {
    "@vocab": "http://bods.openownership.org/ns/",
    "@base": "http://example.bods.openownership.org/statements/",
    "statementID": "@id",
    "statementType": "@type",
    "describedByEntityStatement": {
      "@type": "@id"
    },
    "describedByPersonStatement": {
      "@type": "@id"
    }
  }
}

Simply adding this to the top of a BODS file and running it through a JSON-LD parser returns a graph that is pretty easy to work with and can then be explored with SPARQL. Using SPARQL 1.1 property paths, the data is navigable.
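As a rough illustration of the "add the @context" step, here is a minimal Python sketch using only the standard library. It assumes the BODS file is a top-level JSON array of statements, so the statements are placed under "@graph" to give the context an enclosing object. The `to_json_ld` helper and the sample statements are hypothetical and heavily trimmed; they are not taken from the worked example or the notebook.

```python
import json

# The @context from the post above, reproduced as a Python dict.
BODS_CONTEXT = {
    "@vocab": "http://bods.openownership.org/ns/",
    "@base": "http://example.bods.openownership.org/statements/",
    "statementID": "@id",
    "statementType": "@type",
    "describedByEntityStatement": {"@type": "@id"},
    "describedByPersonStatement": {"@type": "@id"},
}


def to_json_ld(statements):
    """Wrap a list of BODS statements in a JSON-LD document.

    Assumption: a BODS file is a top-level JSON array, so the statements
    are placed under "@graph" and the context sits beside them.
    """
    return {"@context": BODS_CONTEXT, "@graph": list(statements)}


# Hypothetical, heavily trimmed statements for illustration only.
sample_statements = [
    {"statementID": "entity-1", "statementType": "entityStatement", "name": "Example Ltd"},
    {"statementID": "person-1", "statementType": "personStatement"},
    {
        "statementID": "ooc-1",
        "statementType": "ownershipOrControlStatement",
        "subject": {"describedByEntityStatement": "entity-1"},
        "interestedParty": {"describedByPersonStatement": "person-1"},
    },
]

json_ld_doc = to_json_ld(sample_statements)
json_ld_text = json.dumps(json_ld_doc, indent=2)
```

The resulting `json_ld_text` can then be handed to whichever JSON-LD parser you prefer (rdflib is one option) to obtain the RDF graph; the post does not say which parser was used.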

I've got a Jupyter notebook exploring converting and querying the data here.

Implications

Limitations

If we want to provide an actual ontology, with properties living at the locations implied by the @context, then we would hit a problem wherever the current schema re-uses a term with different descriptions and semantics depending on where it appears in the JSON tree.

For example, at present the schema has:

and

The @context converts both of these instances to the graph property <http://bods.openownership.org/ns/identifiers>.
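To make the collision concrete, here is a small standard-library Python sketch that approximates how "@vocab" expands plain terms and records where each resulting IRI came from. It is a simplification, not full JSON-LD expansion. The `property_iris` helper, the two sample statements, and their identifier values are hypothetical, and they are not necessarily the pair of schema locations the post refers to.

```python
VOCAB = "http://bods.openownership.org/ns/"
# Terms the @context aliases to JSON-LD keywords, so they get no vocabulary IRI.
KEYWORD_ALIASES = {"statementID", "statementType"}


def property_iris(node, path="", found=None):
    """Map each vocabulary IRI to the JSON paths whose key expands to it.

    Simplification: every non-aliased key is expanded as VOCAB + key, which
    mirrors how @vocab treats plain terms but is not full JSON-LD expansion.
    """
    if found is None:
        found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}/{key}"
            if key not in KEYWORD_ALIASES and not key.startswith("@"):
                found.setdefault(VOCAB + key, []).append(child_path)
            property_iris(value, child_path, found)
    elif isinstance(node, list):
        for item in node:
            property_iris(item, path, found)
    return found


# Hypothetical statements; the scheme and id values are made up.
statements = [
    {"statementID": "entity-1", "statementType": "entityStatement",
     "identifiers": [{"scheme": "EXAMPLE-REGISTER", "id": "0001"}]},
    {"statementID": "person-1", "statementType": "personStatement",
     "identifiers": [{"scheme": "EXAMPLE-PASSPORT", "id": "0002"}]},
]

found = {}
for stmt in statements:
    # Use the statement type as the path root so the two locations stay distinct.
    property_iris(stmt, path=stmt["statementType"], found=found)

identifier_paths = found[VOCAB + "identifiers"]
```

Here `identifier_paths` lists two different positions in the JSON tree that both end up under the single IRI `http://bods.openownership.org/ns/identifiers`, which is why one definition at that address cannot describe both.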

If we were to provide schema information returned at http://bods.openownership.org/ns/identifiers, then there would be ambiguity over the definition of this property. The options here I think would be to:

I don't think any of this is a major issue right now, but I'm documenting it for consideration.

Jeffrey04 commented 3 years ago

So is publishing the standard as RDF being considered? Would you also consider compatibility with the Popolo schema?

https://www.popoloproject.com/

stevenday commented 3 years ago

@Jeffrey04 - this is not something we currently have in scope for version 1.0. Do you have a use case for it that you could elaborate on?

On Popolo, our initial data modelling looked at a wide range of data standards, including Popolo, and I think our personStatement and entityStatement map across quite well to the basic fields of Popolo Persons and Organizations. The naming conventions might be slightly different, but I understand (for example) that Sinar Project's Politikus (https://sinarproject.org/transparency) can produce Popolo and BODS representations of the same data quite easily.

Is there something specific that you'd like us to do more of in this regard, or should we just flag up the existing similarities better? Perhaps drop us an email at support@openownership.org if you'd like to discuss a project in more detail; this issue tracker is really just used for proposals to the standard.

Jeffrey04 commented 3 years ago

The import script that I built for sinarproject, https://github.com/Jeffrey04/popit_relationship, is very loosely designed to be somewhat compatible with RDF (I include enough data in the imported cache that a proper RDF graph can be generated if needed), so I thought it would be nice if this were published as RDF, as I see some overlap between this and Popolo.

Jeffrey04 commented 3 years ago

[Image: example graph]

This is the example graph we generated. Besides the Popolo spec, we also used https://vocab.org/relationship/ for relationships between people.

StephenAbbott commented 1 year ago

Updating this issue with work that Open Ownership did with Blue Anvil in 2020 and 2021 to model a Resource Description Framework (RDF) vocabulary for BODS (h/t @cosmin-marginean).

Here's a document discussing the principles and technical details of this proposal: https://docs.google.com/document/d/1vej-UkK7QtmfKrmU6aD15vceIzJDsCv1jbHCJWgn9hs

There is also a GitHub repository containing related tooling and code samples, as well as some SPARQL queries that show the vocabulary working in practice.

This work was written up on the Open Ownership website: https://www.openownership.org/en/blog/an-rdf-vocabulary-for-beneficial-ownership-data-created-with-blue-anvil/

StephenAbbott commented 2 months ago

On 21 September 2023, Open Ownership announced a proof-of-concept project called BODS risk detection to demonstrate the use of BODS data in RDF format: https://www.openownership.org/en/blog/spotting-risks-by-combining-beneficial-ownership-public-procurement-and-sanctions-data/

See https://github.com/openownership/bodsriskdetection