openownership / data-standard

The Beneficial Ownership Data Standard (BODS) is an open standard providing a specification for modelling and publishing information on the beneficial ownership and control of corporate vehicles
http://standard.openownership.org

Feature: Republishing, provenance and transformation #464

Open lgs85 opened 2 years ago

lgs85 commented 2 years ago

[This ticket helps track progress towards developing a particular feature in BODS where changes or revisions to the standard may be required. It should be placed on the BODS Feature Tracker, under the relevant status column.

_See Feature development in BODS in the Handbook._

The title of this GitHub ticket should be 'Feature: XXXXX' where XXXXX is the feature name below. The information in this first post on the thread should be updated as necessary so that it holds up-to-date information. Comments on this ticket can be used to help track high-level work towards this feature or to refine this set of information.]

Feature name: Republishing, provenance and transformation

Feature background

Briefly describe the purpose of this feature

This feature ticket proposes that BODS should provide scope for representing republished data derived from multiple sources, and should enable transparency about the provenance of republished or mapped BODS data, alongside any transformation steps taken.

BODS is tailored almost exclusively towards primary beneficial ownership data, consisting of newly published or updated statements by declaring entities to state bodies. This state-level beneficial ownership data is stored and sometimes published in a register, and usually consists solely of declarations by entities registered in the state which maintains the register.

A number of organizations are already republishing beneficial ownership data from multiple sources. Most notably, the Open Ownership Register combines beneficial ownership data from the United Kingdom, Denmark, Slovakia and Ukraine, reconciles and deduplicates this with data from OpenCorporates, and republishes the reconciled data in BODS format.

The Open Ownership Register is currently being upgraded to publish in BODS v0.2, but neither v0.2 nor v0.3 enables information from multiple sources to be represented, because the section of BODS that deals with source information is designed around the idea of a single source.

What user needs are met by introducing or developing this feature in BODS?

A key reason for developing any data standard is to enable interoperability. With beneficial ownership data, being able to easily represent reconciled and republished statements from multiple sources meets a large and distinct set of user needs. User stories include:

What impact would not meeting these needs have?

How urgent is it to meet the above needs?

At present there are relatively few public registers of beneficial ownership data. As a result, enhancing and republishing beneficial ownership data in BODS format can be done using slight adaptations to the standard (e.g. using the source.description field). This slightly reduces the urgency of modifying BODS to explicitly accommodate republished data, but the need will become more urgent as more countries start publishing public beneficial ownership registers.
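As a rough illustration of the source.description workaround mentioned above, the sketch below folds several upstream sources into one BODS-style source object. It is not a proposal for the standard: the helper name `build_republished_source`, the example register names and URLs, and the choice of the `thirdParty` source type are all assumptions made for illustration.

```python
import json


def build_republished_source(upstream_sources):
    """Fold several upstream sources into a single BODS-style source object.

    Assumption: the only place to list the upstream sources is the
    free-text source.description field, as in the workaround described above.
    """
    description = "Republished and reconciled from: " + "; ".join(
        f"{s['name']} ({s['url']})" for s in upstream_sources
    )
    return {
        # Assumed source type for a republisher; pick whatever fits your data.
        "type": ["thirdParty"],
        "description": description,
    }


# Hypothetical upstream registers, for illustration only.
upstream = [
    {"name": "Example Register A", "url": "https://example.org/a"},
    {"name": "Example Register B", "url": "https://example.org/b"},
]

source = build_republished_source(upstream)
print(json.dumps(source, indent=2))
```

Because the upstream sources end up in a free-text string, a consumer cannot reliably parse them back out, which is the limitation this feature ticket is concerned with.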

Are there any obvious problems, dependencies or challenges that any proposal to develop this feature would need to address?

Feature work tracking

StephenAbbott commented 1 year ago

See https://github.com/openownership/register/issues/223 for republishing issue raised by work on Open Ownership Register

StephenAbbott commented 7 months ago

Check ISO 8000-120:2016 which "specifies requirements for the representation and exchange of information about the provenance of master data that consists of characteristic data, and supplements the requirements of ISO 8000‑110"

StephenAbbott commented 6 months ago

Noting issue https://github.com/openownership/data-standard/issues/638, raised by @kathryn-ods, to consider adding 'cleaning' and 'enrichment' to the motivations codelist. This may be relevant to the development of this feature in future.

kathryn-ods commented 3 weeks ago

Have been thinking about this in the context of https://standard.openownership.org/en/latest/standard/modelling/dates-guidance.html and making decisions about dates when republishing.

E.g. statementDate could be the date that the data was originally filed by the subject or it could be the date that the publisher first published it.

publicationDate could be the original publicationDate or the date of republication.

It depends on who is classed as the publisher. For the GLEIF mapping I am currently working on, we are treating the dates as though GLEIF are the ones making the claim and we are the ultimate publisher. So statementDate is the date of the GLEIF delta file and publicationDate is our date of publication.
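One way to picture the date choice described above is the minimal sketch below. It assumes the republisher is treated as the publisher, that publicationDate sits inside a publicationDetails object, and that dates are written as ISO 8601 strings. The helper name `apply_republisher_dates`, the statement ID and the dates are hypothetical, and this is not the actual GLEIF mapping code.

```python
from datetime import date


def apply_republisher_dates(statement, upstream_file_date, republication_date):
    """Return a copy of the statement dated as if the republisher is the publisher.

    statementDate      <- date of the upstream file the claim was taken from
    publicationDate    <- date the republisher publishes the statement
    """
    updated = dict(statement)  # shallow copy, the input is left untouched
    updated["statementDate"] = upstream_file_date.isoformat()

    publication = dict(statement.get("publicationDetails", {}))
    publication["publicationDate"] = republication_date.isoformat()
    updated["publicationDetails"] = publication
    return updated


# Hypothetical statement and dates, for illustration only.
statement = {
    "statementID": "example-statement-1",
    "statementType": "entityStatement",
}

result = apply_republisher_dates(
    statement,
    upstream_file_date=date(2023, 1, 15),
    republication_date=date(2023, 2, 1),
)
print(result)
```

The alternative reading, where the original filing and publication dates are carried through unchanged, would simply skip this step. Which one is right depends on who is classed as the publisher.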