openownership / register

A demonstration transnational register of beneficial ownership data from the UK, Denmark, Slovakia and Armenia
https://register.openownership.org
GNU Affero General Public License v3.0

BODS identifiers scheme vs schemeName #224

Open tiredpixel opened 1 year ago

tiredpixel commented 1 year ago

BODS 0.2 and BODS 0.3 specify that any one of id, scheme, schemeName, uri is required. This is unfortunate, since an identifier can have a schemeName but no scheme, which forces matching on free-text strings rather than on a persistent identifier.

e.g. LEIs have scheme XI-LEI and schemeName Global Legal Entity Identifier Index. This makes it possible to match on scheme = 'XI-LEI'.

e.g. OpenCorporates identifiers have schemeName OpenCorporates, but no scheme. This means it's necessary to always match on the name itself.

e.g. OpenOwnership Register identifiers have schemeName OpenOwnership Register, but no scheme. This means it's necessary to always match on the name itself.
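To make the difference between the three cases concrete, here is a minimal sketch. The identifier values and the helper names `find_by_scheme` and `find_by_scheme_name` are made up for illustration and are not taken from Register; only the scheme and schemeName strings come from the examples above.

```python
# Hypothetical identifiers shaped like the three examples above.
# The "id" values are placeholders, not real identifiers.
identifiers = [
    {
        "scheme": "XI-LEI",
        "schemeName": "Global Legal Entity Identifier Index",
        "id": "EXAMPLE-LEI-1",
    },
    {"schemeName": "OpenCorporates", "id": "example-oc-1"},
    {"schemeName": "OpenOwnership Register", "id": "example-oo-1"},
]


def find_by_scheme(items, scheme):
    """Match on the persistent scheme code."""
    return [i for i in items if i.get("scheme") == scheme]


def find_by_scheme_name(items, scheme_name):
    """Match on the human-readable name; this is the only option when scheme is absent."""
    return [i for i in items if i.get("schemeName") == scheme_name]


lei = find_by_scheme(identifiers, "XI-LEI")
oc = find_by_scheme_name(identifiers, "OpenCorporates")
# Any variation in the name (here, an added space) silently matches nothing.
oc_reworded = find_by_scheme_name(identifiers, "Open Corporates")
```

The last lookup returns an empty list, which is the fragility in question: a name is display text, so any rewording breaks the match.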

However, even for LEIs, schemeName is still stored within the statement itself. Simply rewording the name Global Legal Entity Identifier Index (e.g. to LEI or GLEIF Identifier) would therefore force every affected statement to be republished, since the hash of the statement, and therefore the statementID, would change. That is correct behaviour, given that schemeName is considered part of the statement.
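A rough sketch of why rewording cascades, assuming the statementID is derived from a hash of the serialised statement content. The `statement_id` function, the choice of SHA-256 over canonical JSON, and the statement fields are all assumptions for illustration; the actual derivation used by Register may differ.

```python
import hashlib
import json


def statement_id(statement):
    """Hypothetical ID: SHA-256 over a canonical JSON serialisation of the statement."""
    canonical = json.dumps(statement, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_statement(scheme_name):
    """Build a placeholder entity statement that differs only in schemeName."""
    return {
        "statementType": "entityStatement",
        "name": "Example Ltd",
        "identifiers": [
            {"scheme": "XI-LEI", "schemeName": scheme_name, "id": "EXAMPLE-LEI-1"}
        ],
    }


before = statement_id(make_statement("Global Legal Entity Identifier Index"))
after = statement_id(make_statement("LEI"))
ids_differ = before != after  # the scheme code and id are unchanged, yet the ID moves
```

Under these assumptions, the two IDs differ even though scheme and id are identical, so every statement carrying the old name would need republishing.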

This commit shows some of the text matching going on in Register, made necessary by this specification: https://github.com/openownership/register-sources-bods/commit/9df343855d103ac472e9e9c59642580d5ef489fc

I propose one of two changes to BODS:

  1. Make scheme required, leaving schemeName optional.
  2. Remove schemeName entirely, making scheme required.

From my current understanding, I would prefer 2, since it would avoid a statement republishing storm.

In addition, I propose one further change:

  1. Make either id or uri required (as an explicit choice in the specification, rather than the 'any of' language that currently exists).

I don't currently have an opinion on which of id or uri would be better.
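As a sketch of how the rules compare, the two checks below contrast the current requirement as described in this ticket with one reading of the proposal (scheme required, plus at least one of id or uri). The function names are hypothetical, and whether id and uri should be mutually exclusive is left open here.

```python
def valid_current(identifier):
    """Current rule as described above: any one of the four fields is enough."""
    return any(identifier.get(k) for k in ("id", "scheme", "schemeName", "uri"))


def valid_proposed(identifier):
    """One reading of the proposal: scheme is required, plus id or uri."""
    has_scheme = bool(identifier.get("scheme"))
    has_reference = bool(identifier.get("id") or identifier.get("uri"))
    return has_scheme and has_reference


# Placeholder identifiers; the "id" value is not a real LEI.
name_only = {"schemeName": "OpenCorporates"}
lei = {"scheme": "XI-LEI", "id": "EXAMPLE-LEI-1"}

results = {
    "name_only": (valid_current(name_only), valid_proposed(name_only)),
    "lei": (valid_current(lei), valid_proposed(lei)),
}
```

An identifier with only a schemeName passes the current rule but fails the proposed one, while the LEI-style identifier passes both.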

NOTE: This ticket is almost certainly filed in the wrong place—but I'm not sure where the right place is, given multiple repositories and different ticket groupings splitting BODS specification, proposals, implementations, and such. Please feel free to move it to the correct place. Thank you!

StephenAbbott commented 1 year ago

Thanks @tiredpixel. I've noted these suggestions and stored them alongside others - see discussion here for example - relating to required fields in BODS which we'll revisit at a later stage (not in version 0.4).