openownership / data-standard

The Beneficial Ownership Data Standard (BODS) is an open standard providing a specification for modelling and publishing information on the beneficial ownership and control of corporate vehicles
http://standard.openownership.org

Handling personal identification numbers, passport IDs #22

Closed mpostelnicu closed 4 years ago

mpostelnicu commented 7 years ago

Looking over the Identifier section, especially as it applies to person statements, I wonder whether the issuing institution could be optionally included (when available) along with the identifier. The ID itself is of little use without this information. For example, even passport IDs are not unique across the world, and without information about the issuer one cannot even assume the country of residence or nationality (a person can have dual citizenship)...

On the broader question of the format of these passport IDs, maybe we can draw some inspiration from the way ICAO handles the machine-readable passport ID standard:

http://www.icao.int/publications/pages/publication.aspx?docnum=9303 (more specifically, this document in that list: http://www.icao.int/publications/Documents/9303_p4_cons_en.pdf)

I realize this may be a bit too much for the purpose of this standard... I'm just sharing what I have found about this topic. Maybe it will be useful at some point later on; it's interesting to see how these unique numbers are constructed (page 35 in the above-mentioned PDF).
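One piece of ICAO 9303 that is easy to illustrate is the check digit that accompanies a document number in the machine readable zone. The sketch below assumes the commonly documented rule: digits keep their value, letters A-Z map to 10-35, the filler character `<` counts as 0, the weights 7, 3, 1 repeat across the field, and the check digit is the weighted sum modulo 10. The function names are hypothetical and not part of any standard.

```python
def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value for check-digit purposes."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":  # filler character
        return 0
    raise ValueError(f"character not allowed in an MRZ field: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, reduced modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


# "L898902C3" is the specimen document number often used in ICAO 9303 examples.
print(mrz_check_digit("L898902C3"))  # 6
```

A check digit only catches transcription errors; it says nothing about who issued the document, which is why the issuer still needs to be recorded separately.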

Thanks

timgdavies commented 7 years ago

Thanks @mpostelnicu - this is really useful.

For easy reference I've copied in a clipping from page 35 of the doc mentioned above:

[Image: clipping from page 35 of the ICAO 9303 document linked above]

I take from this the point that the current identifier block would either need:

(a) Creation of a codelist of schemes for each passport issuing country (e.g. the scheme for a GB passport might be 'P-GBR')

or

(b) Recommendation of using a version of the above standard to provide the passport number / details

or some mix.
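To make option (a) concrete, here is a minimal sketch of how a publisher might build such an identifier. It assumes an identifier object with `scheme` and `id` fields and the 'P-' plus three-letter issuer code pattern suggested above; the helper names and the passport number are made up for illustration.

```python
import re

# Assumption: issuer codes are exactly three upper-case letters. ICAO 9303
# has a few exceptions (for example Germany's one-letter code "D") that
# this sketch does not handle.
ALPHA3 = re.compile(r"[A-Z]{3}")


def passport_scheme(issuer_alpha3: str) -> str:
    """Build a scheme code such as 'P-GBR' from a three-letter issuer code."""
    code = issuer_alpha3.strip().upper()
    if not ALPHA3.fullmatch(code):
        raise ValueError(f"expected a three-letter issuer code, got {issuer_alpha3!r}")
    return f"P-{code}"


def passport_identifier(issuer_alpha3: str, number: str) -> dict:
    """Return an identifier object pairing the scheme with the passport number."""
    return {"scheme": passport_scheme(issuer_alpha3), "id": number.strip()}


print(passport_identifier("gbr", "123456789"))
# {'scheme': 'P-GBR', 'id': '123456789'}
```

Keeping the issuer inside the scheme means the number is never published without the context needed to interpret it.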

timgdavies commented 7 years ago

Based on this thread, I've written up a page on identifiers with some suggested approaches here: http://beneficial-ownership-data-standard.readthedocs.io/en/latest/identifiers.html#person-identifiers

ScatteredInk commented 7 years ago

This is very helpful, thanks.

The machine-readable passport format uses the alpha-3 version of country codes. This might be useful for us in covering the edge cases of identity because it has codes for non-national and supra-national issuing authorities and for natural persons without a defined nationality. See pages 36-37 of ICAO 9303 - 3.
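As a rough illustration of those edge cases, the sketch below looks up a handful of the special three-letter codes. The table is an illustrative subset only and should be checked against the current edition of ICAO 9303 Part 3 before use; the function name is hypothetical.

```python
from typing import Optional

# Illustrative subset of special ICAO 9303 codes; not a complete list.
SPECIAL_CODES = {
    "UNO": "United Nations Organization",
    "UNA": "United Nations specialized agency",
    "EUE": "European Union",
    "XXA": "Stateless person",
    "XXB": "Refugee (1951 Convention)",
    "XXC": "Refugee (other)",
    "XXX": "Unspecified nationality",
}


def special_code_meaning(code: str) -> Optional[str]:
    """Return the meaning of a special code, or None for an ordinary code."""
    return SPECIAL_CODES.get(code.strip().upper())


print(special_code_meaning("xxa"))  # Stateless person
print(special_code_meaning("GBR"))  # None
```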

And a couple of minor points:

timgdavies commented 7 years ago

Good point re: 3-letter codes, and the ordering of scheme and jurisdiction.

I've drafted an updated version of the guidance to reflect that.

On the attributedTo.identifier - good question. I had assumed that a single provenance step should be attributed to a single agent, and other agents should be introduced by chaining together provenance statements.

ScatteredInk commented 7 years ago

Yes, that's also my understanding of attributing provenance to agents. But at the moment we are allowing entities/persons to have multiple identifiers, using arrays, in all(?) contexts except attributedTo. I'm not sure which is preferable: keeping the provenance model manageable or maintaining consistency in associating identifiers with entities and persons.

timgdavies commented 7 years ago

I think identifier is functioning in a slightly different role in the two uses (entities/persons vs. attributedTo).

In the former, it is about giving you the best chance to match a statement to an entity/person. In the latter, it is about clarifying/disambiguating 'name', and linking the name of the party with some other identifier where such an identifier is available. Generally, I think systems would have a single ID attached to the agent to use here.
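The two roles can be sketched side by side. The example assumes person/entity records carry an `identifiers` array of `scheme`/`id` objects while `attributedTo` carries a single `identifier`; the records, scheme codes other than 'P-GBR', and the matching helper are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def identifier_keys(identifiers: List[Dict[str, str]]) -> Set[Tuple[str, str]]:
    """Reduce identifier objects to comparable (scheme, id) pairs."""
    return {
        (item["scheme"], item["id"])
        for item in identifiers
        if item.get("scheme") and item.get("id")
    }


def share_an_identifier(a: dict, b: dict) -> bool:
    """True if two records have at least one (scheme, id) pair in common."""
    keys_a = identifier_keys(a.get("identifiers", []))
    keys_b = identifier_keys(b.get("identifiers", []))
    return bool(keys_a & keys_b)


# Matching role: several identifiers improve the chance of a match.
person_a = {
    "name": "Jane Example",
    "identifiers": [
        {"scheme": "P-GBR", "id": "123456789"},
        {"scheme": "EXAMPLE-TAXID", "id": "AB123"},
    ],
}
person_b = {
    "name": "J. Example",
    "identifiers": [{"scheme": "EXAMPLE-TAXID", "id": "AB123"}],
}

# Disambiguation role: one identifier pins down the named agent.
provenance = {
    "attributedTo": {
        "name": "Example Registry",
        "identifier": {"scheme": "EXAMPLE-ORG", "id": "0001"},
    }
}

print(share_an_identifier(person_a, person_b))  # True
```

A shared identifier is only evidence of a match, not proof, since schemes differ in how reliably they are issued and recorded.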

timgdavies commented 7 years ago

Comment from Gert on e-mail:

What if only paper documentation is the source? You need to be able to enter that from a (manual) source system as well as from a collection of others that are fragmented in nature. The first preference is automated source systems, but there needs to be flexibility for other avenues of entry and validation.

I think this also relates to #16 on provenance

timgdavies commented 7 years ago

Comment from Gert on e-mail:

As for privacy, I have found the need to capture a range of personally identifiable information to uniquely identify a person, all of which needs to be protected and cannot be exposed (unless appropriately authorized). There are a lot of differing laws and penalties associated with this, as well as with periodicity of storage. I would suggest that the input requirements for sensitive and PII data call for highly secured “storage”, and that there are a range of rules and controls related to exposing the data. These statements are likely obvious, but they are driven by use cases, not necessarily technical/theoretical requirements.

mpostelnicu commented 7 years ago

I think there could be other pieces of personal information that may not be easily shared/open to the public.

However, my understanding was that capturing them inside a standard does not necessarily mean publishing all of the material. The standard could still be beneficial as an exchange mechanism between law enforcement agency databases, so it is useful even if not 100% open to the public.

I think there should be a ticket about permissions for publishing subsets of the data. Do we have this already? @timgdavies

timgdavies commented 6 years ago

This site provides a useful reference resource on personal identification numbers.

timgdavies commented 4 years ago

This should be picked up in the work on #131