Secondary entity classifications and entity type details block

ScatteredInk commented 3 years ago

In addition to the work on entity types in #336, I suggest a secondary classification mechanism that allows any entity to be described with one or more relevant characteristics from an entity type specific codelist, contained within a type details block.

Justification

Entity characteristics are useful for various kinds of analysis, verification processes and data quality metrics. To date, BODS has implicitly taken two approaches to dealing with entity characteristics:

Where a relevant intrinsic characteristic of an entity is not published in BODS, users rely on external references to a canonical dataset via identifiers, e.g. using a company number to look up the legal form on the relevant company register. This approach fails where: (i) no identifiers are available; or (ii) no such external dataset is available, accessible or granular enough.
Inferring relational characteristics about entities from their ownership disclosures, e.g. when a declaring entity states that its interested party is exempt from disclosure and includes a description relating this to ownership by a listed company then we can infer that it is a subsidiary of a listed company, as defined by local disclosure rules. A lack of structure within entities, and difficulties linking entities to canonical datasets, makes this approach difficult, especially at scale.

I suggest that we:

(i) retain the general approach of inferring relational characteristics of entities from their ownership structures; and, (ii) allow for more explicit declarations of intrinsic characteristics using secondary classifications that will, in turn, make the inference of relational characteristics easier and allow for easier analysis of data in aggregate.

Implementation

An outline of this approach is sketched below for feedback. More detailed work would need to be broken out into separate issues.

If we adopt this approach then I think we may want to restructure our entityStatement somewhat, so that we have the relevant details about an entity separated from data about the disclosure. This would move us towards a model where we have different data structures depending on the type of data that is being modelled, e.g.,

{
  "entityType": "arrangement",
  "arrangementDetails": {...}
}

Within a details block we could then have an arrangementType field (and so on for all entity types) and additional fields that are specific to each type of entity. I would also add a new common field that I think is necessary for this kind of classification:

localEntityType - this would be the local name (preferably drawn from a codelist) that maps to the choice of entity type. This would allow analysis of the use of particular legal formations for particular purposes.

A very draft outline of how this might look for each entity type is set out below:

State entities

No secondary classification (although this may depend on the scope of a 'state entity').

Additional fields: jurisdiction

State bodies

stateBodyType: state entity with legal personality, state entity without legal personality

Additional fields: formedByStatute - the legislation used to create a state body

Usage notes: registered entities controlled by state bodies should be recorded as legal entities with type registered entity.

Legal entities

Legal entity type: registered legal entity, unregistered legal entity

Arrangements

Arrangement type: trust, general partnership, unincorporated organisation

Additional fields: arrangementDescription - a description of the arrangement if available.

Agreements

Agreements: joint shareholding, nominee shareholding, bearer share, contractual joint venture, contractual agreement, informal agreement, unknown agreement

Additional fields: agreementDescription - a description of the agreement if available.

Unknown entity type: unknown legal entity, unknown arrangement, unknown agreement, unknown state, unknown state entity, unknown type, unknown structure

Usage notes: This is connected to the ongoing work on missing information. The addition of unknown structure would allow us to distinguish between a single unknown entity (whether of a known or unknown type) and cases where an unknown node represents an ownership structure of unknown size.

Classifications that imply structure

Some potential classifications (like 'state owned enterprise' or 'subsidiary-of-listed-company') imply a certain structure, as below (and other examples on the same board):

[Image: BODS whiteboard (1)]

Where that structure is known (including where an unknown entity can be used to represent an uncertainty in the structure) then I think we want to publish a full BODS statement rather than use a secondary classification. Inferences along the lines of "this company is a state-owned enterprise" can then be the responsibility of data consumers, and we can develop guidance on queries to find entities of interest if this proves difficult.
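A consumer-side query of the kind mentioned above might look roughly like the sketch below. It is only an illustration under stated assumptions: statements are reduced to plain dictionaries, relationships to (subject, interested party) pairs, the entity type codes follow the list proposed later in this thread, and the IDs and names are made up.

```python
def find_state_owned(entities, relationships):
    """Return IDs of legal entities with a direct state or state-body owner.

    `entities` maps a hypothetical statement ID to an entity dict;
    `relationships` is a list of (subject_id, interested_party_id) pairs.
    Both shapes are illustrative assumptions, not the BODS schema.
    """
    state_types = {"state", "stateBody"}
    owned = set()
    for subject_id, party_id in relationships:
        subject = entities.get(subject_id, {})
        party = entities.get(party_id, {})
        if (subject.get("entityType") == "legalEntity"
                and party.get("entityType") in state_types):
            owned.add(subject_id)
    return sorted(owned)


# Made-up example data: e1 is owned by a state body, e3 is owned by e1.
entities = {
    "e1": {"entityType": "legalEntity", "name": "Example Rail Ltd"},
    "e2": {"entityType": "stateBody", "name": "Example Ministry"},
    "e3": {"entityType": "legalEntity", "name": "Example Private Co"},
}
relationships = [("e1", "e2"), ("e3", "e1")]
print(find_state_owned(entities, relationships))  # prints ['e1']
```

This only catches direct ownership. Finding e3 (indirectly state-owned through e1) would need a traversal of the ownership graph, which is where guidance on queries could help.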

kd-ods commented 3 years ago

This looks viable, @ScatteredInk... a few thoughts/questions:

1) On state entity and state body types:

state entity will need to cover (I think) states in federal systems, the federal authority itself, regions (possibly), nation-states, countries etc... I don't think we necessarily need a sub-classification here, we just need to phrase the description carefully so that it's clear what the scope is and how this field differs from 'state body'. (For example, in a USA context, we wouldn't want people to think that this field could only name a U.S. state.) I'm at risk of ruminating on sovereignty and jurisprudence here. We definitely want to avoid philosophical quagmires ;-)
I think state, rather than state entity, might better distinguish this type from state body

2) On localEntityType

Will this be scoped by Incorporated In Jurisdiction (or the equivalent for arrangements and agreements)? I would imagine so. It will matter when it comes to publishing details of non-domestic entities.
On how to structure this data along with the secondary classification (and how to name the fields), would something like this work better than having two separate fields at the top level:

{
  "entityType": "arrangement",
  "arrangementDetails": {
      "form*": {
          "general": "trust",
          "local": "fideicomiso"
      }
  }
}

* I wonder whether 'form' is a better term here since - in English at least - it maps to 'legal form' quite nicely.
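To show how a consumer might read that nested structure, here is a minimal sketch. It assumes the details key is always named after the entity type plus "Details" and that the block holds a "form" object with "general" and "local" keys; both are proposals from this thread, not confirmed schema, and get_form is a hypothetical helper name.

```python
def get_form(statement):
    """Return (general, local) legal form from a type-specific details block.

    Assumes the details key is '<entityType>Details' and that it holds a
    'form' object, as proposed in this thread. Missing parts yield None.
    """
    details = statement.get(statement["entityType"] + "Details", {})
    form = details.get("form", {})
    return form.get("general"), form.get("local")


statement = {
    "entityType": "arrangement",
    "arrangementDetails": {"form": {"general": "trust", "local": "fideicomiso"}},
}
print(get_form(statement))  # prints ('trust', 'fideicomiso')
```

Keeping the general and local values side by side means one lookup serves every entity type, rather than a separate pair of top-level fields per type.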

3) On Unknown entity type

Having this type is useful for the reason you say, ie providing the subclassification of unknown type and unknown structure. But I think that we'll still need the unknown legal entity, unknown arrangement, etc. as secondary classifications under their respective details block, since there may be limited type-specific details that are known. For example, a declaration process may uncover the local and general legal form of an entity, but not its name. This is the case with PLC parent companies of declaring entities in Latvia.
We're missing anonymous secondary classifications for all the types atm.

ScatteredInk commented 3 years ago

1) Yes, agreed on state as the better term.
2) Very good point - yes I think it should be, and therefore your structure makes sense.
3) Yes, I think that's also right. I was a bit unsure what the best way to go with this was but the Latvian example suggests that keeping unknown options within each entity type makes sense.

siwhitehouse commented 3 years ago

Some comments/questions:

My reading of the example given in 1

using a company number to look up the legal form on the relevant company register.

is that it can be met by using the localEntityType. This looks to be a reasonable approach to me, and one that we should spin off into a separate issue to explore in more detail. I expect that, where a register exists, using it to look up a company's legal form could act as a check. Some BODS consumers/users might prefer to do this and then fall back on the localEntityType when it isn't possible (as in the example). I like that. I've a wavy-hand comment that there's quite a lot of research we'll need to do to decide if we can create a codelist for this type.
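The "look up on the register, then fall back" pattern could be sketched as below. Everything here is an assumption for illustration: register_lookup is a hypothetical callable standing in for a real register query, the identifier keys "scheme" and "id" are assumed, and the register data is made up.

```python
def resolve_legal_form(statement, register_lookup):
    """Prefer the register's legal form; fall back to the published value.

    `register_lookup` is a hypothetical callable that takes an identifier
    dict and returns a legal form string or None. Returns (form, source).
    """
    for identifier in statement.get("identifiers", []):
        form = register_lookup(identifier)
        if form is not None:
            return form, "register"
    local = statement.get("localEntityType")
    if local is not None:
        return local, "statement"
    return None, "unknown"


def fake_register(identifier):
    # Stand-in for a real company register query; the data is made up.
    known = {("GB-COH", "01234567"): "Private limited company"}
    return known.get((identifier.get("scheme"), identifier.get("id")))


with_id = {
    "identifiers": [{"scheme": "GB-COH", "id": "01234567"}],
    "localEntityType": "Ltd",
}
without_id = {"localEntityType": "fideicomiso"}
print(resolve_legal_form(with_id, fake_register))
print(resolve_legal_form(without_id, fake_register))
```

Returning the source alongside the value lets a consumer tell a register-verified form from one taken on trust from the statement.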

I'm not convinced that we should restructure entityStatement as suggested.

My first reaction was to wonder how this affects producers and consumers of the data. It feels like we are pushing a significant amount of work on publishers in order to benefit consumers of BODS here.
Another thought was that we are suggesting a change to the list of entity types here, and that pushing specific fields into those entity types could bind us to them more than if they were top level fields. I have two, interlinked, questions:
- will it make it more difficult for us to make changes in the future?
- what are the implications for future backwards compatibility?

I don't know the answers, and they might be easily resolved, but I think we should consider them.

I think we should do some research and exploration into how using different data structures would work if an entity moved between entity types. The case that immediately came to mind was of privatisation or nationalisation, where (I think) an entity moves between being a state entity to a legal entity or vice versa.
Some comments about some of the additional fields being suggested:
i. In state bodies we have formedByStatute, which we might find useful in other entity types, based on research done in 3. above.
ii. In state entities there is a jurisdiction field. How is this distinguishable from the incorporatedInJurisdiction field in entityStatement? If a state entity would always be expected to put the same data in both fields, could we consider renaming incorporatedInJurisdiction instead (which might already be overly specific for "unregistered legal entity")?
iii. Instead of having specific arrangementDescription and agreementDescription fields, we could have a top-level description field in EntityStatement.

Some of the above is based on pushing back against restructuring entityStatement.

What are the reasons for preferring entity type specific codelists over one codelist? I think it would be useful for us to outline the pros and cons of each option.
At the moment state, state entity, state body and state owned enterprise are not in our (unpublished) glossary. I think we should define them (I spent quite a time going round in circles before @ScatteredInk helped clarify what we mean by them). This also highlights a benefit of publishing the glossary in the repository.

kd-ods commented 3 years ago

we may want to restructure our entityStatement somewhat, so that we have the relevant details about an entity separated from data about the disclosure.

I think Simon's right that the options need to be fleshed out and considered carefully. Point 3 (that pushing all top level fields into a details block might bind the structure too tightly to the entity types) is - for me - the most trenchant. (I'm not sure that 2 holds, and on 4: I think the broad entityTypes are/should be so distinct categorically that entities don't move between them. On the particular SOE issue: an SOE's SOE-ness will be represented by it (as a legal entity) being owned by a state (or state body) so if it's privatised it will still be the same legal entity, but owned and controlled differently.)

Also, if we're considering restructuring statements we should be aware of this proposal to separate statement contents into a payload and a wrapper. Not that progress need be blocked by wider considerations, but it might help to constrain our immediate ambitions.

Separately, on localEntityType:

there's quite a lot of research we'll need to do to decide if we can create a codelist for this type.

I don't think it would be down to us to create any codelist. We'd need to put in place a way of publishers referring to locally-defined legal forms. This could be an OCDS local-scheme-style system, or - in the first place - encouragement to include information in a publication policy.

ScatteredInk commented 3 years ago

Thanks @siwhitehouse and @kd-ods.

I think the arguments for avoiding a major restructure of an entity statement are sound - let's not do this.

That then suggests paring this back to an MVP proposal of:

1. Add the expanded list of entity types so that we can model SOEs and get the : state, stateBody, legalEntity, arrangement, agreement, unknown, anonymous.
2. Add in closed codelists for entity type classifications where this has immediate benefits for publishing and analysis: I think this would be arrangements, agreements and unknown entity types.
3. Rename registeredInJurisdiction so that this works for all entity types.
4. Add a way to describe local entity types. Per @kd-ods's suggestion, I wonder if we can do this via classifications and guidance rather than a specific field - this would have the advantage of allowing BODS to work in other contexts where companies are being classified, e.g. certifications for procurement purposes.
5. Add specific fields for SOEs.

If that sounds like a way forward, then we can do 1 (and maybe 2) here and work on the rest in the relevant tickets.
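As a rough sketch of what a closed, type-specific codelist check could look like: the values below are copied from the draft outline earlier in this thread, while the dictionary layout, the type keys and the check_classification name are assumptions made for illustration.

```python
# Codelist values copied from the draft outline earlier in this thread;
# the type keys and the dictionary layout are assumptions.
CODELISTS = {
    "arrangement": {"trust", "general partnership", "unincorporated organisation"},
    "agreement": {
        "joint shareholding", "nominee shareholding", "bearer share",
        "contractual joint venture", "contractual agreement",
        "informal agreement", "unknown agreement",
    },
    "unknown": {
        "unknown legal entity", "unknown arrangement", "unknown agreement",
        "unknown state", "unknown state entity", "unknown type",
        "unknown structure",
    },
}


def check_classification(entity_type, classification):
    """Return True if the classification is allowed for the entity type.

    In this sketch, types without a closed codelist accept no
    classification at all (None passes, anything else fails), and types
    with a codelist require one of its values.
    """
    allowed = CODELISTS.get(entity_type)
    if allowed is None:
        return classification is None
    return classification in allowed


print(check_classification("arrangement", "trust"))
print(check_classification("agreement", "trust"))
```

Whether a classification should be mandatory for types that have a codelist is a design choice this sketch makes arbitrarily; it would need settling in the relevant ticket.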

kd-ods commented 3 years ago

That sounds like a good plan, @ScatteredInk. Couple of questions/thoughts:

On 2: So you don't think we need to add 'registered legal entity' and 'unregistered legal entity' classifications for legalEntities?

On 4: Do you mean that we don't offer a field for a local name but, e.g., we encourage publishers to document which local forms are included under which of BODS' classifications?

On another issue entirely (which I should have raised earlier, sorry to raise it now, hope it's not a spanner, etc.): Is the major conceptual difference between an arrangement and an agreement, just that the former are well-defined within FATF? If so, fair enough. Though at first glance the terms don't appear very semantically distinct. Is there any qualifier with which we could prefix 'arrangement' that would make the distinction more evident? (formalArrangement, recognisedArrangement, regulatedArrangement etc.)

ScatteredInk commented 3 years ago

On 2: So you don't think we need to add 'registered legal entity' and 'unregistered legal entity' classifications for legalEntities?

I was in two minds about this but it may be better to have this classification, as a pointer for which identifiers are expected.

On 4: Do you mean that we don't offer a field for a local name but, e.g., we encourage publishers to document which local forms are included under which of BODS' classifications?

Yes. Given that local forms are likely to change over time, I think we want to encourage a publication pattern where this is maintained as an external reference source that BODS statements can be checked against, as with the UK's list.

Is the major conceptual difference between an arrangement and an agreement, just that the former are well-defined within FATF?

There are two differences that I see between arrangements and agreements:

1. There are specific regulatory requirements for collecting and disclosing the beneficial owners of legal arrangements, whereas agreements are mechanisms through which beneficial ownership is exercised (or shared).
2. An arrangement has an existence independent of the control relationship that any BODS statement describes, i.e., we can expect the interested parties of BODS statements that involve arrangements to change during the lifetime of an arrangement, but the arrangement will still exist. An agreement does not have the same independent existence: it will end if the relationship between the interested parties ends.

The second one may be slightly artificial, legally, but is I think more representative of the de facto position.

kd-ods commented 3 years ago

Thanks, @ScatteredInk - that's really helpful. Any wording and detail that we can provide to help illuminate that agreement/arrangement distinction is going to be critical.

kd-ods commented 3 years ago

@ScatteredInk - I was talking to @kindly today about putting the publicListing block into the entity statement schema (#326). I think we've come to the point where that work intersects with this. Essentially the question we have is: in the case where a publisher knows that an entity is publicly listed but doesn't collect or have information about the listings, how is that represented in the schema?

I had thought that inclusion of the publicListing key referring to an empty object could be used to convey that information, but on reflection with @kindly, realise that that technique will not be reliably used by publishers. We need a more robust way to convey that information. Here are two options, but there are poss more:

1) Revive and revise the idea of a secondary classification codelist for 'legalEntity'. And have it include 'publicly listed entity'. (There are questions and implications of this for the other items that might appear in the classification, around mutual exclusivity. Also, this option would effectively make the publicListing block a particular implementation of the more general details block that you were originally thinking about, @ScatteredInk; so we might want to adjust our publicListing block so that it would fit with a more generalisable system for other types of entities' details?)

2) Replicate the model we used for PEP info on the person statement (where a hasPepStatus boolean sits next to a pepStatusDetails block). So we would have a hasPublicListing boolean alongside the publicListing block.

What do you reckon? (2) feels a bit too 'special-casey' to me. We prob need to go for something more generalisable, like (1)?

ScatteredInk commented 3 years ago

We discussed this today and think the best option is:

Have a hasPublicListing field inside the publicListing block, with appropriate validation, and move the equivalent boolean for PEPs to match.
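One possible reading of "appropriate validation" is sketched below. Only publicListing and hasPublicListing come from this thread; the securitiesListings key is an assumed name for the listing details, and the rules themselves are a guess at what the validation might enforce.

```python
def validate_public_listing(block):
    """Return a list of problems with a publicListing block (empty if none).

    Only `hasPublicListing` comes from the comment above; the
    `securitiesListings` key is an assumed name for listing details.
    """
    problems = []
    if block is None:
        # Assumption: an absent block asserts nothing about listings.
        return problems
    flag = block.get("hasPublicListing")
    if not isinstance(flag, bool):
        problems.append("hasPublicListing must be present and boolean")
    elif flag is False and block.get("securitiesListings"):
        problems.append("listing details given but hasPublicListing is false")
    return problems


# Known to be listed, but no listing details collected: valid in this sketch.
print(validate_public_listing({"hasPublicListing": True}))  # prints []
```

Making the boolean required inside the block means an empty object can no longer be mistaken for a statement that the entity is listed, which was the weakness of the empty-object technique.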

kd-ods commented 2 years ago

Thinking on this ticket and on #336 informed development for BODS 0.3. We now have a structure in place in the schema for entity types and entity subtypes. At the moment, the only entity type with subtype categories is 'statebody'. (You can see it in use here.)

kd-ods commented 2 years ago

There is an ISO standard (20275) for Entity Legal Forms. This list will be useful when considering subtypes of entities and how general and local categorisation might work. (It might be overkill to build ISO 20275 code usage into the BODS schema, but users' use of ISO 20275 values and our use of ISO 20275 terminology should be considered.)

kathryn-ods commented 5 months ago

Does the implementation of entityType with type, subtype and details resolve this ticket?

kd-ods commented 5 months ago

Yes - I think this can be closed and we'll use #336 for future work on entity classification.

openownership / data-standard