openownership / lib-cove-bods

Check that your data complies with the Beneficial Ownership Data Standard (BODS) using our data review tool, or install our data review library to analyse files via your command line interface
https://datareview.openownership.org/

Additional checks: statement series #114

Closed kd-ods closed 1 month ago

kd-ods commented 3 months ago

For our purposes, a statement series is a set of statements within a dataset which share a recordId value. All of the following checks should be run.

Specification for Check(s):

Check 1: That in each statement series, at most one statement has recordStatus 'new'.

On fail:

Error message: A record has multiple Statements with record status 'new'. Record status can only be 'new' the first time a Statement is published for a record.

Info message: Record identifier: [VALUE]

Check 2: In each statement series, the statement with recordStatus 'new', if it exists, must have the earliest statementDate.

On fail:

Error message: Statement with recordStatus 'updated' or 'closed' cannot have statementDate earlier than corresponding statement with recordStatus 'new'

Info message: Record identifier: [VALUE], Statement identifier: [VALUE]

Check 3: That in each statement series, no more than one statement has recordStatus 'closed'.

On fail:

Error message: A record has multiple Statements with record status 'closed'. Record status can be 'closed' only the final time a Statement is published for a record.

Info message: Record identifier: [VALUE]

Check 4: In each statement series, the statement with recordStatus 'closed' must have the latest statementDate.

On fail:

Error message: Statement with recordStatus 'new' or 'updated' cannot have statementDate later than corresponding statement with recordStatus 'closed'

Info message: Record identifier: [VALUE], Statement identifier: [VALUE]

Check 5: That in each statement series, the recordType value is the same across all statements.

On fail:

Error message: Statements relating to the same record must all have the same record type.

Info message: Record identifier: [VALUE]
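As a rough illustration of the count- and type-based checks (1, 3 and 5), the sketch below groups statements into series by recordId and reports which check failed for which record. It is not the lib-cove-bods implementation: the function names and the (check number, record identifier) return shape are hypothetical, chosen only for this example.

```python
from collections import Counter, defaultdict


def group_by_record_id(statements):
    """Group statements into series keyed by recordId (hypothetical helper)."""
    series = defaultdict(list)
    for statement in statements:
        record_id = statement.get("recordId")
        if record_id is not None:
            series[record_id].append(statement)
    return series


def check_series_counts(statements):
    """Return (check_number, record_id) pairs for failures of Checks 1, 3 and 5."""
    failures = []
    for record_id, series in group_by_record_id(statements).items():
        # recordStatus is optional: statements without it are counted under None
        # and so never contribute to the 'new' or 'closed' totals.
        statuses = Counter(s.get("recordStatus") for s in series)
        if statuses["new"] > 1:
            failures.append((1, record_id))
        if statuses["closed"] > 1:
            failures.append((3, record_id))
        # Check 5: every statement in the series must share one recordType.
        record_types = {s.get("recordType") for s in series}
        if len(record_types) > 1:
            failures.append((5, record_id))
    return failures


# Illustrative data only; the record identifiers "a" and "b" are made up.
sample = [
    {"recordId": "a", "recordType": "entity", "recordStatus": "new"},
    {"recordId": "a", "recordType": "entity", "recordStatus": "new"},
    {"recordId": "b", "recordType": "entity"},
    {"recordId": "b", "recordType": "person"},
]
print(check_series_counts(sample))  # [(1, 'a'), (5, 'b')]
```

A real implementation would map each failure to the error and info messages given above rather than returning bare tuples.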

kathryn-ods commented 3 months ago

From the spec: "recordStatus MUST be ‘new’ only the first time a Statement is published for a recordID" and "recordStatus MUST be ‘closed’ only the final time a Statement is published for a recordID". I think we should also be checking that "new" is only used for the statement with the earliest statementDate and "closed" only used for the statement with the latest statement date. I'm going to edit @kd-ods's ticket to add these checks.

Should we also require that every statement set has at least one statement with recordStatus 'new'? Haven't added this in yet

kathryn-ods commented 3 months ago

Going to write my edits in this comment in case Kadie disagrees with these additional checks.

For our purposes, a statement series is a set of statements within a dataset which share a recordId value. All of the following checks should be run.

Specification for Check(s):

Check 1:

That in each statement series, one statement has recordStatus 'new'.

On fail:

Check 2:

In each statement series, the statement with recordStatus 'new' must have the earliest statementDate.

On fail:

Check 3:

That in each statement series, no more than one statement has recordStatus 'closed'.

On fail:

Check 4:

In each statement series, the statement with recordStatus 'closed' must have the latest statementDate.

On fail:

Check 5:

That in each statement series, the recordType value is the same across all statements.

On fail:

kd-ods commented 3 months ago

> I think we should also be checking that "new" is only used for the statement with the earliest statementDate and "closed" only used for the statement with the latest statement date.

Yes - we can check for this too.

> Should we also require that every statement set has at least one statement with recordStatus 'new'? Haven't added this in yet

recordStatus isn't required, so we can't check this.

kathryn-ods commented 3 months ago

> recordStatus isn't required, so we can't check this.

Is there a reason we don't require this? It seems like a key field to me

kd-ods commented 3 months ago

The minimal requirements here for updating BO info over time don't require use of recordStatus. It allows for systems that have very simple record management (e.g. if they don't resolve most entities' identities over time).

kathryn-ods commented 3 months ago

> The minimal requirements here for updating BO info over time don't require use of recordStatus

Are there any implications of this when using the BODS visualiser? @codemacabre might need to factor this in

kathryn-ods commented 3 months ago

Ok - final test set

Check 1: That in each statement series, at most one statement has recordStatus 'new'.

On fail:

Error message: A record has multiple Statements with record status 'new'. Record status can only be 'new' the first time a Statement is published for a record.

Info message: Record identifier: [VALUE]

Check 2: In each statement series, the statement with recordStatus 'new', if it exists, must have the earliest statementDate.

On fail:

Error message: Statement with recordStatus 'updated' or 'closed' cannot have statementDate earlier than corresponding statement with recordStatus 'new'

Info message: Record identifier: [VALUE], Statement identifier: [VALUE]

Check 3: That in each statement series, no more than one statement has recordStatus 'closed'.

On fail:

Error message: A record has multiple Statements with record status 'closed'. Record status can be 'closed' only the final time a Statement is published for a record.

Info message: Record identifier: [VALUE]

Check 4: In each statement series, the statement with recordStatus 'closed' must have the latest statementDate.

On fail:

Error message: Statement with recordStatus 'new' or 'updated' cannot have statementDate later than corresponding statement with recordStatus 'closed'

Info message: Record identifier: [VALUE], Statement identifier: [VALUE]

Check 5: That in each statement series, the recordType value is the same across all statements.

On fail:

Error message: Statements relating to the same record must all have the same record type.

Info message: Record identifier: [VALUE]
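The two date-ordering checks (2 and 4) could be sketched along the following lines. This is an illustration only, not the library's code. It assumes statementDate is a plain YYYY-MM-DD string (as in the test data in this ticket), and it assumes each date check is applied only when a series has exactly one 'new' (or exactly one 'closed') statement, since series with several are already reported by Checks 1 and 3. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def check_series_dates(statements):
    """Return (check_number, record_id, statement_id) tuples for Checks 2 and 4."""
    series_by_record = defaultdict(list)
    for statement in statements:
        if "recordId" in statement and "statementDate" in statement:
            series_by_record[statement["recordId"]].append(statement)

    failures = []
    for record_id, series in series_by_record.items():
        # Assumption: dates are YYYY-MM-DD; a malformed date raises ValueError.
        dated = [(date.fromisoformat(s["statementDate"]), s) for s in series]
        new = [d for d, s in dated if s.get("recordStatus") == "new"]
        closed = [d for d, s in dated if s.get("recordStatus") == "closed"]
        for d, s in dated:
            status = s.get("recordStatus")
            # Check 2: 'updated' or 'closed' must not predate the single 'new'.
            if len(new) == 1 and status in ("updated", "closed") and d < new[0]:
                failures.append((2, record_id, s.get("statementId")))
            # Check 4: 'new' or 'updated' must not postdate the single 'closed'.
            if len(closed) == 1 and status in ("new", "updated") and d > closed[0]:
                failures.append((4, record_id, s.get("statementId")))
    return failures


# Illustrative data only; the identifiers s1..s4 and r1, r2 are made up.
sample = [
    {"statementId": "s1", "recordId": "r1", "statementDate": "2022-03-04", "recordStatus": "new"},
    {"statementId": "s2", "recordId": "r1", "statementDate": "2021-03-04", "recordStatus": "updated"},
    {"statementId": "s3", "recordId": "r2", "statementDate": "2020-04-01", "recordStatus": "closed"},
    {"statementId": "s4", "recordId": "r2", "statementDate": "2020-04-04", "recordStatus": "updated"},
]
print(check_series_dates(sample))  # [(2, 'r1', 's2'), (4, 'r2', 's4')]
```

Statements with no recordStatus are ignored by both comparisons, which matches the decision in this thread to treat recordStatus as optional.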

kathryn-ods commented 3 months ago

@kd-ods happy with those? Have edited the tests and error messages to reflect that recordStatus is an optional field

kd-ods commented 3 months ago

> happy with those?

Yes - they look good.

> Are there any implications of this when using the BODS visualiser?

Yes and no! The visualiser has to be pretty flexible when it comes to handling BODS data. (It doesn't even require that the BODS is valid.) The fact that recordStatus might not exist in a dataset will be taken into account when specifying this work on the Visualiser.

kathryn-ods commented 3 months ago

Thanks @kd-ods. I have hidden some of the comments and edited the original spec of the tests to make this ticket more readable. Will sort the test data out now.

kathryn-ods commented 3 months ago

Valid data 1 - correct use of recordStatus

[
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-04-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "updated",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "closed",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "updated",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  }
]

Valid data 2 - no record status

[
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-04-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  }
]

Invalid data 1 - multiple statements are 'new'

[
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-04-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "closed",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "updated",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  }
]

Invalid data 2 - updated before new

[
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-04-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "updated",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "closed",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2022-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "updated",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  }
]

Invalid data 3 - multiple closed

[
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-04-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "closed",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "closed",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "updated",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  }
]

Invalid data 4 - closed before updated

[
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-04-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "updated",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-04-01",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordStatus": "closed",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "new",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "10478c6cf6de",
    "recordType": "person",
    "recordStatus": "updated",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  }
]

Invalid data 5 - mixed record type

[
  {
    "statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2020-03-04",
    "recordId": "c359f58d2977",
    "recordType": "entity",
    "recordDetails": {
      "isComponent": false,
      "entityType": {
        "type": "registeredEntity"
      }
    }
  },
  {
    "statementId": "019a93f1-e470-42e9-957b-03559861b2e2",
    "declarationSubject": "c359f58d2977",
    "statementDate": "2021-03-04",
    "recordId": "c359f58d2977",
    "recordType": "person",
    "recordDetails": {
      "isComponent": false,
      "personType": "knownPerson"
    }
  }
]

radix0000 commented 2 months ago

@kathryn-ods @kd-ods There are some duplicate statementIds (e.g. in Valid data 1), which I assume were unintentional. I will just fix them unless I hear otherwise.
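For reference, duplicates like these can be spotted with a few lines of Python. This is a minimal sketch, not part of lib-cove-bods; the function name is hypothetical, and the sample reuses two statementId values from the test data above with all other fields trimmed.

```python
from collections import Counter


def duplicate_statement_ids(statements):
    """Return statementId values that appear on more than one statement."""
    counts = Counter(s["statementId"] for s in statements if "statementId" in s)
    return sorted(sid for sid, n in counts.items() if n > 1)


# Trimmed-down statements: only the fields needed for the illustration.
sample = [
    {"statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7", "statementDate": "2020-03-04"},
    {"statementId": "1dc0e987-5c57-4a1c-b3ad-61353b66a9b7", "statementDate": "2020-04-04"},
    {"statementId": "019a93f1-e470-42e9-957b-03559861b2e2", "statementDate": "2020-03-04"},
]
print(duplicate_statement_ids(sample))  # ['1dc0e987-5c57-4a1c-b3ad-61353b66a9b7']
```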