w3c-ccg / traceability-vocab

A traceability vocabulary for describing relevant Verifiable Credentials and their contents.
https://w3id.org/traceability

conflation of `identifier`: use class `traceability:Identifier` #571

Open VladimirAlexiev opened 1 year ago

VladimirAlexiev commented 1 year ago

11 schemas use a URL containing /identifier multiple times:

$ grep -cr '/identifier' . | grep -vE ':(0|1)$'
./common/BindingDataRegistrationCredential.yml:2
./common/CrudeOilProduct.yml:2
./common/EntrySummary.yml:5
./common/ImmediateDelivery.yml:2
./common/Inbond.yml:6
./common/NAISMARecordLeveldentifiers.yml:3
./common/NaturalGasProduct.yml:2
./common/ppq203.yml:2
./common/SteelProduct.yml:2
./common/TransferEvent.yml:2
./common/UsdaSc6.yml:5
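
The count above only shows that the string occurs more than once per file. A more direct check groups property names by the RDF URL they map to. The following is a minimal Python sketch, assuming each schema has already been parsed into a dict and that every property carries its mapping under `$linkedData` / `@id` (an assumption about the schema layout). The helper name `conflated_properties` and the sample input are hypothetical:

```python
from collections import defaultdict


def conflated_properties(schema: dict) -> dict:
    """Return {IRI: [property names]} for IRIs mapped by more than one property.

    Assumes each property looks like {"$linkedData": {"@id": "<IRI>"}, ...};
    adjust the lookup if the real schemas are laid out differently.
    """
    by_iri = defaultdict(list)
    for name, spec in schema.get("properties", {}).items():
        iri = spec.get("$linkedData", {}).get("@id")
        if iri:
            by_iri[iri].append(name)
    # Keep only the IRIs that are shared by several properties.
    return {iri: names for iri, names in by_iri.items() if len(names) > 1}


# Illustrative input modelled on NaturalGasProduct.yml (not copied from the repo).
example_schema = {
    "properties": {
        "UWI": {"$linkedData": {"@id": "https://schema.org/identifier"}},
        "HSCode": {"$linkedData": {"@id": "https://schema.org/identifier"}},
        "facility": {"$linkedData": {"@id": "https://schema.org/Place"}},
    }
}

print(conflated_properties(example_schema))
```

Any IRI reported with two or more property names is a candidate for the conflation described in this issue.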

After #570 is fixed, this would probably mean they use the same schema.org/identifier URL multiple times, which in turn probably means they conflate several different identifiers in one field. I can see several cases, e.g.:

  1. NaturalGasProduct.yml

    UWI:
      title: Unique Well Identifier
      description: Unique Well Identifier used for individual well identification.
    HSCode:
      title: HSCode
      description: Defines the Harmonized System Code for the Commodity

    This conflates the identifiers of two different objects (the well and the gas extracted from it) into the same RDF property `identifier`.

    • That's because both are attached to the same object: NaturalGasProduct.UWI, NaturalGasProduct.HSCode.
    • This is despite having sub-objects where the identifiers can be better attached: NaturalGasProduct.facility.UWI, NaturalGasProduct.product.HSCode
    • However, it's hard to express this modeling construct: "NaturalGasProduct is a class where facility has a field UWI"
    • It's easier to express this with inheritance (another argument to fix #277)
    • "NaturalGasProduct is a subclass of Product and adds a field HSCode, and retargets field facility to OilAndGasFacility"
    • "OilAndGasFacility is a subclass of Facility (or Place) that adds field UWI"
  2. One of the worst offenders is EntrySummary, which maps 5 values to the same RDF property `identifier`. E.g. below, the identifiers of the entry and of the manufacturer are conflated:

    "entryNumber": "73461882610",
    "manufacturerId": "2300912",
  3. ImmediateDelivery.yml has a slightly different problem:

    "assignedIdentifier": "12345678",
    "assignedIdentifierType": "CBP",
    "entryNumber": "A123456",
    "lineItems": [
      {
        "itemParty": {
          "assignedIdentifier": "12345678",
          "assignedIdentifierType": "CBP"
        }
      }
    ]
    • It attaches assignedIdentifier to two different objects (ImmediateDelivery vs Party): OK.
    • But it conflates entryNumber and assignedIdentifier into the same RDF property `identifier`.
    • Also, it does not allow multiple assignedIdentifier values. The fact that assignedIdentifierType accommodates multiple agencies suggests that multiple identifiers should be allowed.
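
The subclassing argument from case 1 can be sketched with plain Python dataclasses. This only illustrates the modeling idea and is not how the vocabulary is defined. The field types, the `name` field, and the sample values are assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Facility:
    name: Optional[str] = None  # hypothetical field, for illustration only


@dataclass
class OilAndGasFacility(Facility):
    # The subclass adds the well identifier, so UWI lives on the facility.
    UWI: Optional[str] = None


@dataclass
class Product:
    facility: Optional[Facility] = None


@dataclass
class NaturalGasProduct(Product):
    # Retargets `facility` to the more specific class and adds HSCode.
    facility: Optional[OilAndGasFacility] = None
    HSCode: Optional[str] = None


gas = NaturalGasProduct(
    facility=OilAndGasFacility(UWI="123456"),
    HSCode="80123456",
)
print(gas.facility.UWI, gas.HSCode)
```

Each identifier is now attached to the object it actually identifies: the well identifier to the facility, the commodity code to the product.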

The cleanest way to solve all these cases is to use "structured identifiers", i.e. simple records that hold the identifier value together with its type ("propertyID"), and not to use specific sub-properties of identifier (which would be redundant with this identifier type). Using schema.org, this can be expressed as follows in Turtle (prefixes omitted for brevity). The URLs also reflect a "URL policy" of minting URLs for sub-objects rather than using blank nodes:

<naturalGasProduct/1> a :NaturalGasProduct; :identifier <naturalGasProduct/1/id/1>, <naturalGasProduct/1/id/2>;
   :place <facility/1>.
<naturalGasProduct/1/id/1> a :PropertyValue; :propertyID "HSCode"; :value "80123456".
<naturalGasProduct/1/id/2> a :PropertyValue; :propertyID "GTIN"; :value "56190358290187694".

<facility/1> a :OilAndGasFacility; :identifier <facility/1/id/1>, <facility/1/id/2>.
<facility/1/id/1> a :PropertyValue; :propertyID "UWI"; :value "123456".
<facility/1/id/2> a :PropertyValue; :propertyID "GLN"; :value "56109258249087".
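
For readers who work with the JSON form of the credentials, the same restructuring can be sketched in Python. This is a minimal illustration, assuming flat identifier fields as in the current schemas. The helper name `to_structured_identifiers` is hypothetical, and the JSON-LD `@context` is omitted in the same way the Turtle above omits prefixes:

```python
import json


def to_structured_identifiers(record: dict, id_fields: list) -> list:
    """Turn flat identifier fields into schema.org PropertyValue records."""
    return [
        {"@type": "PropertyValue", "propertyID": field, "value": record[field]}
        for field in id_fields
        if field in record
    ]


# Flat fields, using the sample values from the Turtle example.
flat_product = {"HSCode": "80123456", "GTIN": "56190358290187694"}

product = {
    "@type": "NaturalGasProduct",
    "identifier": to_structured_identifiers(flat_product, ["HSCode", "GTIN"]),
}
print(json.dumps(product, indent=2))
```

The identifier type moves from the property name into `propertyID`, so a single `identifier` property can hold any number of identifiers without conflating them.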

Better, you can define traceability:Identifier as a subclass of :PropertyValue specialized for expressing structured identifiers. You could also record extra data such as issuer, date issued, valid until, etc. adms:Identifier has similar properties, and we used it in the euBusinessGraph ontology.
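
A rough Python sketch of such a specialized identifier record follows. The field names `issuer`, `dateIssued`, and `validUntil` are hypothetical (the proposal only names the kinds of data), and the sample values are illustrative:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PropertyValue:
    propertyID: str
    value: str


@dataclass
class Identifier(PropertyValue):
    # Hypothetical extra fields; the issue only names the kinds of data.
    issuer: Optional[str] = None
    dateIssued: Optional[str] = None
    validUntil: Optional[str] = None

    def to_dict(self) -> dict:
        """Serialize, leaving out metadata that was not supplied."""
        return {k: v for k, v in asdict(self).items() if v is not None}


entry_id = Identifier(propertyID="CBP", value="12345678", issuer="CBP")
print(entry_id.to_dict())
```

Because `Identifier` extends `PropertyValue`, consumers that only understand `propertyID` and `value` can still read it.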

TallTed commented 1 year ago

Some messed up MD tags (appears to be mostly misplaced and/or missing backticks; definitely in the last paragraph; quite possibly elsewhere) may decrease comprehension of what you're trying to say. I suggest viewing your opening comment on the website, and editing the MD tags.

brownoxford commented 1 year ago

Discussed on call, @mkhraisha to review.

mkhraisha commented 1 year ago

I believe the ask here is to use schema.org/identifier for the identifiers used to identify the specific object and to use https://schema.org/propertyID for other identifiers. For example, in NaturalGasProduct.yml we would have:

  1. identifier for HScode
  2. propertyID for UWI

mkhraisha commented 11 months ago

I didn't attend to this, will work on it soon.

nissimsan commented 8 months ago

@mkhraisha, progress on this?

There are also parts of this I should do, assigning myself as well.

VladimirAlexiev commented 3 months ago

@mkhraisha

> I believe the ask here is to use schema.org/identifier for the identifiers used to identify the specific object and to use https://schema.org/propertyID for other identifiers

No!