w3c-ccg / traceability-vocab

A traceability vocabulary for describing relevant Verifiable Credentials and their contents.
https://w3id.org/traceability

conflation of `identifier`: use class `traceability:Identifier` #571

Closed VladimirAlexiev closed 1 month ago

VladimirAlexiev commented 2 years ago

11 schemas use a URL containing /identifier multiple times:

$ grep -cr '/identifier' . | grep -vE ':(0|1)$'
./common/BindingDataRegistrationCredential.yml:2
./common/CrudeOilProduct.yml:2
./common/EntrySummary.yml:5
./common/ImmediateDelivery.yml:2
./common/Inbond.yml:6
./common/NAISMARecordLeveldentifiers.yml:3
./common/NaturalGasProduct.yml:2
./common/ppq203.yml:2
./common/SteelProduct.yml:2
./common/TransferEvent.yml:2
./common/UsdaSc6.yml:5
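
A stricter variant of this check, as a minimal Python sketch: rather than counting lines that contain `/identifier`, it counts how often each `'@id'` URL repeats within one schema. The function name `repeated_ids` and the sample text are illustrative assumptions, and the `$linkedData` / `'@id'` layout is assumed to match the schema files. It only groups byte-identical URLs.

```python
import re
from collections import Counter

# Matches lines such as:  '@id': https://schema.org/identifier
ID_LINE = re.compile(r"""^\s*['"]?@id['"]?\s*:\s*(\S+)\s*$""", re.MULTILINE)


def repeated_ids(schema_text: str) -> dict[str, int]:
    """Return each '@id' URL that occurs more than once in the text."""
    counts = Counter(ID_LINE.findall(schema_text))
    return {url: n for url, n in counts.items() if n > 1}


# Hypothetical schema fragment, for illustration only.
SAMPLE = """
entryNumber:
  $linkedData:
    term: entryNumber
    '@id': https://schema.org/identifier
manufacturerId:
  $linkedData:
    term: manufacturerId
    '@id': https://schema.org/identifier
name:
  $linkedData:
    term: name
    '@id': https://schema.org/name
"""

print(repeated_ids(SAMPLE))
```

Any URL reported here is shared by two or more terms of the same schema, which is the situation discussed in this issue.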

After #570 is fixed, these schemas will probably end up using the same schema.org/identifier URL multiple times, which suggests they conflate several different identifiers in one property. I can see a few cases, e.g.:

  1. NaturalGasProduct.yml

    UWI:
      title: Unique Well Identifier
      description: Unique Well Identifier used for individual well identification.
    HSCode:
      title: HSCode
      description: Defines the Harmonized System Code for the Commodity

    This conflates the identifiers of two different objects (the well and the gas extracted from it) into the same RDF prop identifier.

    • That's because both are attached to the same object: NaturalGasProduct.UWI, NaturalGasProduct.HSCode.
    • This is despite having sub-objects where the identifiers can be better attached: NaturalGasProduct.facility.UWI, NaturalGasProduct.product.HSCode
    • However, it's hard to express this modeling construct: "NaturalGasProduct is a class where facility has a field UWI"
    • It's easier to express this with inheritance (another argument to fix #277):
      • "NaturalGasProduct is a subclass of Product and adds a field HSCode, and retargets field facility to OilAndGasFacility"
      • "OilAndGasFacility is a subclass of Facility (or Place) that adds field UWI"
  2. One of the worst offenders is EntrySummary, which maps 5 values to the same RDF prop identifier. For example, the identifiers of the entry and of the manufacturer below are conflated:

    "entryNumber": "73461882610",
    "manufacturerId": "2300912",
  3. ImmediateDelivery.yml has a slightly different problem:

    {
      "assignedIdentifier": "12345678",
      "assignedIdentifierType": "CBP",
      "entryNumber": "A123456",
      "lineItems": [
        {
          "itemParty": {
            "assignedIdentifier": "12345678",
            "assignedIdentifierType": "CBP"
          }
        }
      ]
    }
    • It separates assignedIdentifier across two different objects (ImmediateDelivery vs Party), which is fine.
    • But it conflates entryNumber and assignedIdentifier into the same RDF prop identifier.
    • It also fails to allow multiple assignedIdentifier values, even though assignedIdentifierType accommodating multiple agencies suggests that multiple identifiers should be allowed.
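
To make the conflation concrete, here is a minimal Python sketch of what happens when several JSON keys are mapped to one RDF property. It is a simplified stand-in for JSON-LD term expansion, not a real JSON-LD processor. The `CONTEXT` mapping and the `expand` helper are assumptions for illustration; the values come from the EntrySummary example above.

```python
from collections import defaultdict

# Assumed term-to-IRI mapping, mirroring schemas that point several keys
# at https://schema.org/identifier.
CONTEXT = {
    "entryNumber": "https://schema.org/identifier",
    "manufacturerId": "https://schema.org/identifier",
}


def expand(doc: dict, context: dict) -> dict:
    """Replace each known JSON key with its IRI, merging values per IRI."""
    expanded = defaultdict(list)
    for key, value in doc.items():
        if key in context:
            expanded[context[key]].append(value)
    return dict(expanded)


doc = {"entryNumber": "73461882610", "manufacturerId": "2300912"}
print(expand(doc, CONTEXT))
```

After expansion both values sit under one property, so a consumer of the RDF can no longer tell which is the entry number and which is the manufacturer ID.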

The cleanest way to solve all these cases is to use "structured identifiers", i.e. simple records that hold the identifier value together with its type ("propertyID"). Don't use specific sub-properties of identifier, which would be redundant with this identifier type. Using schema.org, this can be expressed as follows in Turtle (prefixes omitted for brevity). The URLs also reflect a "URL policy" of minting URLs for sub-objects rather than using blank nodes:

<naturalGasProduct/1> a :NaturalGasProduct; :identifier <naturalGasProduct/1/id/1>, <naturalGasProduct/1/id/2>;
   :place <facility/1>.
<naturalGasProduct/1/id/1> a :PropertyValue; :propertyID "HSCode"; :value "80123456".
<naturalGasProduct/1/id/2> a :PropertyValue; :propertyID "GTIN"; :value "56190358290187694".

<facility/1> a :OilAndGasFacility; :identifier <facility/1/id/1>, <facility/1/id/2>.
<facility/1/id/1> a :PropertyValue; :propertyID "UWI"; :value "123456".
<facility/1/id/2> a :PropertyValue; :propertyID "GLN"; :value "56109258249087".

Better, you can define traceability:Identifier as a subclass of :PropertyValue specialized for expressing structured identifiers. You could then also record extra data such as issuer, date issued, valid until, etc. adms:Identifier has similar properties, and we used it in the euBusinessGraph ontology.
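
As a rough illustration of the same pattern in JSON form, the Python sketch below turns flat identifier fields into schema.org `PropertyValue` records. The helper name `to_structured`, the `ID_FIELDS` list, and the input record are assumptions made for this example, not part of the vocabulary.

```python
import json

# Assumed list of flat keys that are really identifiers.
ID_FIELDS = ("HSCode", "GTIN", "UWI", "GLN")


def to_structured(record: dict) -> dict:
    """Move flat identifier keys into a list of PropertyValue records."""
    out = {k: v for k, v in record.items() if k not in ID_FIELDS}
    ids = [
        {"type": "PropertyValue", "propertyID": k, "value": record[k]}
        for k in ID_FIELDS
        if k in record
    ]
    if ids:
        out["identifier"] = ids
    return out


flat = {"type": "NaturalGasProduct", "HSCode": "80123456", "GTIN": "56190358290187694"}
print(json.dumps(to_structured(flat), indent=2))
```

Because the identifier type travels with each value, several identifiers can share the one `identifier` property without being confused.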

TallTed commented 2 years ago

Some messed up MD tags (appears to be mostly misplaced and/or missing backticks; definitely in the last paragraph; quite possibly elsewhere) may decrease comprehension of what you're trying to say. I suggest viewing your opening comment on the website, and editing the MD tags.

brownoxford commented 1 year ago

Discussed on call, @mkhraisha to review.

mkhraisha commented 1 year ago

I believe the ask here is to use schema.org/identifier for the identifiers used to identify the specific object and to use https://schema.org/propertyID for other identifiers. For example, in NaturalGasProduct.yml we would have:

  1. identifier for HSCode
  2. propertyID for UWI

mkhraisha commented 1 year ago

I didn't attend to this, will work on it soon.

nissimsan commented 1 year ago

@mkhraisha, progress on this?

There are also parts of this I should do, assigning myself as well.

VladimirAlexiev commented 8 months ago

@mkhraisha

> I believe the ask here is to use schema.org/identifier for the identifiers used to identify the specific object and to use https://schema.org/propertyID for other identifiers

No!

mkhraisha commented 4 months ago

Will take care of this issue soon. We should have one person clean up the credentials for their vertical:

use the Structured Value With Prefix system as laid out in #944

nissimsan commented 4 months ago

@mkhraisha on the next call can we discuss exactly what needs done on this, pls?

rhofvendahl commented 3 months ago

Reading through this I'm not clear on what a property making use of PropertyID would look like.

For example, on CBPEntrySummary:

  manufacturerId:
    title: Manufacturer Identifier
    description: [...]
    type: string
    $linkedData:
      term: manufacturerId
      '@id': https://schema.org/identifier

What would an acceptable revision be? I can dig in further, but thought I'd ask in case it's obvious to someone else in this discussion.

nissimsan commented 3 months ago

Would that be something like https://traceability.org/identifier/manufacturerId?

mkhraisha commented 3 months ago

This would create <https://w3id.org/traceability/identifier/ManufacturerID> and allow other people to reuse to create other IDs.

TallTed commented 3 months ago

This is the solution called "Structured Value With Prefix" at https://github.com/w3c-ccg/traceability-vocab/issues/944#issue-2181065847

VladimirAlexiev commented 2 months ago

@rhofvendahl your example repeats the conflation I explained above: it takes a specific JSON key and maps it to a more generic RDF property. @nissimsan and @mkhraisha, introducing Traceability-specific URLs for trade-related identifiers is a good idea, and making a register of all such identifiers would be a great contribution. But you should also decide whether these will be straight RDF props, or thesaurus entries to be used in an Identifier class.
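
The two options named in the closing sentence above could look roughly as follows, sketched as Python dicts that serialize to JSON. Only the `https://w3id.org/traceability/identifier/ManufacturerID` URL comes from this thread. The key names, the `Identifier` type, and both function names are assumptions, and neither shape is an agreed design.

```python
import json

# URL proposed earlier in this thread.
MANUFACTURER_ID = "https://w3id.org/traceability/identifier/ManufacturerID"


def as_direct_property(value: str) -> dict:
    """Option 1: the identifier URL is itself the RDF property."""
    return {MANUFACTURER_ID: value}


def as_identifier_record(value: str) -> dict:
    """Option 2: the URL is a thesaurus entry naming the identifier type."""
    return {
        "identifier": [
            {"type": "Identifier", "propertyID": MANUFACTURER_ID, "value": value}
        ]
    }


print(json.dumps(as_direct_property("2300912"), indent=2))
print(json.dumps(as_identifier_record("2300912"), indent=2))
```

In the first shape every identifier type needs its own property; in the second, one `identifier` property carries any number of typed records, and each record can later take extra fields such as an issuer.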