w3c-ccg / traceability-vocab

A traceability vocabulary for describing relevant Verifiable Credentials and their contents.
https://w3id.org/traceability

wrong $linkedData mapping: props vs classes #270

Open VladimirAlexiev opened 2 years ago

VladimirAlexiev commented 2 years ago

There are many cases where a prop is mapped to a class, e.g. https://github.com/w3c-ccg/traceability-vocab/blob/main/docs/openapi/components/schemas/common/BillOfLading.yml#L38:

    $linkedData:
      term: relatedDocuments
      '@id': https://schema.org/Purchase

https://github.com/w3c-ccg/traceability-vocab/blob/main/docs/openapi/components/schemas/common/BillOfLading.yml#L107

    $linkedData:
      term: freight
      '@id': https://schema.org/ParcelDelivery

This is wrong in all cases. Basically, each time you map a lowercase term (prop) to an uppercase schema term (class), it's likely a mistake.

Similarly, some props have been mapped to a schema datatype (I think DateTime).
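
The lowercase-to-uppercase heuristic above can be sketched as a small check over a parsed JSON-LD context. This is only an illustrative sketch, not a script from this repository: `localName` and `findSuspectMappings` are hypothetical names, and it assumes term definitions are plain objects with an `@id` and an optional scoped `@context`. Schema datatypes such as DateTime get flagged too, since they are also capitalized.

```javascript
// Hypothetical helper: text after the last '#' or '/' of an IRI.
function localName(iri) {
  const match = /[^#/]*$/.exec(iri);
  return match ? match[0] : '';
}

// Hypothetical helper: list terms that look like properties (lowercase first
// letter) but whose @id ends in a class-like name (uppercase first letter).
function findSuspectMappings(context, path = []) {
  const suspects = [];
  for (const [term, def] of Object.entries(context)) {
    // Skip keywords, keyword aliases such as "id": "@id", and non-object values.
    if (term.startsWith('@') || def === null || typeof def !== 'object' || Array.isArray(def)) {
      continue;
    }
    const iri = def['@id'];
    if (typeof iri === 'string' && /^[a-z]/.test(term) && /^[A-Z]/.test(localName(iri))) {
      suspects.push({ term: [...path, term].join('.'), iri });
    }
    // Recurse into scoped contexts (assumed to be plain objects here).
    const scoped = def['@context'];
    if (scoped && typeof scoped === 'object' && !Array.isArray(scoped)) {
      suspects.push(...findSuspectMappings(scoped, [...path, term]));
    }
  }
  return suspects;
}

// Usage with the two mappings quoted above (made-up surrounding structure).
const exampleContext = {
  relatedDocuments: { '@id': 'https://schema.org/Purchase' },
  freight: { '@id': 'https://schema.org/ParcelDelivery' },
};
console.log(findSuspectMappings(exampleContext));
```

A capitalized local name is only a hint, so the output is a list of candidates to review by hand rather than a list of confirmed bugs.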

VladimirAlexiev commented 2 years ago

@OR13 @nissimsan I used the context to quickly catch bugs of this sort. Here's just a sampling. Do you disagree with some of them?

    "relatedLink": {
      "@id": "https://w3id.org/traceability#LinkRole" // 1. There is schema:LinkRole. 2. Prop can't map to class. 3. Map to schema:relatedLink and add @type:@id. 4. Specify in ontology range schema:LinkRole
    "AgActivity": { // in ontology, define subclass of schema:Event
      "@id": "https://w3id.org/traceability#AgActivity",
      "@context": {
        "farm": {
          "@id": "https://w3id.org/traceability#dfn-entities" // this is not a prop URL
        "actor": {
          "@id": "https://w3id.org/traceability#Person" // 1. Elsewhere you use schema:Person. 2. Prop can't map to class. 3. Map to schema:actor. 4. Add @type:@id
        "field": {
          "@id": "https://www.gs1.org/voc/Place" // 1. Prop can't map to class. 2. Map to schema:location, which points to schema:Place. 3. Add @type:@id. 4. If needed, define tr:Field as subclass of schema:Place
        "activityDate": {
          "@id": "https://schema.org/DateTime" // 1. Prop can't map to class. 2. Map to schema:startDate. 3. Add @type: xsd:dateTime (schema datatypes should not be used)
        "activityType": {
          "@id": "https://www.schema.org/value" // 1. Use schema:additionalType
        "agProduct": {
          "@id": "https://schema.org/ItemList" // 1. Prop can't map to class. 2. Use `@container: @list` instead of ItemList. 3. But do you really need ordering of this list? If not, the default `@set` is much better. 4. Use prop name `product`. 5. Define range tr:AgProduct
        "observation": {
          "@id": "https://w3id.org/traceability#observation" // Add to ontology: rangeIncludes schema:Observation
    "LinkRole": {
      "@id": "https://schema.org/LinkRole",
      "@context": {
        "target": {
          "@id": "https://schema.org/target" // this is EntryPoint for an Action, not related to LinkRole. Maybe use `uri`?
        "linkRelationship": {
          "@id": "https://schema.org/linkRelationship"
    "Observation": {
      "@id": "https://schema.org/Observation",
      "@context": {
        "property": {
          "@id": "https://schema.org/measuredProperty"
        "measurement": {
          "@id": "https://w3id.org/traceability#MeasuredValue" // No! Use schema:measuredValue but add rangeIncludes QuantitativeValue
        "date": {
          "@id": "https://schema.org/observationDate"

VladimirAlexiev commented 2 years ago

@OR13 @nissimsan I'll try to join your meeting next Tue. I'm a w3c member but never used jitsi.

OR13 commented 2 years ago

@VladimirAlexiev yep, those are all bugs :)

cc @mprorock it was only a matter of time before someone came and shamed us like this ; )

In all seriousness, I think these can be addressed easily by using @type @id correctly and updating the JSON Schemas associated with these terms.

mprorock commented 2 years ago

> In all seriousness, I think these can be addressed easily by using @type @id correctly and updating the JSON Schemas associated with these terms.

+1

mprorock commented 2 years ago

> @OR13 @nissimsan I'll try to join your meeting next Tue. I'm a w3c member but never used jitsi.

pretty happy to move the trace call off jitsi to google or similar - @OR13 @mkhraisha any objections? let's bring this up on the next call

OR13 commented 2 years ago

@nissimsan @BenjaminMoe I suggest we inventory all credentials of interest to us, and start working to eliminate these mistakes.

clehner commented 2 years ago

Some terms map to the same IRI, e.g. the ones linked below. I think this falls under this issue (props mapping to classes), but it's also a malleability issue: one of these properties in a credential payload could be swapped for one of the others and an LD/RDF-based proof/signature would still be valid.

https://github.com/w3c-ccg/traceability-vocab/blob/765fb5124b4fba2f142dcb0397e2e0e5b6f59bb7/contexts/traceability-v1.jsonld#L160-L171

from:

https://github.com/w3c-ccg/traceability-vocab/blob/dacc30f930e80e9fc190cb5703d7217cd8eeab05/docs/openapi/components/schemas/common/AgParcelDelivery.yml#L101-L128
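
A minimal sketch of how such collisions could be listed, assuming a scoped context is a plain object of term definitions. `findCollidingTerms` is a hypothetical helper, not an existing script in this repo, and the sample data is a made-up excerpt for illustration.

```javascript
// Hypothetical helper: group the terms of one (scoped) context by the IRI
// they expand to, and report every IRI shared by more than one term.
function findCollidingTerms(scopedContext) {
  const termsByIri = new Map();
  for (const [term, def] of Object.entries(scopedContext)) {
    // A term definition may be a plain string or an object with "@id".
    const iri = typeof def === 'string' ? def : def && def['@id'];
    // Ignore missing ids and keyword aliases such as "id": "@id".
    if (typeof iri !== 'string' || iri.startsWith('@')) continue;
    if (!termsByIri.has(iri)) termsByIri.set(iri, []);
    termsByIri.get(iri).push(term);
  }
  return [...termsByIri]
    .filter(([, terms]) => terms.length > 1)
    .map(([iri, terms]) => ({ iri, terms }));
}

// Made-up excerpt: two terms share one IRI, so they are interchangeable
// as far as the expanded RDF (and hence the signature) is concerned.
const scoped = {
  plannedRoute: { '@id': 'https://schema.org/itinerary' },
  movementPoints: { '@id': 'https://schema.org/itinerary' },
  purchaser: { '@id': 'https://schema.org/buyer' },
};
console.log(findCollidingTerms(scoped));
```

Terms reported together expand to the same predicate, so renaming one to another in a payload would not change the canonicalized data that gets signed.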

nissimsan commented 1 year ago

@clehner, I think this is what you are asking for (last row added)?

 carrier: 
   title: Carrier 
   description: Shipping carrier for product.  
   $ref: ./Organization.yml 
   $linkedData: 
     term: carrier 
     '@id': https://schema.org/Organization 
     '@type': https://service.unece.org/trade/uncefact/vocabulary/uncefact/#carrierParty

We don't have many such types specified. Yet. I consider this "up next" in maturing trace vocab. So pls stay tuned - and don't be shy to help out with PRs! :)

TallTed commented 1 year ago

@nissimsan — I believe that (as discussed in #268), your — https://service.unece.org/trade/uncefact/vocabulary/uncefact/#carrierParty — should be — https://service.unece.org/trade/uncefact/vocabulary/uncefact#carrierParty (the /# becomes just #).

nissimsan commented 1 year ago

I wholeheartedly agree, @TallTed.
The root problem must first be solved here, though. Until then, I'm sticking with whatever (uncool) URLs I get redirected to; otherwise we get into inconsistent definitions territory.

TallTed commented 1 year ago

@nissimsan @VladimirAlexiev — I've made a substantial post on the uncefact issue which I hope helps clarify matters.

Basic point is that the URI of a term does not need to match the URI of the description of that term.

Redirection is entirely legal and nothing at UNECE needs to change for us to use the correct URI of the existing terms (e.g., https://service.unece.org/trade/uncefact/vocabulary/uncefact#carrierParty).

Dereferencing that URI leads to information about that entity (e.g., https://service.unece.org/trade/uncefact/vocabulary/uncefact/#carrierParty).

This is fully in keeping with the Principles of Linked Data.
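
To illustrate the point with a small sketch (using the standard `URL` class available in Node.js and browsers; `splitTermIri` is a hypothetical name): the fragment is not sent in the HTTP request, so the server only sees, and may redirect, the document part. The term IRI itself is not changed by that redirect.

```javascript
// Hypothetical helper: separate a term IRI into the document URL that is
// actually requested over HTTP and the fragment that stays on the client.
function splitTermIri(termIri) {
  const url = new URL(termIri);
  const fragment = url.hash.slice(1); // drop the leading '#'
  url.hash = '';                      // what remains is the requested URL
  return { documentUrl: url.href, fragment };
}

const term = 'https://service.unece.org/trade/uncefact/vocabulary/uncefact#carrierParty';
console.log(splitTermIri(term));
```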

nissimsan commented 1 year ago

@TallTed, your argument on the cefact issue has changed my mind on this. Thank you.

VladimirAlexiev commented 1 year ago

@nissimsan (cc @clehner )

  1. I think in https://github.com/w3c-ccg/traceability-vocab/issues/270#issuecomment-1223795900 you got @type and @id swapped?
  2. You cannot specify the target class (range) of a prop using @type. In fact you cannot do that in a context at all: you need an ontology for that.
  3. Then there are further considerations: whether the same prop can target different classes, depending on the source class.

nissimsan commented 1 year ago

Hi @VladimirAlexiev,

  1. Right, good point!
  2. Moreover, it would be a double-definition: '@type': https://schema.org/Organization is already defined via $ref: ./Organization.yml.

OR13 commented 1 year ago

We should be careful not to confuse Data Types (in JSON Schema, using $ref) with RDF Classes (in JSON-LD, using @type)... they can both be defined with fine granularity, or coarse granularity... our objective is to find the right level of granularity that supports interoperability and reuse.

TLDR, we want to define both the shape and semantics.

It's also worth acknowledging that we also want to define properties in JSON-LD... not just classes.

TallTed commented 1 year ago

> whether the same prop can target different classes,

target classes may be better understood if called target entity types, above ... a/k/a rdfs:range, identified and described for machines by http://www.w3.org/2000/01/rdf-schema#ch_range, and described for humans by https://www.w3.org/TR/rdf-schema/#ch_range.

> depending on the source class

source class may be better understood if called source entity type, above ...a/k/a rdfs:domain, identified and described for machines by http://www.w3.org/2000/01/rdf-schema#ch_domain, and described for humans by https://www.w3.org/TR/rdf-schema/#ch_domain

nissimsan commented 1 year ago

@OR13, right, there was a hidden assumption in my statement. If the JSON Schema and JSON-LD can't be assumed to always be at the same granularity we do need to define both individually. Cheers!

VladimirAlexiev commented 1 year ago

@OR13 I think I need to emphasize what error this issue points out. Let's take the latest example by @nissimsan:

 carrier:
   title: Carrier
   description: Shipping carrier for product.
   $ref: ./Organization.yml
   $linkedData:
     term: carrier
     '@id': https://schema.org/Organization
     '@type': https://service.unece.org/trade/uncefact/vocabulary/uncefact/#carrierParty

  1. Property. https://service.unece.org/trade/uncefact/vocabulary/uncefact/#carrierParty is a property. You need to express the binding like this in the JSONLD context:

    "carrier": {"@type": "@id", "@id": "https://service.unece.org/trade/uncefact/vocabulary/uncefact/#carrierParty"}

    I don't know how that's expressed in $linkedData but I'm pretty sure it's not with @type

  2. Class. There is no way to express the expected target class in a JSONLD context. For that you need an ontology (see #284), and it should say this in turtle:

    uncefact:carrierParty schema:rangeIncludes schema:Organization .
    • @TallTed the reason I use schema:rangeIncludes and not rdfs:range is because traceability is such a mish-mash of ontologies that I suspect many props will get multiple ranges. But rdfs:range is monomorphic.
    • @TallTed the rdfs: namespace is different from both ones you pointed to above
    • uncefact: itself says the range is uncefact:Party. It's best to relate this class to schema:Organization (equivalentClass or subClassOf or is subClass), or to use the uncefact: class directly.

OR13 commented 1 year ago

Feels like we can probably do several `@type: @id` cleanups at once.

TallTed commented 1 year ago

@VladimirAlexiev

> @TallTed the reason I use schema:rangeIncludes and not rdfs:range is because traceability is such a mish-mash of ontologies that I suspect many props will get multiple ranges. But rdfs:range is monomorphic.

Reasonable. Schema.org was (and to a lessening degree, is) a similar mish-mash of a single ontology that had radically conflicting range & domain values, which required schema:rangeIncludes for many

> @TallTed the rdfs: namespace is different from both ones you pointed to above

Really? The preset rdfs: namespace at DBpedia matches the rdfs: lookup on prefix.cc, which is what I specified as "identified and defined for machines", i.e., http://www.w3.org/2000/01/rdf-schema#.

> • uncefact: itself says the range is uncefact:Party

And as we should all know by now, an undefined prefix in a prefixed URI is the Devil's playground. Their definition of the uncefact: prefix is https://service.unece.org/trade/uncefact/vocabulary/uncefact# (which you can see plainly in the early lines of https://service.unece.org/trade/uncefact/vocabulary/uncefact-context.jsonld), which makes the range https://service.unece.org/trade/uncefact/vocabulary/uncefact#Party, which you can dereference to find the human-focused description at https://service.unece.org/trade/uncefact/vocabulary/uncefact/#Party.

Gotta love HTTPRange-14!

OR13 commented 1 year ago

I suggest closing this issue, and filing a separate issue for any specific instance....

This issue is basically "there are problems in many files".

brownoxford commented 1 year ago

In order to move forward, we need more discrete, concrete action items. The next step here is to split this larger issue into smaller workloads based on current repository segmentation.

nissimsan commented 1 year ago

@VladimirAlexiev, could you help us provide the entire list, pls? So we know what the damage is, and can plan accordingly.

BenjaminMoe commented 8 months ago

@VladimirAlexiev can you provide any specifics on this?

clehner commented 8 months ago

Here are possible @id improvements for AgricultureParcelDelivery:

diff --git a/docs/openapi/components/schemas/common/AgricultureParcelDelivery.yml b/docs/openapi/components/schemas/common/AgricultureParcelDelivery.yml
index d70e9d8d..98a778bd 100644
--- a/docs/openapi/components/schemas/common/AgricultureParcelDelivery.yml
+++ b/docs/openapi/components/schemas/common/AgricultureParcelDelivery.yml
@@ -38,3 +38,3 @@ properties:
       term: foreignPortExport
-      '@id': https://schema.org/itinerary
+      '@id': https://w3id.org/traceability#foreignPortExport
   portOfEntry:
@@ -45,3 +45,3 @@ properties:
       term: portOfEntry
-      '@id': https://schema.org/itinerary
+      '@id': https://w3id.org/traceability#portOfEntry
   deliveryMethod:
@@ -73,3 +73,3 @@ properties:
       term: specialInstructions
-      '@id': https://schema.org/comment
+      '@id': https://vocabulary.uncefact.org/specialInstructions
   consignee:
@@ -82,3 +82,3 @@ properties:
       term: consignee
-      '@id': https://schema.org/Organization
+      '@id': https://vocabulary.uncefact.org/consigneeParty
   agriculturePackage:
@@ -100,3 +100,3 @@ properties:
       term: movementPoints
-      '@id': https://schema.org/itinerary
+      '@id': https://w3id.org/traceability#movementPoints
   plannedRoute:
@@ -116,3 +116,3 @@ properties:
       term: shipper
-      '@id': https://schema.org/seller
+      '@id': https://schema.org/provider
   purchaser:

that would change the context file as follows:

--- docs/contexts/traceability-v1.jsonld.orig   2023-10-24 19:04:12.879366288 -0400
+++ -   2023-10-24 23:02:45.571477468 -0400
@@ -173,6 +173,6 @@
         "foreignPortExport": {
-          "@id": "https://schema.org/itinerary"
+          "@id": "https://w3id.org/traceability#foreignPortExport"
         },
         "portOfEntry": {
-          "@id": "https://schema.org/itinerary"
+          "@id": "https://w3id.org/traceability#portOfEntry"
         },
@@ -188,6 +188,6 @@
         "specialInstructions": {
-          "@id": "https://schema.org/comment"
+          "@id": "https://vocabulary.uncefact.org/specialInstructions"
         },
         "consignee": {
-          "@id": "https://schema.org/Organization"
+          "@id": "https://vocabulary.uncefact.org/consigneeParty"
         },
@@ -197,3 +197,3 @@
         "movementPoints": {
-          "@id": "https://schema.org/itinerary"
+          "@id": "https://w3id.org/traceability#movementPoints"
         },
@@ -203,3 +203,3 @@
         "shipper": {
-          "@id": "https://schema.org/seller"
+          "@id": "https://schema.org/provider"
         },
@@ -5324,2 +5324,2 @@
   }
-}
\ No newline at end of file
+}

In docs/index.html (Traceability), property definitions could be added for https://w3id.org/traceability#foreignPortExport, etc. - (also #rawMaterial, #workflow, etc. as used elsewhere)

There may be a balance to strike between using an existing property (IRI) from another vocabulary that is close enough to the intended meaning and defining a new one. For example, "consignee" -> "consigneeParty" (UNECE) and "purchaser" -> "buyer" (Schema.org) seem OK (to me; I am not a domain expert), but "shipper" -> "seller" seems questionable (and schema.org superseded carrier with provider, which is not the same either?). "expectedArrival" -> "expectedArrivalFrom" is OK, but what about "expectedArrivalUntil"?

https://schema.org/itinerary is a property, but plannedRoute and movementPoints should not both map to it; otherwise they collapse into one term, which loses the intended distinction between them. (The general meaning of "itinerary" arguably includes a planned route, but schema.org's definition does not distinguish planned from actually happened.)

"agriculturePackage" -> "itemShipped" are different but maybe OK given the surrounding graph data? i.e. with subject ("source entity type" @TallTed suggested) AgricultureParcelDelivery and object (target entity) of type AgriculturePackage, the delivery (event – shipment of goods) is understood to be of the (item – product of agriculture) package.


Here is code to make @type in $linkedData take effect:

diff --git a/packages/traceability-schemas/scripts/openapi-to-context.js b/packages/traceability-schemas/scripts/openapi-to-context.js
index 53b8dbe4..1bdfffc0 100644
--- a/packages/traceability-schemas/scripts/openapi-to-context.js
+++ b/packages/traceability-schemas/scripts/openapi-to-context.js
@@ -24,6 +24,7 @@ const schemasToContext = (srcSchemas, srcContext) => {
     const { term } = curr.$linkedData;
     const rdfClass = {
       '@id': curr.$linkedData['@id'],
+      '@type': curr.$linkedData['@type'],
       '@context': {},
     };
     clone[`${term}`] = rdfClass;
@@ -52,6 +53,7 @@ const schemasToContext = (srcSchemas, srcContext) => {

       rdfClass['@context'][curr.properties[key].$linkedData.term] = {
         '@id': curr.properties[key].$linkedData['@id'],
+        '@type': curr.properties[key].$linkedData['@type'],
       };
     });

Resulting change:

$ jq . docs/contexts/traceability-v1.jsonld | diff -u2 docs/contexts/traceability-v1.jsonld.orig -
--- docs/contexts/traceability-v1.jsonld.orig   2023-10-24 19:04:12.879366288 -0400
+++ -   2023-10-24 19:25:09.066505830 -0400
@@ -2136,5 +2136,6 @@
         },
         "requestedDate": {
-          "@id": "https://w3id.org/traceability#requestDate"
+          "@id": "https://w3id.org/traceability#requestDate",
+          "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
         },
         "accountingInformation": {
@@ -2205,5 +2206,6 @@
         },
         "executedOn": {
-          "@id": "https://w3id.org/traceability#executionTime"
+          "@id": "https://w3id.org/traceability#executionTime",
+          "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
         },
         "executedAt": {
@@ -2301,8 +2303,10 @@
         },
         "inspectors": {
-          "@id": "https://schema.org/Person"
+          "@id": "https://schema.org/Person",
+          "@type": "https://schema.org/Person"
         },
         "place": {
-          "@id": "https://schema.org/Place"
+          "@id": "https://schema.org/Place",
+          "@type": "https://schema.org/Place"
         },
         "chemicalObservation": {
@@ -3638,5 +3642,6 @@
         },
         "portOfEntry": {
-          "@id": "https://w3id.org/traceability#portOfEntry"
+          "@id": "https://w3id.org/traceability#portOfEntry",
+          "@type": "https://schema.org/Place"
         },
         "additionalDeclaration": {
@@ -4006,5 +4011,6 @@
         },
         "licensedCompany": {
-          "@id": "https://vocabulary.uncefact.org/grantedParty"
+          "@id": "https://vocabulary.uncefact.org/grantedParty",
+          "@type": "https://schema.org/Organization"
         },
         "customsEntryNumber": {
@@ -5015,8 +5021,10 @@
         },
         "dateOfEntry": {
-          "@id": "https://w3id.org/traceability#dateOfEntry"
+          "@id": "https://w3id.org/traceability#dateOfEntry",
+          "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
         },
         "signatureDate": {
-          "@id": "https://w3id.org/traceability#signatureDate"
+          "@id": "https://w3id.org/traceability#signatureDate",
+          "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
         },
         "facility": {
@@ -5323,3 +5331,3 @@
     }
   }
-}
\ No newline at end of file
+}

... but this is not for class (ontology) information - which would instead go into a possible new artifact (Turtle - #284) as @VladimirAlexiev mentioned.

@type (an "overloaded" keyword) in @context (as here) is different from @type in a value object (commonly aliased as "type", as in "type": ["VerifiableCredential", ...]). In a context it is used for dateTime and other xsd/rdf data types like JSON (@json) or HTML, but I'm not sure if it is useful/correct for Organization, etc.

(Referencing: https://www.w3.org/TR/json-ld/#typed-values https://www.w3.org/TR/rdf11-concepts/#datatype-iris)
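
One way to keep class IRIs out of the generated `@type` coercions would be to filter what gets copied from `$linkedData`. In a term definition, `@type` coerces string values: with a datatype IRI they become typed literals, and with `@id` they are read as IRIs, so a class IRI such as schema:Organization would be treated as a datatype. The following is a minimal sketch under that reading; `toTermDefinition` and the allow-list are hypothetical and not part of openapi-to-context.js, and limiting datatypes to the XSD namespace is an assumption (other datatype IRIs would need to be added).

```javascript
// Assumption: only XSD datatypes and JSON-LD coercion keywords are passed through.
const XSD = 'http://www.w3.org/2001/XMLSchema#';
const COERCION_KEYWORDS = new Set(['@id', '@vocab', '@json', '@none']);

// Hypothetical helper: build a context term definition from a $linkedData
// block, keeping '@type' only when it is a usable coercion target.
function toTermDefinition(linkedData) {
  const definition = { '@id': linkedData['@id'] };
  const type = linkedData['@type'];
  if (typeof type === 'string' && (type.startsWith(XSD) || COERCION_KEYWORDS.has(type))) {
    definition['@type'] = type;
  }
  // Class IRIs are dropped here; range information would belong in the
  // ontology rather than in the context.
  return definition;
}

console.log(toTermDefinition({
  term: 'requestedDate',
  '@id': 'https://w3id.org/traceability#requestDate',
  '@type': 'http://www.w3.org/2001/XMLSchema#dateTime',
}));
console.log(toTermDefinition({
  term: 'place',
  '@id': 'https://schema.org/Place',
  '@type': 'https://schema.org/Place',
}));
```

With this filter the dateTime coercions from the diff above would survive, while the `@type` entries pointing at schema:Person, schema:Place and schema:Organization would not be emitted.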


To develop/reproduce (rebuilding context and comparing changes):

jq . docs/contexts/traceability-v1.jsonld > docs/contexts/traceability-v1.jsonld.orig # save original context file (formatting using `jq`)
vi docs/openapi/components/schemas/common/AgricultureParcelDelivery.yml # make changes to schema file
(cd packages/traceability-schemas; node scripts/openapi-to-context.js) # regenerate context file from schema files
jq . docs/contexts/traceability-v1.jsonld | diff -u1 docs/contexts/traceability-v1.jsonld.orig - # compare context changes