Fak3 closed this issue 1 month ago.
@VladimirAlexiev, as a co-author of the euBusinessGraph ontology, can you please comment on whether the proposed reuse of it is adequate?
Hi @Fak3, thanks for referencing EuBusinessGraph!
… IdentifierSystem, thus no isPartOf, and it surely doesn't reference our project. … IdentifierSystem because sometimes an agency issues multiple kinds of identifiers, and because we wanted to add more details about IdentifierSystems (… customs) or specify the country/jurisdiction.
@VladimirAlexiev, at https://github.com/w3c-ccg/traceability-vocab/issues/944 you proposed to use schema:PropertyValue instead of the more specific adms:Identifier. Is there any reason to prefer the former?
Not sure how to resolve this one. @Fak3 says
As currently suggested in DigitalproductPassport.md, identifier is described with idScheme, idValue, idSchemeName:
{
"id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
"name": "ACME Pty Ltd",
"idValue": "90664869327",
"idScheme": "abr.business.gov.au",
"idSchemeName": "Australian Business Number"
}
The problem is that those properties are assigned not to the specific identifier, but to the entity, which can have multiple identifiers with different identification schemes.
I don't think I agree with the assertion that these properties are not specific to the identifier.
- the id is the full URI of the identifier, globally unique
- the name is the human-readable name as registered with that specific registry
- the idValue is the identifier as it is known within the registry, unique only within the registry
- the idScheme is the URI of the registry itself
- the idSchemeName is the human-readable name of the id scheme
Let's imagine the same business entity has another identifier in another business register:
{
"id": "https://gln.gs1.org/1234567",
"name": "ACME Industries",
"idValue": "1234567",
"idScheme": "gln.gs1.org",
"idSchemeName": "Global Location Number"
}
Totally different values for every property. There is nothing that requires an entity to use its nationally registered legal entity name when creating a GLN with GS1; it might choose something closer to its trading name.
Could you clarify the problem here?
Let's imagine the same business entity has another identifier in another business register:
{
"id": "https://gln.gs1.org/1234567",
"name": "ACME Industries",
"idValue": "1234567",
"idScheme": "gln.gs1.org",
"idSchemeName": "Global Location Number"
}
and
{
"id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
"name": "ACME Pty Ltd",
"idValue": "90664869327",
"idScheme": "abr.business.gov.au",
"idSchemeName": "Australian Business Number"
}
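Now add an equivalence statement:
{
"id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
"owl:sameAs": "https://gln.gs1.org/1234567"
}
Applying owl:sameAs semantics, a reasoner infers that both nodes carry all of these properties:
{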
"id": "https://gln.gs1.org/1234567",
"owl:sameAs": "https://abr.business.gov.au/ABN/View?abn=90664869327",
"name": ["ACME Industries", "ACME Pty Ltd"],
"idValue": ["1234567", "90664869327"],
"idScheme": ["gln.gs1.org", "abr.business.gov.au"],
"idSchemeName": ["Global Location Number", "Australian Business Number"]
}
and
{
"id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
"owl:sameAs": "https://gln.gs1.org/1234567",
"name": ["ACME Industries", "ACME Pty Ltd"],
"idValue": ["1234567", "90664869327"],
"idScheme": ["gln.gs1.org", "abr.business.gov.au"],
"idSchemeName": ["Global Location Number", "Australian Business Number"]
}
Thus, the original intent of assigning idScheme to a single node with a particular identifier is violated by applying owl:sameAs according to its semantics, i.e. the resulting graph will contain the nonsense triple:
<https://gln.gs1.org/1234567> <idScheme> "abr.business.gov.au" .
The issue here is that, with the current data model, the intended separation between distinctly identified nodes can't be ensured. In RDF, properties describe the entity itself, while we are currently assuming that these properties describe the entity's identifier, which violates that rule.
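A minimal Python sketch of the problem, assuming a naive reasoner that implements only two of the OWL 2 RL equality rules (eq-sym and eq-rep-s) over plain (subject, predicate, object) tuples. The bare "idScheme" and "name" predicates stand in for the JSON-LD terms above; real reasoners apply more rules, but these two already produce the nonsense triple.

```python
SAME_AS = "owl:sameAs"
ABN = "https://abr.business.gov.au/ABN/View?abn=90664869327"
GLN = "https://gln.gs1.org/1234567"


def apply_same_as(triples):
    """Saturate a triple set with two simplified owl:sameAs rules."""
    graph = set(triples)
    while True:
        new = set()
        for a, p, b in graph:
            if p != SAME_AS:
                continue
            new.add((b, SAME_AS, a))  # eq-sym: sameAs is symmetric
            for s, q, o in graph:
                if s == a and q != SAME_AS:
                    new.add((b, q, o))  # eq-rep-s: b inherits a's statements
        if new <= graph:
            return graph
        graph |= new


current_model = {
    (ABN, "name", "ACME Pty Ltd"),
    (ABN, "idScheme", "abr.business.gov.au"),
    (GLN, "name", "ACME Industries"),
    (GLN, "idScheme", "gln.gs1.org"),
    (ABN, SAME_AS, GLN),
}

inferred = apply_same_as(current_model)
gln_schemes = {o for s, p, o in inferred if s == GLN and p == "idScheme"}
print(sorted(gln_schemes))  # prints ['abr.business.gov.au', 'gln.gs1.org']
```

The GLN node ends up claiming the ABN scheme, because idScheme was attached to the entity rather than to one of its identifiers.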
So my proposal is to separate identifier metadata into its own node, which explicitly describes an identifier of the original entity. In that proposal, an identifier becomes a distinct entity (a graph node with type adms:Identifier), linked to the original entity (a graph node with type Party) via the adms:identifier property.
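A rough sketch of the proposed shape using Python dataclasses, assuming a merge (for example after an owl:sameAs statement) simply unions the two descriptions. The class and field names mirror Party, adms:Identifier and IdentifierSystem but are hypothetical, as are the IdentifierSystem IRIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentifierSystem:
    id: str
    name: str


@dataclass(frozen=True)
class Identifier:
    notation: str
    is_part_of: IdentifierSystem  # the scheme belongs to the identifier, not the party


@dataclass
class Party:
    id: str
    names: set = field(default_factory=set)
    identifiers: set = field(default_factory=set)  # plays the role of adms:identifier


def merge(primary, other):
    """Merge two descriptions of one party, e.g. after an owl:sameAs statement."""
    return Party(
        id=primary.id,
        names=primary.names | other.names,
        identifiers=primary.identifiers | other.identifiers,
    )


ABN = IdentifierSystem("https://business.gov.au/ABN", "Australian Business Number")
GLN = IdentifierSystem("https://gln.gs1.org", "Global Location Number")

abn_view = Party(
    "https://abr.business.gov.au/ABN/View?abn=90664869327",
    {"ACME Pty Ltd"},
    {Identifier("90664869327", ABN)},
)
gln_view = Party(
    "https://gln.gs1.org/1234567",
    {"ACME Industries"},
    {Identifier("1234567", GLN)},
)

merged = merge(abn_view, gln_view)
for ident in sorted(merged.identifiers, key=lambda i: i.notation):
    print(ident.notation, "->", ident.is_part_of.name)
# 1234567 -> Global Location Number
# 90664869327 -> Australian Business Number
```

After the merge the party simply has two identifiers, and each identifier still points at exactly one scheme.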
In addition, it makes sense to split out a separate IdentifierSystem node. Otherwise, different Identifier nodes can have inconsistent info about the scheme that they use.
Currently you have only 2 props:
"idScheme": "abr.business.gov.au",
"idSchemeName": "Australian Business Number"
but in the future you may have more, eg:
You can read about it in the euBusinessGraph Semantic Data Model https://docs.google.com/document/d/1dhMOTlIOC6dOK_jksJRX0CB-GIRoiYY6fWtCnZArUhU/edit#heading=h.hofh07qhoz6m
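To illustrate the inconsistency risk, here is a small Python sketch, assuming identifiers are plain dicts. In the inline style every identifier repeats the scheme name, so two of them can disagree; with a separate IdentifierSystem node the name is stored once and identifiers only reference it. The second ABN value and the "ABN" label are hypothetical.

```python
from collections import defaultdict

# Inline style: every identifier repeats the scheme metadata.
inline_identifiers = [
    {"idValue": "90664869327", "idScheme": "abr.business.gov.au",
     "idSchemeName": "Australian Business Number"},
    {"idValue": "12345678901", "idScheme": "abr.business.gov.au",
     "idSchemeName": "ABN"},  # hypothetical issuer using a different label
]


def inconsistent_schemes(identifiers):
    """Return schemes whose repeated metadata disagrees between identifiers."""
    names = defaultdict(set)
    for ident in identifiers:
        names[ident["idScheme"]].add(ident["idSchemeName"])
    return {scheme: found for scheme, found in names.items() if len(found) > 1}


# Separate-node style: scheme metadata lives in one place.
identifier_systems = {
    "abr.business.gov.au": {"name": "Australian Business Number"},
}
linked_identifiers = [
    {"notation": "90664869327", "isPartOf": "abr.business.gov.au"},
    {"notation": "12345678901", "isPartOf": "abr.business.gov.au"},
]


def scheme_name(identifier):
    """Look up the scheme name through the shared IdentifierSystem node."""
    return identifier_systems[identifier["isPartOf"]]["name"]


print(inconsistent_schemes(inline_identifiers))
print({scheme_name(i) for i in linked_identifiers})
```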
OK, I'll separate out the id scheme into its own class with its own "id", so that the graph will have a separate node for identifier schemes.
This issue still exists in the currently published spec. Example DPP from https://uncefact.github.io/spec-untp/docs/specification/DigitalProductPassport:
"issuer": {
"type": "CredentialIssuer",
"id": "did:web:identifiers.acme.com:12345",
"name": "ACME industries",
"otherIdentifiers": [{
"type": "Entity",
"id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
"name": "ACME Pty Ltd",
"idValue": "90664869327",
"idScheme": "abr.business.gov.au",
"idSchemeName": "Australian Business Number"
}]
},
Should be:
"issuer": {
"type": "CredentialIssuer",
"id": "did:web:identifiers.acme.com:12345",
"name": "ACME industries",
"identifier": [{
"type": "Identifier",
"notation": "https://abr.business.gov.au/ABN/View?abn=90664869327",
"name": "ACME Pty Ltd",
"isPartOf": {
"id": "abr.business.gov.au/ABN",
"type": "IdentifierSystem",
"name": "Australian Business Number"
}
}]
},
I forgot to update the sample snippet on the page. But the model, schema and sample at the top of the page are different; please check https://uncefact.github.io/spec-untp/assets/files/untp-digital-product-passport-v0.3.6-86c6ee585e0905f8871b40838616f9ff.json
This sample has the same issue:
"issuer": {
"type": [
"CredentialIssuer"
],
"id": "did:web:identifiers.example-company.com:12345",
"name": "Example Company Pty Ltd",
"otherIdentifiers": [
{
"type": [
"Entity"
],
"id": "https://business.gov.au/ABN/View?abn=1234567890",
"name": "Sample Company Pty Ltd",
"registeredId": "1234567890",
"idScheme": {
"type": [
"IdentifierScheme"
],
"id": "https://business.gov.au/ABN/",
"name": "Australian Business Number"
}
}
]
},
Should be:
"issuer": {
"type": "CredentialIssuer",
"id": "did:web:identifiers.example-company.com:12345",
"name": "Example Company Pty Ltd",
"identifier": [
{
"type": "Identifier",
"notation": "https://business.gov.au/ABN/View?abn=1234567890",
"name": "Sample Company Pty Ltd",
"isPartOf": {
"type": "IdentifierSystem",
"id": "https://business.gov.au/ABN-HTTP",
"name": "Australian Business Number URL"
}
},
{
"type": "Identifier",
"notation": "1234567890",
"name": "Sample Company Pty Ltd",
"isPartOf": {
"type": "IdentifierSystem",
"id": "https://business.gov.au/ABN",
"name": "Australian Business Number"
}
}
]
},
Also note that "https://business.gov.au/ABN-HTTP" and plain "https://business.gov.au/ABN" are different IdentifierSystems.
I think there might be a linked data architecture or strategy question behind this issue. It boils down to whether entities should be merged when some kind of equivalence is declared, or only when identifiers are exactly identical. If an entity with ID = abn.gov.au/123454567 declares "otherIdentifiers" like gs1.org/gln/9876543, does this mean they are the same and all data about abn.gov.au/123454567 and gs1.org/gln/9876543 should be merged?
In the example given, an ABN is an Australian national business tax registration number. A GLN is a GS1 identifier for a logistics location. In some cases where a business has only one operating location these two IDs could resolve to very similar things. But even so, a legal tax registration is not the same as a logistics location. Also, as soon as the business opens a second location and creates another GLN there will be far worse inconsistencies associated with any merge.
I suggest adding some words to the graphs section of UNTP to emphasise that metadata of two entities should only be merged when the declared identifiers are identical (e.g. two instances of gs1.org/gln/9876543), but never when two different identifiers are declared to be related and possibly equivalent.
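A minimal Python sketch of that rule, assuming node descriptions arrive as flat dicts with string values: descriptions are merged only when their "id" values are identical, and a declared related identifier such as "otherIdentifiers" stays an ordinary property value that never triggers a merge. The function name and sample values are hypothetical.

```python
def merge_by_exact_id(descriptions):
    """Merge node descriptions only when their "id" values are identical.

    Related identifiers (e.g. "otherIdentifiers") are kept as ordinary
    property values and never trigger a merge.
    """
    merged = {}
    for description in descriptions:
        node = merged.setdefault(description["id"], {})
        for key, value in description.items():
            if key != "id":
                node.setdefault(key, set()).add(value)
    return merged


descriptions = [
    {"id": "gs1.org/gln/9876543", "name": "ACME warehouse"},      # issuer 1
    {"id": "gs1.org/gln/9876543", "address": "1 Example St"},     # issuer 2
    {"id": "abn.gov.au/123454567", "name": "ACME Pty Ltd",
     "otherIdentifiers": "gs1.org/gln/9876543"},                 # related, not identical
]

graph = merge_by_exact_id(descriptions)
```

The two GLN descriptions collapse into one node, while the ABN node keeps its own data and merely records the GLN as a related identifier.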
Section "9.2 Equivalence" of gs1 digital link spec https://www.gs1.org/docs/Digital-Link/GS1_Digital_link_Standard_i1.1.pdf mentions use of owl:sameAs
As well as the section "7.2 Decompression":
@philarcher @mgh128 Can you please tell me if you see a realistic scenario for a single product referenced by several different GS1 links? For example, issuer1 uses a compressed link and issuer2 uses an uncompressed GTIN+batch link. Could both links reference the same product in two different Product Passports or Conformity Credentials?
If we forbid processing owl:sameAs on the verifier side, we must document that clearly in the spec. This restriction will force verifiers to do one or a combination of the following:
- construct more complex SPARQL queries
- strip off incoming owl:sameAs statements
- apply custom graph merging rules (ignore or replace the idScheme property)
Hi, in the EU context the Nordic business register authority cooperation advocates the use of adms:Identifier, but in the way it's been modeled in the EU Core Vocabularies: https://semiceu.github.io/Core-Business-Vocabulary/releases/2.2.0/#Identifier
Here all the attributes are directly properties of the Identifier class... "Properties > For this entity the following properties are defined: date of issue , identifies , notation , schema agency , scheme name , scheme URI ."
Then again, there's a reference to "the UN/CEFACT class with the same name", but in the UN/CEFACT CCL these attributes are actually included in the uDT (IdentifierType), and there they are indeed mostly properties of the "Identification Scheme".
An example of a Finnish implementation of the EU Core Business Vocabularies "Identifier" class: https://tietomallit.suomi.fi/en/model/isa2core/class/Identifier?ver=0.1.0. This is the OWL vocabulary from which SHACL shapes are derived by reuse. RDF version of the whole vocabulary: https://tietomallit.suomi.fi/api/getModelAsFile?modelId=isa2core&fileType=RDF&raw=true&version=0.1.0&language=en
The GS1 doc basically says "use owl:sameAs with care": only when you are really sure the two different identifiers refer to the same thing.
UNTP is not going to specify owl:sameAs in any scenario; that's a choice of the processor using vocabularies we don't control.
UNTP can only recommend: "If two things have different identifiers, then don't merge them." I really don't understand why this is even a contentious issue.
We don't really leave a choice to the processors. If they receive and accept equivalence statements, they face the inconsistent graph, as described in the comments above. So to prevent inconsistent processing we must either fix data model as proposed here, or document specific guidance how to deal with inconsistency.
Ok fair enough. But where are we (ie UNTP) making any equivalence statement using owl:sameAs ? Is there an assumption that "otherIdentifiers" or "alsoKnownAs" should be interpreted as "owl:sameAs"?
We do reference using GS1 digital link in our IdentityResolver.md
GS1 digital link spec mandates the owl:sameAs relationship between short (compressed) and full links
Thank you! I believe it is important to align with the EU Core Business Vocabulary and the UN/CEFACT CCL. We should reuse the same data model and have a dedicated Identifier class for identifier metadata.
The UN/CEFACT CCL defines an identifier data type that mixes both the ID of the entity and the ID of the identifier scheme in one class. Isn't that exactly what you are objecting to?
No, as we discussed on Slack, separating IdentifierSystem metadata is not that important, as it does not lead to an inconsistent graph.
But surely we are talking about two different questions here. Short and full links are just different technical representations of exactly the same registry entry. There's not even a merge question here because they both point to the exact same entry.
But whether or not to merge two entries across two different registers is not the same question.
"http://example.org/gtin/054123450013/lot/ABC%26%2B123?3103=000189&3923=2172" And "(3103)000189(01)05412345000013(3923)2172(10)ABC&+123" reference the same product but follow different identifier schemes with different parsing rules, so it is incorrect to say that the first has the compressed identifier schema and the second has a full one
Those are not different schemes. They are both the same GTIN. Just different technical representations of the same thing. "Scheme" does not refer to a technical syntax. A different scheme means a different register (like ABN vs GLN).
As an example of different registries identifying the same organization, there is a European Union open data SPARQL endpoint: https://data.europa.eu/data/sparql?locale=en
Querying it with SELECT DISTINCT ?d ?x WHERE { ?d owl:sameAs ?x . ?d a vcard:Organization . } LIMIT 1000 reveals that
<http://data.brreg.no/enhetsregisteret/enhet/950037687> owl:sameAs <https://register.geonorge.no/organisasjoner/norsk-institutt-for-naturforskning>
i.e. there are two registries, 1. data.brreg.no and 2. register.geonorge.no, and they state that those two organization identifiers are strictly equivalent.
Do we ever encounter such cases in untp?
http://example.org/01/054123450013/10/ABC%26%2B123?3103=000189&3923=2172
is an example of a fully uncompressed GS1 Digital Link URI.
(3103)000189(01)05412345000013(3923)2172(10)ABC&+123
is an example of a corresponding GS1 element string using parentheses around the GS1 Application Identifiers. It is not a GS1 Digital Link URI nor any kind of compressed format.
It would be OK to express an owl:sameAs relationship between an uncompressed GS1 Digital Link URI and the exactly equivalent compressed or partially compressed GS1 Digital Link URIs but only if they identify the same thing, i.e. if the fully/partially compressed GS1 Digital Link URI encodes the same combination of GS1 Application Identifiers and their values.
In practice, GS1 does not currently recommend the use of fully compressed or partially compressed GS1 Digital Link URIs within 2D barcodes for products. In most situations, a cautious use of upper-case alphanumeric characters and very few symbol characters enables efficient QR encoders to use the "alphanumeric" mode rather than "binary/byte" mode and this typically achieves an equivalent reduction in size of QR Code without the complexity of handling compression or decompression.
I would expect that when GS1 Digital Link URIs are used within Linked Data or within Verifiable Credentials, it would be the fully uncompressed format, without any compression.
I hope this helps.
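A minimal Python sketch of the "same combination of GS1 Application Identifiers and their values" test, assuming uncompressed GS1 Digital Link URIs whose path consists only of AI/value pairs. It ignores the hostname (in line with the later remark that a registered-domain URI and its id.gs1.org counterpart may be declared equivalent) and keeps only all-digit query keys. It is not a conformant GS1 parser: no compression, GTIN normalisation, check-digit or AI syntax validation, and the element-string regex assumes values never contain "(". Function names and sample URIs are hypothetical.

```python
import re
from urllib.parse import parse_qsl, unquote, urlsplit


def ais_from_digital_link(uri):
    """Map AI -> value for an uncompressed GS1 Digital Link URI (simplified)."""
    parts = urlsplit(uri)
    segments = [unquote(s) for s in parts.path.strip("/").split("/") if s]
    if len(segments) % 2:
        raise ValueError("path must alternate AI and value segments")
    ais = dict(zip(segments[0::2], segments[1::2]))
    # Data attributes in the query string; non-numeric keys are ignored.
    ais.update((k, v) for k, v in parse_qsl(parts.query) if k.isdigit())
    return ais


def ais_from_element_string(text):
    """Map AI -> value for a bracketed element string such as '(01)...(10)...'."""
    return dict(re.findall(r"\((\d{2,4})\)([^(]*)", text))


def same_identification(a, b):
    """owl:sameAs candidate test: identical AI/value combinations."""
    return a == b


dl = "https://example.org/01/05412345000013/10/ABC%26%2B123?3103=000189&3923=2172"
canonical = "https://id.gs1.org/01/05412345000013/10/ABC%26%2B123?3103=000189&3923=2172"
element_string = "(01)05412345000013(10)ABC&+123(3103)000189(3923)2172"
batch_only = "https://example.org/01/05412345000013/10/ABC%26%2B123"

print(same_identification(ais_from_digital_link(dl), ais_from_digital_link(canonical)))   # True
print(same_identification(ais_from_digital_link(dl), ais_from_digital_link(batch_only)))  # False
```

The element string yields the same AI/value map, which is consistent with the point that it is a different representation of the same identification rather than a different scheme.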
I think the discussion went astray. The gist is that we need different classes for entity and Identifier. As for IdentifierSystem, that is optional, depending on whether we need to:
I still do not understand. Every node in a graph needs an identifier. So if there is an Entity class it must have an id. Separately that id may be issued under a governed scheme which itself has an id.
So I understand if the requirement is "entities (with id) should be a separate class from identifierScheme (with its own id)". But I don't understand what it means to say "we need separate classes for entity and identifier". If the id of an entity is in a separate class, then what is the id of the entity class?
@onthebreeze graph node must have exactly one primary id. Also it can have additional identifiers associated with it via properties.
An instance of a Product class, as an abstract concept, can be independently described by multiple parties (issuers), and each party can independently choose a different primary id for the graph node that represents that same Product instance in its own separate graph.
A verifier receives those separate graphs and some additional data suggesting that those parties indeed chose different primary ids for the same Product instance. Knowing that those ids refer to the same entity, the verifier can correctly apply business rules, treating the properties of those separate graph nodes as if they belong to the same entity.
So one issuer chose one identifier of an entity and promoted it to be the graph node's primary id. Then, as currently suggested, the issuer associates idScheme and all other product properties with this primary id. Note that all of these properties describe a particular physical product instance, except for idScheme, which describes the arbitrarily chosen primary id of a node in the graph.
Now the verifier attempting to apply business logic must be careful, because it faces data with physical properties of a product, which do not depend on the issuer's choice, mixed with multiple idScheme properties, each of which is only valid for the issuer's particular choice of the node's primary id.
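One verifier-side workaround mentioned in this thread is a custom merging rule that keeps idScheme out of the merged entity. A minimal Python sketch, assuming nodes are flat dicts already known to describe one entity and assuming a hypothetical fixed list of identifier-specific keys: entity properties are unioned, while identifier metadata stays attached to the node id it was issued for.

```python
# Hypothetical set of keys that describe a node's chosen id rather than the entity.
IDENTIFIER_KEYS = ("idValue", "idScheme", "idSchemeName")


def merge_equivalent_nodes(nodes):
    """Merge nodes known to describe one entity without mixing identifier metadata."""
    entity = {}
    identifiers = {}
    for node in nodes:
        # Identifier metadata is only valid for this node's own primary id.
        identifiers[node["id"]] = {k: node[k] for k in IDENTIFIER_KEYS if k in node}
        for key, value in node.items():
            if key == "id" or key in IDENTIFIER_KEYS:
                continue
            entity.setdefault(key, set()).add(value)
    return entity, identifiers


issuer_a = {"id": "https://abr.business.gov.au/ABN/View?abn=90664869327",
            "name": "ACME Pty Ltd", "idValue": "90664869327",
            "idScheme": "abr.business.gov.au"}
issuer_b = {"id": "https://gln.gs1.org/1234567", "name": "ACME Industries",
            "idValue": "1234567", "idScheme": "gln.gs1.org"}

entity, identifiers = merge_equivalent_nodes([issuer_a, issuer_b])
```

The catch is that every verifier must know which keys are identifier-specific; a separate Identifier node makes that explicit in the data instead.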
I have just stumbled upon a recommendation to use the adms:identifier property to link DCAT datasets in section 5.1.2 of the report on Open government data ecosystem in Europe.
Currently (and probably for the foreseeable future), GS1 only recommends the use of uncompressed GS1 Digital Link URIs; neither fully compressed nor partially compressed GS1 Digital Link URIs are supported by GS1 application standards.
However, it would still be acceptable to use an owl:sameAs relationship between an uncompressed GS1 Digital Link URI formed from a registered domain name and the corresponding canonical GS1 Digital Link URI using id.gs1.org as the hostname.
For Digital Product Passport data, I fully expect that factual claims will be expressed at various granularities of identification - some facts may apply to every instance of a product having that GTIN, while others may be specific to a specific GTIN+Batch and others may be specific to one individual product instance identified by GTIN+SerialNumber.
An AIDC data carrier such as a 2D barcode might encode GTIN, CPV, Batch/Lot, SerialNumber but some data (such as EPCIS event data for supply chain traceability / visibility) might only use GTIN+SerialNumber when reporting that an individual object was observed at a location or participated in a particular business process step. That's because GTIN+SerialNumber provides the finest granularity of identification and from those details, CPV or Batch/Lot should be accessible via lookup of the master data for that GTIN+SerialNumber.
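A small Python sketch of claims at different granularities, assuming facts are stored in a dict keyed by tuples of GS1 Application Identifiers (01 = GTIN, 10 = batch/lot, 21 = serial number). For a given item, facts are collected from the coarsest (GTIN) key to the finest. The storage layout, function names and sample facts are hypothetical.

```python
def lookup_keys(ais):
    """Return fact-lookup keys from coarsest (GTIN) to finest granularity."""
    gtin = ais["01"]
    keys = [("01", gtin)]
    if "10" in ais:
        keys.append(("01", gtin, "10", ais["10"]))
    if "21" in ais:
        keys.append(("01", gtin, "21", ais["21"]))
    return keys


def applicable_facts(ais, facts):
    """Collect facts claimed at every granularity that covers this item."""
    found = []
    for key in lookup_keys(ais):
        found.extend(facts.get(key, []))
    return found


facts = {  # hypothetical claims keyed by identification granularity
    ("01", "05412345000013"): ["material: recycled PET"],
    ("01", "05412345000013", "10", "ABC123"): ["production date: 2024-06-01"],
    ("01", "05412345000013", "21", "SN42"): ["repair history: none"],
}

item = {"01": "05412345000013", "10": "ABC123", "21": "SN42"}
print(applicable_facts(item, facts))
```

A GTIN-only identifier picks up just the product-level claim, while a fully identified item also inherits the batch- and instance-level ones.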
I think the outcome of all this is that Identifier and IdentifierScheme should be different node objects; this is now the case. Closing this unless there are objections.
Products and Organizations can have multiple identifiers expressed using various identifier schemes.
Current model issues
As currently suggested in DigitalproductPassport.md, identifier is described with idScheme, idValue, idSchemeName:
The problem is that those properties are assigned not to the specific identifier, but to the entity, which can have multiple identifiers with different identification schemes.
Let's imagine this JSON-LD data is stored and processed by an OWL inferencer, and their graph database already reflects that this organization has two identifiers:
Processing this new data according to owl:sameAs semantics, the OWL inferencer will add new triples to the graph:
This does not make sense, as it means that the Decentralized Identifier conforms to the ABN identifier scheme.
Proposed model
To resolve the issue, identifiers should be described separately from the entity itself:
In the example above I omitted "type": "Identifier" but included an explicit "type": "IdentifierSystem". Whether we should require those types to be explicitly declared in documents is debatable.
I suggest reusing existing vocabularies:
- Class adms:Identifier, based on the UN/CEFACT Identifier class. Properties: … ebg:IdentifierSystem
- Class ebg:IdentifierSystem from the BusinessGraph vocabulary. Definition from ontology.ttl: "A system managed by a publisher (e.g., a register or agency) that is used to issue identifiers to entities (companies, persons, etc)." Properties: …
- Property adms:identifier, which links a resource to the adms:Identifier.
There are other properties in the BusinessGraph ontology which can be reused, e.g. jurisdiction, issuance and expiration date, etc. We should probably add them to our JSON-LD context file as well.
Related to #135
Similar issue was discussed on traceability vocab: https://github.com/w3c-ccg/traceability-vocab/issues/944