Verifying linked data graphs - Githubissues

uncefact / spec-untp

UN Transparency Protocol

https://uncefact.github.io/spec-untp/

GNU General Public License v3.0


Verifying linked data graphs #10

Open onthebreeze opened 10 months ago

onthebreeze commented 10 months ago

We've an important UNTP section - https://uncefact.github.io/spec-untp/docs/specification/TrustGraphs that needs quite a bit of discussion about how we should even think about doing this.

Basically, the problem is that links between related things can be technically valid but invalid in a business sense. Some examples:

A credential issuer claims to be ABN 3413288567 but the linked government identity credential subject is ABN 3455667788. So the issuer is lying about their business identity.
A conformity credential is issued by a certifier and the scope is about motorcycle helmet safety. The conformity credential has a linked accreditation credential from NATA which is technically valid but says the scope of accreditation is animal health. So the certificate issuer is certifying something they are not authorised to do.
You scan a barcode with GTIN 12345678910 and it takes you to a DPP credential about GTIN 10987654321, so the passport is about the wrong product.
And many more like this.

All are about verifying a collection of two or more credentials and whether the links between them are valid in a business sense.
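
The examples above all reduce to comparing a value in one credential with the corresponding value in a linked credential. A minimal sketch in Python, assuming the credentials have already been retrieved and flattened into dictionaries; the field names (`issuer_abn`, `subject_abn`) and the `LinkCheck` helper are hypothetical and not taken from any UNTP schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkCheck:
    """One business-level comparison between two linked credentials."""
    name: str
    left_field: str   # field read from the referring credential
    right_field: str  # field read from the linked credential


def check_link(check: LinkCheck, referring: dict, linked: dict) -> list[str]:
    """Return human-readable problems; an empty list means the link is consistent."""
    left = referring.get(check.left_field)
    right = linked.get(check.right_field)
    if left is None or right is None:
        return [f"{check.name}: missing value"]
    if left != right:
        return [f"{check.name}: {left!r} does not match {right!r}"]
    return []


# Example data mirroring the ABN case described above.
issuer_claim = {"issuer_abn": "3413288567"}
identity_credential = {"subject_abn": "3455667788"}

identity_check = LinkCheck("issuer identity", "issuer_abn", "subject_abn")
problems = check_link(identity_check, issuer_claim, identity_credential)
```

The same `check_link` call would cover the GTIN case by declaring a second `LinkCheck` with different field names, which is one reason to keep the rule definition separate from the comparison logic.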

Is it possible to define the validation rules using something like SHACL? Or some other way of specifying rules? And if so, who defines the rules? Maybe the creator of a UNTP extension for a specific industry / geography, like Australian Agriculture?

Fak3 commented 10 months ago

A credential issuer claims to be ABN 3413288567 but the linked government identity credential subject is ABN 3455667788. So the issuer is lying about their business identity.

If I understand correctly, the validation here must include retrieval of that linked government identity credential. That is already a step that cannot be formalized with just SHACL or JSON Schema rules.

Some validation steps can be formalized in SHACL or JSON Schema, but the full validation process must include other steps for implementers to follow.

onthebreeze commented 10 months ago

@Fak3 : yes the mechanism to retrieve a bundle of credentials is separate to the analysis of the graph of linked data that is created from the credentials. Credentials will be discoverable from product or entity identifiers via a link resolution protocol. And credentials may contain links to other credentials. But this issue can assume that a verifier has followed links, discovered a number of related credentials, and is holding the data in some kind of graph store - and now wants to do some verification of the graph.

Fak3 commented 10 months ago

SHACL playground example validates that issuer of MotoGearSafetyCredential has capability "caps/MotoGearSafety": https://s.zazuko.com/3AJQNCR

Fak3 commented 10 months ago

One potential issue that comes to mind is that individual subgraphs may contradict each other, and naively validating the result after merging them together conceals where the problem came from.

Fak3 commented 10 months ago

For example the fraudulent VC can say that its issuer has the needed capability. And this VC is signed by the issuer himself.

If we blindly merge everything that the forged VC says with the data (another VC) from the national authority, and then validate, the verifier won't recognize the fraud.

Fak3 commented 10 months ago

What I'm trying to say is that, due to information loss, we have to perform the majority of checks on separate VC subgraphs, and only a few can be done on the merged graph afterwards.
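
One way to limit that information loss is to keep the source of every statement attached when merging, so later checks can still ask who asserted what. A minimal sketch, assuming each credential has been reduced to (subject, predicate, object) triples grouped by issuer; the `did:example:` identifiers, the `hasCapability` predicate and the `caps/AnimalHealth` value are hypothetical:

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    asserted_by: str  # issuer of the credential the statement came from


def merge(credentials: dict[str, list[tuple[str, str, str]]]) -> list[Quad]:
    """Merge per-issuer triples into one list, keeping the issuer on each statement."""
    return [
        Quad(s, p, o, issuer)
        for issuer, triples in credentials.items()
        for (s, p, o) in triples
    ]


def self_asserted(quads: list[Quad], predicate: str) -> list[Quad]:
    """Statements where the subject vouches for itself, so they carry no independent weight."""
    return [q for q in quads if q.predicate == predicate and q.subject == q.asserted_by]


graph = merge({
    "did:example:certifier": [("did:example:certifier", "hasCapability", "caps/MotoGearSafety")],
    "did:example:authority": [("did:example:certifier", "hasCapability", "caps/AnimalHealth")],
})
suspicious = self_asserted(graph, "hasCapability")
```

Here the merged graph still contains both capability statements, but the self-issued one can be singled out because its provenance was not discarded.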

onthebreeze commented 10 months ago

The fake accreditation VC is why we have the trust anchors section of UNTP. Any VC of type accreditation must be issued by a very short list of known and trusted authorities - e.g. did:web:NATA.com.Au
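
The trust-anchor rule can be sketched as an allowlist lookup keyed by credential type. The type name `AccreditationCredential`, the dictionary layout and the `did:example:certifier` identifier are assumptions for illustration; only `did:web:NATA.com.Au` comes from the comment above:

```python
# Hypothetical allowlist: credential type -> issuers accepted for that type.
TRUST_ANCHORS: dict[str, frozenset[str]] = {
    "AccreditationCredential": frozenset({"did:web:NATA.com.Au"}),
}


def issued_by_trust_anchor(credential: dict) -> bool:
    """True if the type has no registered anchor requirement or the issuer is on its allowlist."""
    allowed = TRUST_ANCHORS.get(credential["type"])
    if allowed is None:
        return True  # no trust-anchor requirement registered for this type
    return credential["issuer"] in allowed


forged = {"type": "AccreditationCredential", "issuer": "did:example:certifier"}
genuine = {"type": "AccreditationCredential", "issuer": "did:web:NATA.com.Au"}
```

With this data, `issued_by_trust_anchor(forged)` is false and `issued_by_trust_anchor(genuine)` is true. The check only looks at the issuer identifier, so it assumes the credential's signature has already been verified elsewhere.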

ashleythedeveloper commented 10 months ago

Is it possible to define the validation rules using something like shacl? Or some other way of specifying rules?

Based on @Fak3's example and my research, yes, it would seem possible to use SHACL to perform the type of validation in the examples you provided, with the addition of SPARQL in some cases. But it is by no means the only way.

For example, another option could be using the query language of a graph database, such as Cypher for Neo4j.

// Find issuers whose claimed ABN differs from the ABN in the linked government identity
MATCH (issuer:Issuer)-[:CLAIMS_IDENTITY]->(identity:GovernmentIdentity)
WHERE issuer.abn <> identity.abn
RETURN issuer.abn AS issuerABN, identity.abn AS identityABN

nissimsan commented 10 months ago

Related: https://medium.com/transmute-techtalk/the-united-nations-trust-graph-d65af7b0b678

JohnOnGH commented 9 months ago

I've just re-read Nis' paper, and agree with the concluding comments. Basically I think we're going to run into a problem that we all know exists: not only should we expect and guard against bad actors (to the extent possible), but also we cannot expect all items and all nodes in any supply chain to adopt the same standard at the same time (and perhaps, ever).

This rather clumsy phrase is my way of saying that the existing systems are complex, adaptive, and real-world messy. We cannot expect a single approach to be adopted by all participants, no matter how attractive. Even if we were remarkably optimistic, such adoption would be a gradual rollout: at the beginning no participants use the approach, then some do (scattered and in pockets), then more, until (wildly optimistic) they all do.

That is not to say that I am against this idea or approach. I am a (deep) fan of the graph concept. Last year I proposed an approach to consider "Governance" as a "governance graph" concept within Trust over IP's "Governance Architecture Task Force" (https://docs.google.com/presentation/d/1vYUJW76BEK_CQotAZ5maXYwe3H3K3dKCPAcTqEgfGJQ/edit).

My observation is that we have to accept/expect that we will have imperfect information, and we have to decide what to do about that. My expectation is that the relying party / verifier will explore the graph until they have satisfied their need for proof, or until they have exhausted the ability to explore the graph. If exhaustion occurs before acceptable proof, then they need to seek alternative/additional proof and/or accept that the claims are not fully verified and make a decision based on that.

We can consider this in terms of hard rules and soft rules. The hard rules (regulations, law etc.) will demand that we must have proof of claims that are acceptable within the jurisdiction in which they (and we) are being tested. The soft rules might be best efforts, nice to haves, preferences etc. and may allow some "wiggle room".

We need wiggle room.

Basically my heuristic is to explore the graph until your needs are satisfied, or the graph is exhausted, then make a decision and/or ask for more information.
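
That heuristic can be sketched as a breadth-first walk that stops as soon as the verifier's need is met or no links remain. The node names and the `satisfied` callback are hypothetical; in practice each step would involve resolving and verifying a credential rather than a dictionary lookup:

```python
from collections import deque
from typing import Callable


def explore(
    start: str,
    links: dict[str, list[str]],
    satisfied: Callable[[set[str]], bool],
) -> tuple[bool, set[str]]:
    """Walk credential links breadth-first.

    Stops when `satisfied` accepts the set of visited nodes, or when no
    unvisited links remain. Returns (was_satisfied, visited_nodes).
    """
    visited: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        if satisfied(visited):
            return True, visited
        queue.extend(n for n in links.get(node, []) if n not in visited)
    return False, visited  # graph exhausted before the need for proof was met


def needs_accreditation(seen: set[str]) -> bool:
    """Hypothetical stopping rule: proof is sufficient once an accreditation is reached."""
    return "accreditation" in seen


full_links = {"passport": ["conformity"], "conformity": ["accreditation"]}
ok, seen = explore("passport", full_links, needs_accreditation)

partial_links = {"passport": ["conformity"]}
partial_ok, partial_seen = explore("passport", partial_links, needs_accreditation)
```

When the walk returns `False`, the visited set tells the relying party how far the evidence reached, which is the point at which they would ask for more information or decide on what they have.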

onthebreeze commented 9 months ago

For sure, we cannot assume that an entire t-shirt-to-cotton-farm credentials graph exists on day 1, if ever. Our architecture must assume that there are only little snippets of graphs, and that quite often a link to a conformity credential (for example) will take you to a PDF, not a VC.

For me, I think it's enough to start with just a few minimal use cases where there are just 2 or 3 nodes in a graph. For example:

a product passport links to a conformity credential which links to an accreditation credential. The graph verification should confirm that:

the product SKU or other identifier in the passport is the same as the one in the conformity credential (i.e. the certificate is about the right product)
the attestation scope of the conformity credential is the same as the scope of the linked accreditation (i.e. the certifier is authorised to issue the attestation)
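
The two checks in this minimal use case can be sketched as one function over three linked credentials. The field names (`product_id`, `scope`) and the sample value `SKU-001` are hypothetical; a real implementation would read them from the relevant UNTP schemas:

```python
def verify_chain(passport: dict, conformity: dict, accreditation: dict) -> list[str]:
    """Return the failed business checks for a passport -> conformity -> accreditation chain."""
    failures = []
    # Check 1: the certificate is about the product named in the passport.
    if passport["product_id"] != conformity["product_id"]:
        failures.append("conformity credential is about a different product")
    # Check 2: the certifier is accredited for the scope it attested to.
    if conformity["scope"] != accreditation["scope"]:
        failures.append("attestation scope is outside the accredited scope")
    return failures


passport = {"product_id": "SKU-001"}
conformity = {"product_id": "SKU-001", "scope": "motorcycle helmet safety"}
accreditation = {"scope": "animal health"}
failures = verify_chain(passport, conformity, accreditation)
```

Exact string equality on scope is a simplification. Real accreditation scopes are likely to be broader than a single attestation, so a production check would probably need a "covers" relation rather than equality.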

I'd suggest that the way forward is to identify several more use cases, create several realistic sample graphs, write a validator (SHACL?) for each, and see if any useful patterns emerge that we can document as best practice / protocols in UNTP.

JohnOnGH commented 9 months ago

Agreed, I was hoping/expecting that we were being pragmatic! The aim is that the linked graph will provide benefit, even if the graph is incomplete.

nissimsan commented 9 months ago

Related, here's a link to the demo I presented a couple of meetings back: https://trace.dpp.ni5.io/

nissimsan commented 7 months ago

To close this, we need a PR to https://uncefact.github.io/spec-untp/docs/specification/TrustGraphs with a trust graph. Must include:

Identity linked to a claim
Conformity linked to a claim
Accreditation Authority

nissimsan commented 7 months ago

https://medium.com/transmute-techtalk/the-united-nations-trust-graph-d65af7b0b678

nissimsan commented 7 months ago

@zachzeus :

Same identity across credentials
Conformity credential linked to claim. Conformity credential is talking about the same thing you are linking
Certifier is accredited by a trusted 3rd party

onthebreeze commented 7 months ago

We can probably use TSM (Towards Sustainable Mining) for this, because Nancy in BC is already working with MAC (Mining Association of Canada) to do exactly that. So we could make our 3 example patterns real using TSM as a realistic example.

zachzeus commented 7 months ago

This really will be explored in our test architecture. This will be a pull request added once we add the refactored UNTP site. We are also working on this kind of testing in a reference implementation. The key outcome for UNTP is that there are some simple, well-described tests for UNTP, and implementers will do a lot more.

My next step is to describe the tests. Done looks like:

List of test cases that define actor(s), inputs and outputs and test data (positive and negative).
Using UNCRM scenario as an example
Scenario map including prerequisites.
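
A test case list of this kind could be captured as plain data, with positive and negative cases sharing one structure. This is only a sketch: the `GraphTestCase` fields, the `same_identifier` check and the sample values are all hypothetical and do not reflect the actual UNTP test architecture:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GraphTestCase:
    name: str
    actors: tuple[str, ...]  # who takes part, e.g. issuer and verifier
    inputs: dict             # credential data fed to the check
    expected_valid: bool     # True for positive test data, False for negative


def run_cases(check: Callable[[dict], bool], cases: list[GraphTestCase]) -> dict[str, bool]:
    """Map each test case name to whether the check produced the expected outcome."""
    return {case.name: check(case.inputs) == case.expected_valid for case in cases}


def same_identifier(inputs: dict) -> bool:
    """Hypothetical check under test: the scanned and declared identifiers agree."""
    return inputs["scanned"] == inputs["declared"]


cases = [
    GraphTestCase("matching identifier", ("verifier",), {"scanned": "A", "declared": "A"}, True),
    GraphTestCase("wrong product", ("verifier",), {"scanned": "A", "declared": "B"}, False),
]
results = run_cases(same_identifier, cases)
```

Each entry in `results` is true when the check behaved as the test case expected, so a negative case passes when the check correctly rejects its input.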

philarcher commented 6 months ago

I think SHACL might be a useful component but, for the reasons others have said, it's not a full validation tool. What we can say is that if the graph matches a SHACL pattern then it might be OK. It's an early-stage test before you get into what might be computationally-heavy inspecting of individual claims and the human assessment thereof.

Should it be helpful in this regard, the RDF Canonicalization spec is about to become a W3C Rec (WG co-chair's insider knowledge ;-) )

zachzeus commented 3 months ago

We've been working on what the trust graph testing for UNTP looks like, and this becomes a pattern that implementers can extend. The UNTP validation will be based on how the links between the core UNTP schemas are validated. The components that we will demonstrate links for are:

DPP
DCC
DTE
DIA