w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification
https://w3c.github.io/vc-data-model/

`proof` in `@context` and the use of `@container` #881

Closed (OR13 closed this 1 year ago)

OR13 commented 2 years ago

I've been using Neo4j a lot lately.

One of my favorite features is the ability to preview (framed) JSON-LD.

For example:

CALL n10s.rdf.preview.inline(
'
    {
        "@type": "https://schema.org/Organization",
        "https://schema.org/description": "Realigned maximized alliance",
        "https://schema.org/name": "Bartell Inc "
    }
', 'JSON-LD')

For simple cases this works fine... but when I attempt to apply this to spec-compliant verifiable credentials, I get a weird blank node issue with the proof block.

Here is a picture of what I mean:

[Screenshot: Neo4j graph preview, 2022-06-15 1:40 PM]

Notice the 2 blank nodes that separate these disjoint subgraphs.

I believe this is caused by the way the proof block is defined in the v1 context:

https://github.com/w3c/vc-data-model/blob/v1.1/contexts/credentials/v1#L45

"proof": {"@id": "sec:proof", "@type": "@id", "@container": "@graph"},

This is a lot of complexity... for one of the most important term definitions the standard provides.

I believe this is also the cause of the "double blank node" issue I observed above.

I think what happens is that a first blank node is created for the proof, and since that term is defined with @container: @graph, you can't trace the relationships directly from credential to proof to verification method...

Each proof is treated as a disjoint subgraph, and the relationship is not preserved during preview / import...

This is really not ideal, since I am interested in querying changes in these proofs over time for credentials, and that relationship is not being imported.

I suspect this is solvable with a more complicated graph config: https://neo4j.com/labs/neosemantics/4.0/config/

But I wonder if we might correct this behavior in VC Data Model 2.0, such that RDF representations don't have this odd behavior when imported as labeled property graphs.

Anyone know how to solve this?

OR13 commented 2 years ago

Relevant sections of the JSON-LD TR:

implicitly named graph: A named graph created from the value of a map entry having an expanded term definition where @container is set to @graph.

https://www.w3.org/TR/json-ld11/#graph-containers

When expanded, these become simple graph objects.

^ pretty sure this is the culprit... it means that if you expand a credential, you lose the relationship between the credential and its proof.
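
To make this concrete, here is a minimal sketch of what the graph container does during expansion (hand-expanded for illustration; the input document is a stripped-down, non-conformant credential):

{
    "@context": "https://www.w3.org/2018/credentials/v1",
    "type": "VerifiableCredential",
    "proof": { "type": "Ed25519Signature2018" }
}

Expanding this yields roughly:

[{
    "@type": ["https://www.w3.org/2018/credentials#VerifiableCredential"],
    "https://w3id.org/security#proof": [{
        "@graph": [{
            "@type": ["https://w3id.org/security#Ed25519Signature2018"]
        }]
    }]
}]

The proof node is wrapped in a simple graph object ("@graph"), so when this is converted to RDF, the proof's triples land in a blank-node-named graph instead of hanging directly off the credential node.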

msporny commented 2 years ago

"proof": {"@id": "sec:proof", "@type": "@id", "@container": "@graph"},

This states that proof will be contained in a separate graph from the default graph. RDF Dataset Canonicalization does this to separate the data you're signing (which is in the default graph) from the proof data (which is in a different graph). Both graphs together constitute an RDF Dataset, and both items are signed over when generating a Data Integrity signature. We did this to ensure that the signature graph didn't pollute the "data being signed" graph.
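
As a minimal sketch of the resulting shape (hand-written for illustration, not canonicalizer output; the identifiers are made up):

<urn:example:credential> <https://w3id.org/security#proof> _:proofGraph .
_:signature <https://w3id.org/security#jws> "eyJhbGci..." _:proofGraph .

The first quad lives in the default graph (the data being signed) and points at the proof graph's name; the second lives in the named graph _:proofGraph, which holds the signature data.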

URGNA2012 (Universal RDF Graph Normalization Algorithm 2012) didn't do this as it only dealt with RDF Graphs, not RDF Datasets, and so we just shoved all the RDF signature data into the default graph (and some people were rightfully upset by that).

When the RDF 1.1 work expanded to include RDF Datasets (part of the driver there was to support concepts that JSON-LD supported but the core RDF data model at the time didn't support), we separated the "data to be signed" from the "signature information" to ensure a cleaner separation between the two types of data. That became the URDNA2015 (Universal RDF Dataset Canonicalization Algorithm 2015).

Hopefully the benefits of this architectural separation between original data and signature data are clear... if they're not, I'm happy to try and elaborate on how jumbling "data to be signed" with "the signature" leads to dirty data over time, especially when you shove it into / take it out of graph databases.

As for what neo4j is doing there... you might ask them how they link statements between RDF Graphs in an RDF Dataset... might just be a limitation on their tooling. The JSON-LD Playground doesn't seem to suffer from the same limitation.

VladimirAlexiev commented 2 years ago

@OR13 can you give the JSONLD you use to make that neo4j graph? I'll try it in GraphDB.

OR13 commented 2 years ago

Here is the example: https://v.jsld.org/2MSdzhMeTHEeoaL1RUjCULnetM3fLnCvckiBMek9DfNYKke6yYGn3ZFL51tmaXqH4VtVkQCqKUy8GCtorQzxd1Y1Na6oLdgR9pKW4oRoMAEj3dXSRn2c8rCkyfPXJxwNfUMfezVvcCAUVB2BGr6GQYouQZH7tYzb8cq9qk2DGQYtDtbyMFBTSHVxCFEzHRhAmtrNzoi9g8mo4vrooak8uCCWkBENYiHzFpLjNdP3rv3m34nd5CGvaMWxKSahsFV4tauACbDEnLXqpAuVJf2ti6U7pxkeYXEQXAGAuhZYBrCoS81FizGFYkYN3sGEfTrQCZFhw1qycxzRDhVdot8L1A1EXA8xiLsaq7CgiWfNrSJbbCjHqA83wGoxi4wFRABsDCxpDTcfaKQHJcedH9vg99VE9V4Jn5v6598U3Lkp9SCuiXwo2sCNqqyuuAcPPDk3sVZD7C62ZgiAHvMHikZDs9EuuxLNzSzTZQV8JD3pkN8Vtz1MEnDK1iQvbT1PoLTRdnrKCUHkMct97gWkGEyiNL5vCZqUTzwkofiiSJkQrMUPoYuotwk6nMK6T67RGAF8qpqFLFoAij8orduXm71xPr9JR92bD1v5YSPjPEW1FwFxpvtfRQ8nxACKKMRc8NANHCwX2ZPULHR4pFN7q8M3ngCw2fpxLJpsk2ZpDJzFAFBpy8PckSG6wZ87QWdW6R7qGn2DuC1o3VmdAKWTG2ERRUo7aYcAvYND2m4qExsCExDbQZACyvqQH9izNeyKkD29srshnacVSPjXd9DQnTCucw2vNZvzTYtoPLEoBa1SzqMbrnQ57XtnzGMMYafY9HyRG98YG4bDSVXDBVckWDcu1gS8gmjsdyTPwWUHGnqSG3yeHwNuu5vJT2MUkyEA1mV8YzpCdD85Fe2PdhqBA

OR13 commented 2 years ago

@msporny thanks, I figured that is what was happening.

^ these are the URIs that neo4j assigns to the blank nodes (based on a default graph config):

CREATE CONSTRAINT n10s_unique_uri ON (r:Resource)
ASSERT r.uri IS UNIQUE
...
CALL n10s.graphconfig.init({
handleVocabUris: 'MAP'
})

... so it is possible to query over the gap between the graphs; you just have to do some string magic.

msporny commented 2 years ago

these are the URIs that neo4j assigns to the blank nodes

Hrm, that feels a bit weird. It looks like they're sharing some part of the bnode ID space, but then tacking something on at the end (-b1 and -b6) to give them different IDs. We'd have to talk with their core engineering team to understand why they decided to do it that way vs. just use a universal bnode space for graph names in a dataset.

re: https://v.jsld.org/ -- that's a neat visualization tool :)

Note that the text/x-nquads output shares the same namespace for blank nodes (_:c14n1) and graph names (_:c14n0), so it's possible to do that, neo4j just decided to not do it that way.

OR13 commented 2 years ago

^ exactly, I suspect that with an updated graph config in neo4j the link would be imported as _:c14n0 -> _:c14n1, but it's not clear what the edge should be... I think most folks would expect that edge to exist when importing a credential.

TallTed commented 2 years ago

@msporny

the data you're signing (which is in the default graph)

I fear I've missed something important along the way...

Are you saying that, in RDF Dataset Canonicalization, "the data being signed" is always in the default graph, and not in a named graph?

This is (or will be) problematic for systems (such as Virtuoso) where the default graph is the union of all named graphs (plus, at least in Virtuoso's case, a special not-really-named graph which is populated by inserts that do not specify a target named graph)...

Further, in such systems, this re-blurs the lines between "the data being signed" and "the proof data", as the named graph containing the latter is included in the default graph containing the former -- i.e., the default graph contains both the "data being signed" and "the proof data"...

dlongley commented 2 years ago

@TallTed,

Are you saying that, in RDF Dataset Canonicalization, "the data being signed" is always in the default graph, and not in a named graph?

No, this is unrelated to RDF Dataset Canonicalization.

As for Data Integrity proofs, the above separation of concerns and process may have been better described by just saying that a proof always exists in its own named graph so as to isolate it from other data.

So, whenever you create a proof (when using proof sets as opposed to proof chains), you remove any existing proof named graphs from the default graph, then sign the entire (canonicalized) dataset, then add back the existing proof named graphs and add the new proof named graph that represents the new proof to the default graph.
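
A runnable toy sketch of that bookkeeping in Python (quads modeled as (subject, predicate, object, graph) tuples, with graph = None for the default graph; the canonicalize-and-sign step is stubbed out via a callback, and none of these names come from a real library):

SEC_PROOF = "https://w3id.org/security#proof"

def add_proof(quads, make_proof_quads):
    # graphs that appear as objects of sec:proof links are existing proof graphs
    proof_graphs = {o for (s, p, o, g) in quads if p == SEC_PROOF}
    # remove existing proof graphs (and the links to them) before signing
    to_sign = [(s, p, o, g) for (s, p, o, g) in quads
               if g not in proof_graphs and p != SEC_PROOF]
    # stand-in for: canonicalize to_sign, sign it, and express the new proof
    # as quads in a fresh named graph linked from the signed data
    new_proof_quads = make_proof_quads(to_sign)
    # the prior proofs were never mutated, so returning quads plus the new
    # proof graph "adds back" the existing proofs alongside the new one
    return quads + new_proof_quads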

Does this clarify?

TallTed commented 2 years ago

@dlongley --

So, whenever you create a proof (when using proof sets as opposed to proof chains), you remove any existing proof named graphs from the default graph, then sign the entire (canonicalized) dataset, then add back the existing proof named graphs and add the new proof named graph that represents the new proof to the default graph.

"The default graph" seems not to be the correct label for all of the above instances, and even if it were, in Virtuoso (for instance), you cannot "remove any existing proof named graphs from the default graph" unless you are dropping those "existing proof named graphs" from the quad store, because all existing named graphs are part of the default graph (except when specific SPARQL clauses are used to change the definition of the default graph for that query, which does not appear to be part of the process you're describing).

sbutterfield commented 2 years ago

@dlongley,

Sorry to potentially add to the confusion. I think I follow but want to check (this also feels like we're diverging into a separate topic so I can take this elsewhere if you want):

whenever you create a proof (when using proof sets as opposed to proof chains), you remove any existing proof named graphs from the default graph, then sign the entire (canonicalized) dataset, then add back the existing proof named graphs and add the new proof named graph that represents the new proof to the default graph.

If the proof graph(s) are always decoupled during signing, then the metadata about the signature generation is not part of the signature? So, if I were to somehow gain control over the DID or become a middleman for DID resolution, then I could theoretically introduce an illegitimate signing key and alter or issue VCs for that controller to work with my illegitimate private key? » I'm sure I must have that wrong somewhere.

👇🏻 indeed

you cannot "remove any existing proof named graphs from the default graph" unless you are dropping those "existing proof named graphs" from the quad store,

dlongley commented 2 years ago

@TallTed,

"The default graph" seems not to be the correct label for all of the above instances, and even if it were, in Virtuoso (for instance), you cannot "remove any existing proof named graphs from the default graph" unless you are dropping those "existing proof named graphs" from the quad store, because all existing named graphs are part of the default graph (except when specific SPARQL clauses are used to change the definition of the default graph for that query, which does not appear to be part of the process you're describing).

+1 for finding better terminology to avoid confusion as needed.

EDIT: I presume you could implement the above using a specific SPARQL query as you mentioned (to "change the definition of the default graph") if you need to interact with the data that way via a quad store (as opposed to in memory).

dlongley commented 2 years ago

@sbutterfield,

If the proof graph(s) are always decoupled during signing, then the metadata about the signature generation is not part of the signature?

I think responding to individual concerns without a comprehensive response (i.e., what the spec says or should say) on the entire process is leading to more confusion here. But at risk of introducing more confusion in just responding to your particular query, a Data Integrity proof involves signing over a hash of both the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and over a hash of the canonicalized meta data for the new proof. In other words, all data is signed except for the signature itself (which is not logically possible to sign over since it is an output of the process).

So, if I were to somehow gain control over the DID or become a middleman for DID resolution, then I could theoretically introduce an illegitimate signing key and alter or issue VCs for that controller to work with my illegitimate private key?

The above should clarify that the answer to this is: "No".

sbutterfield commented 2 years ago

@dlongley, thank you. That's how I originally had thought about it. Crystal clear now.

TallTed commented 2 years ago

@dlongley --

a Data Integrity proof involves signing over a hash of both the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and over a hash of the canonicalized meta data for the new proof.

Still trying to parse this... It appears that the "both" is misplaced in the sentence and/or the "over a hash of both" is missing one of the things being hashed. Maybe --

a Data Integrity proof involves signing both over a hash of the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and over a hash of the canonicalized meta data for the new proof.

-- or --

a Data Integrity proof involves signing over both a hash of the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and a hash of the canonicalized meta data for the new proof.

-- or --

a Data Integrity proof involves signing over a hash of both the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and the canonicalized meta data for the new proof.

-- or something I'm not seeing yet...

dlongley commented 2 years ago

@TallTed,

The canonicalized meta data is hashed producing hash1. The canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") is hashed producing hash2. The signature is over the concatenation hash1 + hash2.
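
In code, that step is just the following (a minimal sketch assuming the two canonicalized N-Quads strings are already in hand and SHA-256 is the digest; the function name is illustrative, not a real library API):

import hashlib

def signing_input(canonical_proof_options: str, canonical_dataset: str) -> bytes:
    # hash1: over the canonicalized proof meta data (minus the signature value)
    hash1 = hashlib.sha256(canonical_proof_options.encode("utf-8")).digest()
    # hash2: over the canonicalized dataset, existing proof graphs removed
    hash2 = hashlib.sha256(canonical_dataset.encode("utf-8")).digest()
    # the signature is computed over this concatenation
    return hash1 + hash2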

OR13 commented 2 years ago

AFAIK, the "Data Integrity Proofs" or what used to be called "Linked Data Proofs" have not changed in this regard since 2017...

Here is an example where I tested them against Mastodon:

(Mastodon is the original web5, get on my level haters).

peacekeeper commented 2 years ago

URGNA2012 (Universal RDF Graph Normalization Algorithm 2012) didn't do this as it only dealt with RDF Graphs, not RDF Datasets, and so we just shoved all the RDF signature data into the default graph (and some people were rightfully upset by that).

I was also working on LD signatures back then, when the signatures/proofs were still in the same graph as the data, and I remember it felt like the right decision to move the signatures/proofs into their own named graphs as it is now.

VladimirAlexiev commented 2 years ago

@OR13 The example doesn't parse in rdf4j, probably because it doesn't yet support JSON-LD 1.1: https://github.com/eclipse/rdf4j/issues/3654

Jena 4.4.0 (2022-01-30) also gave an error:

$ riot --validate test.jsonld
ERROR riot            :: invalid term definition: 1.1
$ riot --version
Jena:       VERSION: 4.4.0
Jena:       BUILD_DATE: T15:09:41Z
a :CertifiedDevice , ; _:b0 ; ; "2022-01-15T19:25:55.574Z"^^ ; .
a :Device ;
    :description "Try to quantify the SAS alarm, maybe it will copy the virtual panel!" ;
    :ip "21a0:7698:a2bd:ae26:1331:085a:238a:d13d" ;
    :latitude "69.4264" ;
    :longitude "-136.3105" ;
    :mac "cd:22:26:65:6a:9b" ;
    :name "55S Mobile Program" .
_:b0 {
    [ a ;
      "2022-01-15T19:25:55Z"^^ ;
      "eyJhbGciOiJFZERTQSIsImI2NCI6ZmFsc2UsImNyaXQiOlsiYjY0Il19..IkEae_ErY_g3-g43665vgn0KkI_A4Ww_hlDWlL0MlWVy9cddewQFT_TGFeFsqtJREf_OiNyI4ALf5oom1aPcDg" ;
      ; ]
    .
}

@TallTed should we post an issue to SPARQL 1.2 "FROM should allow the exclusion of graphs"? Maybe no, because to fulfill the goal "separate the data you're signing", a repository would store the VC in a named graph: storing hundreds or millions of VCs in the default graph would not allow you to separate them.

TallTed commented 2 years ago

@VladimirAlexiev -- I think there are some scenarios where a NOT FROM could be useful, but I don't think signing scenarios are among them. I don't think I have a strong enough handle on an example scenario of this sort to make the case for NOT FROM in the SPARQL 1.2 wishlist, but if you do, I encourage you to add it soon, as action on items in that wishlist may be taken at any time.

OR13 commented 2 years ago

Seems related: https://github.com/search?q=org%3Aneo4j-labs+bnode%3A%2F%2F&type=code

OR13 commented 2 years ago

A simpler one liner to reproduce the issue (beware it deletes everything, so don't run this outside of a new database):

MATCH (n)
DETACH DELETE n;

DROP CONSTRAINT ON (r:Resource)
ASSERT r.uri IS UNIQUE;

CALL n10s.graphconfig.init( { handleVocabUris: 'MAP', handleRDFTypes: 'NODES' });

CREATE CONSTRAINT n10s_unique_uri ON (r:Resource)
ASSERT r.uri IS UNIQUE;

CALL n10s.rdf.import.inline(
'
<https://api.did.actor/revocation-lists/1.json#0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/vc-revocation-list-2020#RevocationList2020Status> .
<https://api.did.actor/revocation-lists/1.json#0> <https://w3id.org/vc-revocation-list-2020#revocationListCredential> <https://api.did.actor/revocation-lists/1.json> .
<https://api.did.actor/revocation-lists/1.json#0> <https://w3id.org/vc-revocation-list-2020#revocationListIndex> "0"^^<http://www.w3.org/2001/XMLSchema#integer> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://www.w3.org/2018/credentials#VerifiableCredential> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://w3id.org/security#proof> _:c14n1 .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://www.w3.org/2018/credentials#credentialStatus> <https://api.did.actor/revocation-lists/1.json#0> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://www.w3.org/2018/credentials#credentialSubject> <did:example:123> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://www.w3.org/2018/credentials#issuanceDate> "2010-01-01T19:23:24Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://www.w3.org/2018/credentials#issuer> <did:key:z6MktiSzqF9kqwdU8VkdBKx56EYzXfpgnNPUAGznpicNiWfn> .
_:c14n0 <http://purl.org/dc/terms/created> "2022-06-20T16:52:58Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> _:c14n1 .
_:c14n0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/security#Ed25519Signature2018> _:c14n1 .
_:c14n0 <https://w3id.org/security#jws> "eyJhbGciOiJFZERTQSIsImI2NCI6ZmFsc2UsImNyaXQiOlsiYjY0Il19..jqpGjbIt1Hr9M5kZNzyPiTGxwm_tf2VqZiFvxIEgW31ryFyhOb_7muNwXEAzBmtL68UUQcB_dGUVfY9z978nAw" _:c14n1 .
_:c14n0 <https://w3id.org/security#proofPurpose> <https://w3id.org/security#assertionMethod> _:c14n1 .
_:c14n0 <https://w3id.org/security#verificationMethod> <did:key:z6MktiSzqF9kqwdU8VkdBKx56EYzXfpgnNPUAGznpicNiWfn#z6MktiSzqF9kqwdU8VkdBKx56EYzXfpgnNPUAGznpicNiWfn> _:c14n1 .

', 'N-Quads')

Then view the data with:

MATCH (n) RETURN n LIMIT 25
[Screenshot: Neo4j graph view, 2022-06-20 12:36 PM]
OR13 commented 2 years ago

Here is a snippet of CQL that adds a link relationship between the proof node and "similar blank nodes"...

This is an incredibly expensive, hacky workaround:

MATCH
    (n0: Resource),
    (n1: Resource),
    (n2: Resource)
WHERE
    (n0)-[:proof]->(n1) AND
    apoc.text.levenshteinSimilarity(n1.uri, n2.uri) > .8 AND
    apoc.text.levenshteinSimilarity(n1.uri, n2.uri) < 1
MERGE (n1)-[link: DATA_INTEGRITY_PROOF]->(n2) 
RETURN n0, n1, n2
[Screenshot: Neo4j graph view, 2022-06-20 1:10 PM]

After this link has been added the graphs are connected.

[Screenshot: Neo4j graph view, 2022-06-20 1:08 PM]
OR13 commented 2 years ago

@VladimirAlexiev I had the same issue with JSON-LD v1.1 before... It's a major reason to convert from the standard JSON representation of a credential to the n-quad or framed versions... which seem to be better supported by graph databases.

I suppose the next step should be to create 3 or 4 VCs and import them all, and then look at the graph again.

I would expect to be able to see that they are "proofs for the same information", but from different actors, over time.

OR13 commented 2 years ago

A much smarter way to join the graphs after import:

MATCH
    (n1: Resource),
    (n2: Resource)
WHERE
    split(n1.uri, '-')[1] = split(n2.uri, '-')[1] AND
    NOT EXISTS(n1.jws) AND
    EXISTS(n2.jws)
MERGE (n1)-[link: DATA_INTEGRITY_PROOF]->(n2) 
RETURN n1, n2

^ this doesn't work though because of the way the blank node identifiers are assigned during a bulk import...

[Screenshot: Neo4j graph view, 2022-06-20 2:57 PM]

In this case, 3 credentials are imported, but each has a proof with a blank node id that looks like:

uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b10
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b11
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b9

... because they were imported at the same time... even though the credentials were issued at different times.

On the other side of the gap, we have:

uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b5
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b8
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b0

After import, we can tell they are all related by looking at 16ff0ebe17c448c0b1db6d23018428c4... but we can't tell which proof belongs to which credential, because of the way blank node identifiers are assigned when multiple credentials (each with a proof that is a @container) are imported at once.

A few thoughts:

  1. stop trying to import RDF directly, and instead transform it before importing.
  2. import RDF, but only 1 credential / object at a time... so that any blank nodes get a useful unique id (see the sketch just below).
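
For example, option 2 can be a small loop in application code; a sketch using the official Python driver (the connection details and the credentials_as_nquads variable are made up for illustration):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # one n10s import call per credential, so each gets its own genid prefix
    for nquads in credentials_as_nquads:
        session.run("CALL n10s.rdf.import.inline($rdf, 'N-Quads')", rdf=nquads)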

My goal:

  1. minimize any data transformations between RDF and LPGs
  2. import VC / VP over time
  3. Import as much data as fast as possible

It seems the naive solutions to this problem are causing me to trade one goal for another.

OR13 commented 2 years ago

Importing objects that might contain blank nodes 1 at a time seems to work:

[Screenshot: Neo4j graph view, 2022-06-20 3:20 PM]

Left hand side:

uri: bnode://genid-d10239de14ab4697baa44fdef3190c14-b3
uri: bnode://genid-4eb97b93909d41a19febb7483c8e49eb-b3
uri: bnode://genid-a5218ac4e96f433c8d31bb6a1115c49a-b3

Right hand side:

uri: bnode://genid-d10239de14ab4697baa44fdef3190c14-b0
uri: bnode://genid-4eb97b93909d41a19febb7483c8e49eb-b0
uri: bnode://genid-a5218ac4e96f433c8d31bb6a1115c49a-b0

It's now possible to join by looking at the middle component of the uri.

MATCH
    (credential: Resource),
    (signature: Resource)
WHERE 
    ()-[:proof]->(credential) AND
    EXISTS(signature.jws) AND
    split(credential.uri, '-')[1] = split(signature.uri, '-')[1]
MERGE (credential)-[link: DATA_INTEGRITY_PROOF]->(signature) 
RETURN credential, signature, link
[Screenshot: Neo4j graph view, 2022-06-20 3:35 PM]

After this relationship is added:

[Screenshot: Neo4j graph view, 2022-06-20 3:36 PM]
OR13 commented 2 years ago

Unfortunately, this won't help you with Verifiable Presentations...

Because the proofs on the credentials will have a similar blank node identifier as the proof on the presentation:

[Screenshot: Neo4j graph view, 2022-06-20 4:01 PM]

Left:

uri: bnode://genid-83dec2dceeea4792a549afec00991790-b10
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b11
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b12
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b14
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b13

Right:

uri: bnode://genid-83dec2dceeea4792a549afec00991790-b1
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b4
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b7

Same problem as before.

The problem here is worse, though... since we also have the dangling @container from the verifiableCredential relationship:

"holder": {"@id": "cred:holder", "@type": "@id"},
"proof": {"@id": "sec:proof", "@type": "@id", "@container": "@graph"},
"verifiableCredential": {"@id": "cred:verifiableCredential", "@type": "@id", "@container": "@graph"}

I'm less sure how to fix this since:

  1. id is not required on VCs or VPs.
  2. @container is on the VC.proof and the VP.proof AND the VP.verifiableCredential relationships.

It should be possible to import the credentials individually, then the presentation, and then define relationships between them... but having to do that for every VP is going to add a LOT of overhead.

... it does work...

[Screenshot: Neo4j graph view, 2022-06-20 4:29 PM]

After importing each item 1 at a time... the graphs for a VP can be joined:

[Screenshot: Neo4j graph view, 2022-06-20 4:31 PM]

But I lost the vp.verifiableCredential container along the way... assuming you are lucky enough to always have an id for both VC and VP, this can be fixed at the end with:

MATCH 
    (vp { uri: 'urn:uuid:7ea1be55-fe46-443e-a0ce-eb5e40f47aaa' }),
    (vc { uri: 'urn:uuid:a96c9e16-adc3-48c7-8746-0e1b8c3535ba' })
MERGE 
    (vp)-[link: PRESENTED]->(vc) 
RETURN vc, vp, link
[Screenshot: Neo4j graph view, 2022-06-20 4:40 PM]
TallTed commented 2 years ago

Blank nodes are extremely useful, just like other forms of pronoun. However, they are not appropriate for use in all cases; sometimes, a proper noun (a/k/a a URI, URN, IRI, such as a DID) is more appropriate. I submit that these are such cases.

OR13 commented 2 years ago

I added a similar import for VC-JWTs here https://github.com/transmute-industries/verifiable-data/pull/198

This raises interesting questions, since VC-JWT has an external proof... there is nothing to import regarding the proof semantics (without me making some custom mapping to import properties from the JWT header).

I can see benefits to both approaches... but it's interesting to note that, by default, both LD Proofs and VC-JWT don't import the proof as connected to the credential.

iherman commented 2 years ago

The issue was discussed in a meeting on 2022-08-03

View the transcript

#### 6.7. `proof` in `@context` and the use of `@container` (issue vc-data-model#881)

_See github issue [vc-data-model#881](https://github.com/w3c/vc-data-model/issues/881)._

**Manu Sporny:** I think this is in the core data model - or at least in the core context. We could move it out in the future, but for now should stay.

**Brent Zundel:** Anyone opposed to that...? Taking label off.
brentzundel commented 1 year ago

blocked by #947

brentzundel commented 1 year ago

@OR13 can this issue be closed now that #910 is closed, or is there more to do to resolve it?

OR13 commented 1 year ago

I think we still need to address the graph container issue in the core data model vs the security formats.

The Data Integrity side is easy, but how does this map to the proof or credential vs. verifiable credential discussion?

iherman commented 1 year ago

The issue was discussed in a meeting on 2023-04-04

View the transcript

#### 1.4. `proof` in `@context` and the use of `@container` (issue vc-data-model#881)

_See github issue [vc-data-model#881](https://github.com/w3c/vc-data-model/issues/881)._

**Kristina Yasuda:** By Orie. Long interaction between Orie, Dave, Manu, others.

**Dave Longley:** We could ask Orie if we can close this issue, if it's all done.

**Brent Zundel:** Will add that.
filip26 commented 1 year ago

a note: @container: @graph causes the JSON-LD 1.1 compaction algorithm to produce sec:proof instead of the plain proof property name.

msporny commented 1 year ago

@filip26 wrote:

a note: @container: @graph causes the JSON-LD 1.1 compaction algorithm to produce sec:proof instead of the plain proof property name.

There are a number of implementations that don't have this behavior. Can you please provide the section of the JSON-LD or VC specification that you feel triggers this behavior?

filip26 commented 1 year ago

I don't know what step(s) in the algorithm cause the behavior, but I dispute that other implementations do not have this issue. This example with @graph produces sec:proof, and here is the same example without @graph that produces proof.

iherman commented 1 year ago

The issue was discussed in a meeting on 2023-04-19

View the transcript

#### 2.1. `proof` in `@context` and the use of `@container` (issue vc-data-model#881)

_See github issue [vc-data-model#881](https://github.com/w3c/vc-data-model/issues/881)._

**Brent Zundel:** The question is whether there is someone who is willing to be assigned. Is there anyone who would like to be assigned? Or would anyone like to propose to close it?

**Orie Steele:** Some background on what it would take to close it. There's an assumption that proof is related to the container it's applied to. It's used for proof sets and proof chains in Data Integrity. We have visuals in our spec that I think people really like, that show what a proof is and how a proof goes on a VC / presentation. Those pictures are for after applying the context. I think it would be helpful to add some details around how the process works with the graph being a separate named graph, and how that relates to the pictures. I think also that the `proof` term is not specific to Data Integrity proof; it's an extension point, and we have a single implementation so far with Data Integrity Proof. I think we should make the case better in the core spec; I think it's non-normative today. I think that's also not super great. All of these things are tied together: we have these visuals that are great, that people like, but we need to understand what the context will do to make those pictures.

> *Orie Steele:* Correction, proof is normative in the core spec, but there are no normative "types" for it today.

**Brent Zundel:** Who would be willing to be assigned to the issue to help move it forward? Not hearing any volunteers -- issues without someone assigned to them are much less likely to see progress.
iherman commented 1 year ago

The issue was discussed in a meeting on 2023-05-17

View the transcript

#### 2.1. `proof` in `@context` and the use of `@container` (issue vc-data-model#881)

_See github issue [vc-data-model#881](https://github.com/w3c/vc-data-model/issues/881)._

**Brent Zundel:** First one, issue 881. Proof in `@context` and use of `@container`. Raised by Orie.

**Orie Steele:** This came up recently in an open PR about updating diagrams. We have these RDF graphical representations of what a credential and proof graphs are. Those diagrams reflect the same topics as this issue. The fact that the proof is a separate box has to do with the structure of proof in the JSON-LD context. I'd like diagrams to represent normative requirements. Previously, we talked about defining credential, but opted not to. Then the representations here ... and other stuff ... are creating problems. We're showing how they look after non-normative processing is happening. The graph property is what sort of gives you that separation in those two logical boxes. I think the best way to address this is either describing the shape of things or making @context normative.

**Ivan Herman:** First of all, the fact that it is not obvious on the diagram is a problem with the diagram tool. We should be careful not to reverse the arguments. I am stronger and stronger convinced that the vocabulary document should be part of the standard. In that document, it's clearly said what proof is. There are good reasons for that, which probably shouldn't be changed. The fact that it is expressed as a container isn't relevant; that is only the tool to express the vocabulary.

> *Orie Steele:* What are the reasons that proof is a separate graph?

**Brent Zundel:** Remember, our goal today is to find someone to be assigned.

> *Orie Steele:* You can assign me.

> *Michael Prorock:* +1 ivan - big benefit to standardizing the vocab (and it will force us to carefully make sure the json-ld vocab is correct).

**Brent Zundel:** Thanks, Orie. Next: 915.
OR13 commented 1 year ago

This issue can be closed when https://github.com/w3c/vc-data-model/pull/1158 is merged

iherman commented 1 year ago

The issue was discussed in a meeting on 2023-06-28

View the transcript

#### 2.8. `proof` in `@context` and the use of `@container` (issue vc-data-model#881)

_See github issue [vc-data-model#881](https://github.com/w3c/vc-data-model/issues/881)._

**Brent Zundel:** #881 raised and assigned to Orie.

**Orie Steele:** Similar comment to normative context. Impact is on the shape of the graphs. Related to what happens if the hash doesn't match? This property impacts how the resolution of the graph works. Resolving the normative context PR should solve this. This #881 comes from certain JSON-LD keywords which are normative. Different n-quads are produced when normalized, depending on what keywords are in the context file. These will produce different n-quads and force failure of signatures.

**Ivan Herman:** don't understand the clarification. Only problem seen is if you want nice graphs you can't do it, because Neo4j is not prepared for it. Proof has always been something that produced a graph.

> *Manu Sporny:* It's not clear what problem you're highlighting, Orie.

**Orie Steele:** Neo4j is not the problem - the problem is the context being normative. If you process data with different contexts, which you could do before, normative changes will prevent this.

**Ivan Herman:** still don't understand this. Know what proofs do and how sigs work. Proof is always a property of the graph.

**Manu Sporny:** doesn't see the problem either. Perhaps the issuer signs something and the verifier uses a different context; the sig will not verify. That's a security feature.

> *Dave Longley:* it's normative in the spec already in [https://w3c.github.io/vc-data-model/#syntactic-sugar](https://w3c.github.io/vc-data-model/#syntactic-sugar).

**Orie Steele:** Seems like this is perceived as a problem, but it's the way digital signatures work. That's always been the design; it's not clear what the security issue is, since if the signed bytes change, verification will fail.

> *Manu Sporny:* Huh, ok? I still don't understand the problem... but I'm hearing we can close this once we merge the normative context thing.

**Orie Steele:** This issue can be closed when the pull request for the normative nature of the context terms is merged.

> *Dave Longley:* btw, `proof` and `verifiableCredential` as graphs were already normative via the above link (even before making the context normative).

> *Orie Steele:* You cannot assume you know what bytes will be signed unless the context value is normative, and issuers and verifiers use the same structure.
OR13 commented 1 year ago

@iherman on the call today, you asserted that the current JSON-LD context behavior wrt proof is correct.

I wanted to share some implementation experience with the working group on applying the current proof graphs, as they are generated with the current normative contexts, when converting from JSON-LD to RDF.

It is true that when importing a graph for an application/vc+ld+json document, you get (at least) 2 disconnected graphs: one for the credential, and one for each proof (in the case that a proof was present).

This behavior was previously ambiguous, but will now be consistent thanks to making the context normative.

It affects whether software systems will process these data models as RDF graphs.

Regardless of what the context says the RDF should be, a graph processing verifier might decide to attach proofs to credentials, or credentials to presentations, in order to generate more efficient graph queries.

At Transmute, we've obviously been using neo4j a lot, as have a lot of companies that are interested in modern graph APIs and moving beyond just doing what RDF allows (especially while we wait to see what RDF-star will allow).

Here is a link to a tool we use to evaluate JSON-LD DIDs and VCs:

https://github.com/transmute-industries/transmute

Here is a link to an open source US Customs program that also uses neo4j:

https://github.com/US-CBP/GTAS

While I personally don't agree with the RDF graph that is now normative, as you can see, I am comfortable working around its flaws to produce graphs that preserve the relationships we see in JSON, specifically the relationships between proof, credential and presentation.

I think most folks will be surprised to learn that while proof is always OPTIONAL in a JSON-LD VC... it is NEVER present after you convert to RDF.

Similarly, folks will probably be surprised to learn that a verifiable presentation will not contain credentials when imported for the same reason, and that its proof will also be treated the same way.

This causes graph processors to "forget where things came from" after importing JSON-LD as RDF.

I find this behavior undesirable, but obviously we can work around it, and now our workaround will be consistent, thanks to making the @context values, and specifically these parts, normative:

As I said on the call, this issue predates the working group's intelligent decision to make the context normative, and this issue can be closed when this PR is merged:

https://github.com/w3c/vc-data-model/pull/1158

iherman commented 1 year ago

This issue can be closed when #1158 is merged

The PR is merged; I presume this issue is now moot and can be set as pending close. @brentzundel @Sakurann @OR13 ?

OR13 commented 1 year ago

Indeed!

Consumers of verifiable credentials as RDF are now assured of a specific graph structure by the application of our normative context.

This makes reliable extension or translation possible.

This issue should be closed.

brentzundel commented 1 year ago

This issue has been addressed, closing.