Closed: @OR13 closed this issue 1 year ago
Relevant sections of the JSON-LD TR:
implicitly named graph: A named graph created from the value of a map entry having an expanded term definition where `@container` is set to `@graph`.
https://www.w3.org/TR/json-ld11/#graph-containers
When expanded, these become simple graph objects.
^ pretty sure this is the culprit... it means that if you expand a credential, you lose the relationship between the credential and its proof.
"proof": {"@id": "sec:proof", "@type": "@id", "@container": "@graph"},
This states that `proof` will be contained in a separate graph from the default graph. RDF Dataset Canonicalization does this to separate the data you're signing (which is in the default graph) from the proof data (which is in a different graph). Both graphs together constitute an RDF Dataset, and both items are signed over when generating a Data Integrity signature. We did this to ensure that the signature graph didn't pollute the "data being signed" graph.
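To make that separation concrete, here is a toy sketch in Python (the quads and names below are simplified stand-ins, not real canonicalization output): the credential statements live in the default graph, the proof statements live in a named graph, and the only link between the two is the `sec:proof` statement pointing at the proof graph's name.

```python
# Toy model: quads as (subject, predicate, object, graph) tuples.
# graph=None means the default graph; "_:c14n1" names the proof graph.
quads = [
    ("urn:uuid:123", "cred:issuer", "did:example:issuer", None),
    ("urn:uuid:123", "sec:proof", "_:c14n1", None),
    ("_:c14n0", "sec:jws", "eyJ...", "_:c14n1"),
    ("_:c14n0", "sec:verificationMethod", "did:key:z6Mk...", "_:c14n1"),
]

default_graph = [q for q in quads if q[3] is None]      # the data being signed
proof_graph = [q for q in quads if q[3] == "_:c14n1"]   # the proof statements

# The default graph points *at* the proof graph's name; this is exactly
# the edge that gets lost when an importer splits graphs into islands.
assert ("urn:uuid:123", "sec:proof", "_:c14n1", None) in default_graph
assert all(q[3] == "_:c14n1" for q in proof_graph)
```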
URGNA2012 (Universal RDF Graph Normalization Algorithm 2012) didn't do this as it only dealt with RDF Graphs, not RDF Datasets, and so we just shoved all the RDF signature data into the default graph (and some people were rightfully upset by that).
When the RDF 1.1 work expanded to include RDF Datasets (part of the driver there was to support concepts that JSON-LD supported but the core RDF data model at the time didn't support), we separated the "data to be signed" from the "signature information" to ensure a cleaner separation between the two types of data. That became the URDNA2015 (Universal RDF Dataset Canonicalization Algorithm 2015).
Hopefully the benefits of this architectural separation between original data and signature data are clear... if they're not, I'm happy to try and elaborate on how jumbling "data to be signed" with "the signature" leads to dirty data over time, especially when you shove it into / take it out of graph databases.
As for what neo4j is doing there... you might ask them how they link statements between RDF Graphs in an RDF Dataset... might just be a limitation on their tooling. The JSON-LD Playground doesn't seem to suffer from the same limitation.
@OR13 can you give the JSONLD you use to make that neo4j graph? I'll try it in GraphDB.
@msporny thanks, I figured that is what was happening.
^ these are the URIs that neo4j assigns to the blank nodes (based on a default graph config):
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource)
ASSERT r.uri IS UNIQUE
...
CALL n10s.graphconfig.init({
handleVocabUris: 'MAP'
})
... so it is possible to query over the gap between the graphs; you just have to do some string magic.
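For illustration, the "string magic" can be as simple as splitting on `-` to recover the shared import identifier, mirroring the Cypher `split(n.uri, '-')[1]` used later in this thread. This is a hedged sketch: the `bnode://genid-...` URI shape is taken from the examples posted here, not from any documented neo4j guarantee.

```python
def genid_component(uri: str) -> str:
    # "bnode://genid-16ff0ebe...-b10" -> "16ff0ebe..."
    # maxsplit=2 keeps this robust even if the final label contained a dash.
    return uri.split("-", 2)[1]

# Two bnode URIs from the same import share the middle component, so they
# can be grouped across the graph gap even though their -bN suffixes differ.
a = "bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b10"
b = "bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b5"
assert genid_component(a) == genid_component(b)
```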
these are the URIs that neo4j assigns to the blank nodes
Hrm, that feels a bit weird. It looks like they're sharing some part of the bnode ID space, but then tacking something on at the end (`-b1` and `-b6`) to give them different IDs. We'd have to talk with their core engineering team to understand why they decided to do it that way vs. just use a universal bnode space for graph names in a dataset.
re: https://v.jsld.org/ -- that's a neat visualization tool :)
Note that the `text/x-nquads` output shares the same namespace for blank nodes (`_:c14n1`) and graph names (`_:c14n0`), so it's possible to do that; neo4j just decided not to do it that way.
^ exactly, I suspect that with an updated graph config in neo4j the link would be imported as `_:c14n0` -> `_:c14n1`, but it's not clear what the edge should be... I think most folks would expect that edge to exist when importing a credential.
@msporny
the data you're signing (which is in the default graph)
I fear I've missed something important along the way...
Are you saying that, in RDF Dataset Canonicalization, "the data being signed" is always in the default graph, and not in a named graph?
This is (or will be) problematic for systems (such as Virtuoso) where the default graph is the union of all named graphs (plus, at least in Virtuoso's case, a special not-really-named graph which is populated by inserts that do not specify a target named graph)...
Further, in such systems, this re-blurs the lines between "the data being signed" and "the proof data", as the named graph containing the latter is included in the default graph containing the former -- i.e., the default graph contains both the "data being signed" and "the proof data"...
@TallTed,
Are you saying that, in RDF Dataset Canonicalization, "the data being signed" is always in the default graph, and not in a named graph?
No, this is unrelated to RDF Dataset Canonicalization.
As for Data Integrity proofs, the above separation of concerns and process may have been better described by just saying that a proof always exists in its own named graph so as to isolate it from other data.
So, whenever you create a proof (when using proof sets as opposed to proof chains), you remove any existing proof named graphs from the default graph, then sign the entire (canonicalized) dataset, then add back the existing proof named graphs and add the new proof named graph that represents the new proof to the default graph.
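A toy model of that remove / sign / add-back flow, under loudly stated assumptions: graphs are plain Python dicts, `canonicalize` is a trivial stand-in for URDNA2015, and `sign` is just a hash, not a real signature scheme. It only illustrates the procedure's shape.

```python
import hashlib

def canonicalize(dataset: dict) -> bytes:
    # Stand-in for URDNA2015: deterministic serialization of graph -> statements.
    return "\n".join(
        f"{g}|{s}" for g in sorted(dataset) for s in sorted(dataset[g])
    ).encode()

def sign(data: bytes) -> str:
    # Stand-in for a real signature (e.g. Ed25519); a hash is NOT a signature.
    return hashlib.sha256(data).hexdigest()

def add_proof(dataset: dict, existing_proof_graphs: set) -> str:
    # 1. Remove existing proof named graphs (and statements mentioning them).
    to_sign = {
        g: [s for s in stmts if not any(p in s for p in existing_proof_graphs)]
        for g, stmts in dataset.items()
        if g not in existing_proof_graphs
    }
    # 2. Sign the canonicalized remainder.
    signature = sign(canonicalize(to_sign))
    # 3. Add the new proof named graph back to the dataset.
    dataset[f"_:proofg{len(existing_proof_graphs)}"] = [f"sec:jws {signature}"]
    return signature

ds = {"default": ["urn:uuid:123 cred:issuer did:example:issuer"]}
sig1 = add_proof(ds, set())
sig2 = add_proof(ds, {"_:proofg0"})
assert sig1 == sig2  # each proof in a proof set covers the same underlying data
```

The final assertion is the point of the sketch: because existing proof graphs are stripped before signing, every proof in a proof set is computed over the same underlying dataset.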
Does this clarify?
@dlongley --
So, whenever you create a proof (when using proof sets as opposed to proof chains), you remove any existing proof named graphs from the default graph, then sign the entire (canonicalized) dataset, then add back the existing proof named graphs and add the new proof named graph that represents the new proof to the default graph.
"The default graph" seems not to be the correct label for all of the above instances, and even if it were, in Virtuoso (for instance), you cannot "remove any existing proof named graphs from the default graph" unless you are dropping those "existing proof named graphs" from the quad store, because all existing named graphs are part of the default graph (except when specific SPARQL clauses are used to change the definition of the default graph for that query, which does not appear to be part of the process you're describing).
@dlongley,
Sorry to potentially add to the confusion. I think I follow but want to check (this also feels like we're diverging into a separate topic so I can take this elsewhere if you want):
whenever you create a proof (when using proof sets as opposed to proof chains), you remove any existing proof named graphs from the default graph, then sign the entire (canonicalized) dataset, then add back the existing proof named graphs and add the new proof named graph that represents the new proof to the default graph.
If the proof graph(s) are always decoupled during signing, then the metadata about the signature generation is not part of the signature? So, if I were to somehow gain control over the DID or become a middleman for DID resolution, then I could theoretically introduce an illegitimate signing key and alter or issue VCs for that controller to work with my illegitimate private key? I'm sure I must have that wrong somewhere.
👇🏻 indeed
you cannot "remove any existing proof named graphs from the default graph" unless you are dropping those "existing proof named graphs" from the quad store,
@TallTed,
"The default graph" seems not to be the correct label for all of the above instances, and even if it were, in Virtuoso (for instance), you cannot "remove any existing proof named graphs from the default graph" unless you are dropping those "existing proof named graphs" from the quad store, because all existing named graphs are part of the default graph (except when specific SPARQL clauses are used to change the definition of the default graph for that query, which does not appear to be part of the process you're describing).
+1 for finding better terminology to avoid confusion as needed.
EDIT: I presume you could implement the above using a specific SPARQL query as you mentioned (to "change the definition of the default graph") if you need to interact with the data that way via a quad store (as opposed to in memory).
@sbutterfield,
If the proof graph(s) are always decoupled during signing, then the metadata about the signature generation is not part of the signature?
I think responding to individual concerns without a comprehensive response (i.e., what the spec says or should say) on the entire process is leading to more confusion here. But at the risk of introducing more confusion by just responding to your particular query: a Data Integrity proof involves signing over a hash of both the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and over a hash of the canonicalized metadata for the new proof. In other words, all data is signed except for the signature itself (which is not logically possible to sign over, since it is an output of the process).
So, if I were to somehow gain control over the DID or become a middleman for DID resolution, then I could theoretically introduce an illegitimate signing key and alter or issue VCs for that controller to work with my illegitimate private key?
The above should clarify that the answer to this is: "No".
@dlongley, thank you. That's how I originally had thought about it. Crystal clear now.
@dlongley --
a Data Integrity proof involves signing over a hash of both the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and over a hash of the canonicalized meta data for the new proof.
Still trying to parse this... It appears that the "both" is misplaced in the sentence and/or the "over a hash of both" is missing one of the things being hashed. Maybe --
a Data Integrity proof involves signing both over a hash of the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and over a hash of the canonicalized meta data for the new proof.
-- or --
a Data Integrity proof involves signing over both a hash of the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and a hash of the canonicalized meta data for the new proof.
-- or --
a Data Integrity proof involves signing over a hash of both the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and the canonicalized meta data for the new proof.
-- or something I'm not seeing yet...
@TallTed,
The canonicalized metadata is hashed, producing `hash1`. The canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") is hashed, producing `hash2`. The signature is over the concatenation `hash1 + hash2`.
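A minimal sketch of that hashing step using Python's `hashlib`. The canonicalized byte strings here are placeholders, and the actual signing primitive (e.g. Ed25519) is elided; only the hash-then-concatenate structure is shown.

```python
import hashlib

# Placeholders standing in for real canonicalization output (N-Quads bytes).
canonical_proof_options = b"<canonicalized proof metadata in N-Quads>"
canonical_dataset = b"<canonicalized dataset, existing proofs removed>"

hash1 = hashlib.sha256(canonical_proof_options).digest()  # proof metadata hash
hash2 = hashlib.sha256(canonical_dataset).digest()        # dataset hash

# The concatenation (64 bytes for SHA-256) is what the private key signs over.
to_be_signed = hash1 + hash2
assert len(to_be_signed) == 64
```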
AFAIK, the "Data Integrity Proofs" or what used to be called "Linked Data Proofs" have not changed in this regard since 2017...
Here is an example where I tested them against Mastodon:
(Mastodon is the original web5, get on my level haters).
URGNA2012 (Universal RDF Graph Normalization Algorithm 2012) didn't do this as it only dealt with RDF Graphs, not RDF Datasets, and so we just shoved all the RDF signature data into the default graph (and some people were rightfully upset by that).
I was also working on LD signatures back then when the signatures/proofs still used to be in the same graph as the data, and I remember it felt like the right decision to move the signatures/proofs into their own named graphs as it is now.
@OR13 The example doesn't parse in rdf4j, probably because it doesn't yet support JSON-LD 1.1: https://github.com/eclipse/rdf4j/issues/3654
Jena 4.4.0 (2022-01-30) also gave an error:
$ riot --validate test.jsonld
ERROR riot :: invalid term definition: 1.1
$ riot --version
Jena: VERSION: 4.4.0
Jena: BUILD_DATE: T15:09:41Z
$ riot --formatted trig test.jsonld
@prefix : <https://ontology.example/vocab/#> .
@VladimirAlexiev -- I think there are some scenarios where a `NOT FROM` could be useful, but I don't think signing scenarios are among them. I don't think I have a strong enough handle on an example scenario of this sort to make the case for `NOT FROM` in the SPARQL 1.2 wishlist, but if you do, I encourage you to add it soon, as action on items in that wishlist may be taken at any time.
A simpler one-liner to reproduce the issue (beware: it deletes everything, so don't run this outside of a new database):
MATCH (n)
DETACH DELETE n;
DROP CONSTRAINT ON (r:Resource)
ASSERT r.uri IS UNIQUE;
CALL n10s.graphconfig.init( { handleVocabUris: 'MAP', handleRDFTypes: 'NODES' });
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource)
ASSERT r.uri IS UNIQUE;
CALL n10s.rdf.import.inline(
'
<https://api.did.actor/revocation-lists/1.json#0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/vc-revocation-list-2020#RevocationList2020Status> .
<https://api.did.actor/revocation-lists/1.json#0> <https://w3id.org/vc-revocation-list-2020#revocationListCredential> <https://api.did.actor/revocation-lists/1.json> .
<https://api.did.actor/revocation-lists/1.json#0> <https://w3id.org/vc-revocation-list-2020#revocationListIndex> "0"^^<http://www.w3.org/2001/XMLSchema#integer> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://www.w3.org/2018/credentials#VerifiableCredential> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://w3id.org/security#proof> _:c14n1 .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://www.w3.org/2018/credentials#credentialStatus> <https://api.did.actor/revocation-lists/1.json#0> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://www.w3.org/2018/credentials#credentialSubject> <did:example:123> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://www.w3.org/2018/credentials#issuanceDate> "2010-01-01T19:23:24Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
<urn:uuid:37a64932-49cf-4afd-8c5e-ced22f87d835> <https://www.w3.org/2018/credentials#issuer> <did:key:z6MktiSzqF9kqwdU8VkdBKx56EYzXfpgnNPUAGznpicNiWfn> .
_:c14n0 <http://purl.org/dc/terms/created> "2022-06-20T16:52:58Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> _:c14n1 .
_:c14n0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/security#Ed25519Signature2018> _:c14n1 .
_:c14n0 <https://w3id.org/security#jws> "eyJhbGciOiJFZERTQSIsImI2NCI6ZmFsc2UsImNyaXQiOlsiYjY0Il19..jqpGjbIt1Hr9M5kZNzyPiTGxwm_tf2VqZiFvxIEgW31ryFyhOb_7muNwXEAzBmtL68UUQcB_dGUVfY9z978nAw" _:c14n1 .
_:c14n0 <https://w3id.org/security#proofPurpose> <https://w3id.org/security#assertionMethod> _:c14n1 .
_:c14n0 <https://w3id.org/security#verificationMethod> <did:key:z6MktiSzqF9kqwdU8VkdBKx56EYzXfpgnNPUAGznpicNiWfn#z6MktiSzqF9kqwdU8VkdBKx56EYzXfpgnNPUAGznpicNiWfn> _:c14n1 .
', 'N-Quads')
Then view the data with:
MATCH (n) RETURN n LIMIT 25
Here is a snippet of CQL that adds a link relationship between the proof node and "similar blank nodes"...
This is an incredibly expensive, hacky workaround:
MATCH
(n0: Resource),
(n1: Resource),
(n2: Resource)
WHERE
(n0)-[:proof]->(n1) AND
apoc.text.levenshteinSimilarity(n1.uri, n2.uri) > .8 AND
apoc.text.levenshteinSimilarity(n1.uri, n2.uri) < 1
MERGE (n1)-[link: DATA_INTEGRITY_PROOF]->(n2)
RETURN n0, n1, n2
After this link has been added the graphs are connected.
@VladimirAlexiev I had the same issue with JSON-LD v1.1 before... It's a major reason to convert from the standard JSON representation of a credential to the N-Quads or framed versions... which seem to be better supported by graph databases.
I suppose the next step should be to create 3 or 4 VCs and import them all, and then look at the graph again.
I would expect to be able to see that they are "proofs for the same information", but from different actors, over time.
A much smarter way to join the graphs after import:
MATCH
(n1: Resource),
(n2: Resource)
WHERE
split(n1.uri, '-')[1] = split(n2.uri, '-')[1] AND
NOT EXISTS(n1.jws) AND
EXISTS(n2.jws)
MERGE (n1)-[link: DATA_INTEGRITY_PROOF]->(n2)
RETURN n1, n2
^ this doesn't work, though, because of the way the blank node identifiers are assigned during a bulk import...
In this case, 3 credentials are imported, but each has a proof with a blank node id that looks like:
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b10
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b11
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b9
... because they were imported at the same time... even though the credentials were issued at different times.
On the other side of the gap, we have:
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b5
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b8
uri: bnode://genid-16ff0ebe17c448c0b1db6d23018428c4-b0
After import, we can tell they are all related by looking at `16ff0ebe17c448c0b1db6d23018428c4`... but we can't tell which ones are related, because of the way the information is handled when multiple credentials (each with a `proof` that is a `@container`) are handled at once.
A few thoughts on my goal: it seems the naive solutions to this problem are causing me to trade one goal for another.
Importing objects that might contain blank nodes 1 at a time seems to work:
Left hand side:
uri: bnode://genid-d10239de14ab4697baa44fdef3190c14-b3
uri: bnode://genid-4eb97b93909d41a19febb7483c8e49eb-b3
uri: bnode://genid-a5218ac4e96f433c8d31bb6a1115c49a-b3
Right hand side:
uri: bnode://genid-d10239de14ab4697baa44fdef3190c14-b0
uri: bnode://genid-4eb97b93909d41a19febb7483c8e49eb-b0
uri: bnode://genid-a5218ac4e96f433c8d31bb6a1115c49a-b0
It's now possible to join by looking at the middle component of the `uri`.
MATCH
(credential: Resource),
(signature: Resource)
WHERE
()-[:proof]->(credential) AND
EXISTS(signature.jws) AND
split(credential.uri, '-')[1] = split(signature.uri, '-')[1]
MERGE (credential)-[link: DATA_INTEGRITY_PROOF]->(signature)
RETURN credential, signature, link
After this relationship is added:
Unfortunately, this won't help you with Verifiable Presentations...
Because the proofs on the credentials will have blank node identifiers similar to the proof on the presentation:
Left:
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b10
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b11
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b12
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b14
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b13
Right:
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b1
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b4
uri: bnode://genid-83dec2dceeea4792a549afec00991790-b7
Same problem as before.
The problem here is worse, though... since we also have the dangling `@container` from the `verifiableCredential` relationship:
"holder": {"@id": "cred:holder", "@type": "@id"},
"proof": {"@id": "sec:proof", "@type": "@id", "@container": "@graph"},
"verifiableCredential": {"@id": "cred:verifiableCredential", "@type": "@id", "@container": "@graph"}
I'm less sure how to fix this, since:

- `id` is not required on VCs or VPs.
- `@container` is on the VC `proof` and the VP `proof` AND the VP `verifiableCredential` relationships.

It should be possible to import the credentials individually, then the presentation, and then define relationships between them... but having to do that for every VP is going to add a LOT of overhead.
... it does work...
After importing each item 1 at a time... the graphs for a VP can be joined:
But I lost the `vp.verifiableCredential` container along the way... assuming you are lucky enough to always have an `id` for both VC and VP, this can be fixed at the end with:
MATCH
(vp { uri: 'urn:uuid:7ea1be55-fe46-443e-a0ce-eb5e40f47aaa' }),
(vc { uri: 'urn:uuid:a96c9e16-adc3-48c7-8746-0e1b8c3535ba' })
MERGE
(vp)-[link: PRESENTED]->(vc)
RETURN vc, vp, link
Blank nodes are extremely useful, just like other forms of pronoun. However, they are not appropriate for use in all cases; sometimes, a proper noun (a/k/a a URI, URN, IRI, such as a DID) is more appropriate. I submit that these are such cases.
I added a similar import for VC-JWTs here https://github.com/transmute-industries/verifiable-data/pull/198
This raises interesting questions, since VC-JWT has an external proof... there is nothing to import regarding the proof semantics (without me making some custom mapping to import properties from the JWT header).
I can see benefits to both approaches... but it's interesting to note that by default both LD Proofs and VC-JWT don't import the proof as connected to the credential.
The issue was discussed in a meeting on 2022-08-03
blocked by #947
@OR13 can this issue be closed now that #910 is closed, or is there more to do to resolve it?
I think we still need to address the graph container issue in the core data model vs the security formats.
The Data Integrity side is easy, but how does this map to the `proof`, or `credential` vs `verifiable credential`, discussion?
The issue was discussed in a meeting on 2023-04-04
a note: `@container: @graph` causes the JSON-LD 1.1 compaction algorithm to produce `sec:proof` instead of the plain `proof` property name.
@filip26 wrote:
a note: `@container: @graph` causes the JSON-LD 1.1 compaction algorithm to produce `sec:proof` instead of the plain `proof` property name.
There are a number of implementations that don't have this behavior. Can you please provide the section of the JSON-LD or VC specification that you feel triggers this behavior?
I don't know what step(s) in the algorithm cause the behavior, but I dispute that other implementations do not have this issue. This example with `@graph` produces `sec:proof`, and here is the same example without `@graph` that produces `proof`.
The issue was discussed in a meeting on 2023-04-19
The issue was discussed in a meeting on 2023-05-17
This issue can be closed when https://github.com/w3c/vc-data-model/pull/1158 is merged
The issue was discussed in a meeting on 2023-06-28
@iherman on the call today, you asserted that the current JSON-LD context behavior wrt proof is correct.
I wanted to share some implementation experience with the working group on applying the current proof graphs, as they are generated with the current normative contexts, when converting from JSON-LD to RDF.
It is true that when importing a graph for an `application/vc+ld+json` document, you get (at least) 2 disconnected graphs: one for the credential, and one for each `proof` (in the case that a proof was present).
This behavior was previously ambiguous, but will now be consistent thanks to making the context normative.
It affects whether software systems will process these data models as RDF graphs.
Regardless of what the context says the RDF should be, a graph processing verifier might decide to attach proofs to credentials, or credentials to presentations, in order to generate more efficient graph queries.
At Transmute, we've obviously been using neo4j a lot, as have a lot of companies that are interested in modern graph APIs and moving beyond just doing what RDF allows (especially while we wait to see what RDF-star will allow).
Here is a link to a tool we use to evaluate JSON-LD DIDs and VCs:
https://github.com/transmute-industries/transmute
Here is a link to an open source US Customs program that also uses neo4j:
https://github.com/US-CBP/GTAS
While I personally don't agree with the RDF graph that is now normative, as you can see, I am comfortable working around its flaws to produce graphs that preserve the relationships we see in JSON, specifically the relationships between `proof`, `credential`, and `presentation`.
I think most folks will be surprised to learn that while `proof` is always OPTIONAL in a JSON-LD VC... it is NEVER present after you convert to RDF.
Similarly, folks will probably be surprised to learn that a verifiable presentation will not contain credentials when imported for the same reason, and that its proof will also be treated the same way.
This causes graph processors to "forget where things came from" after importing JSON-LD as RDF.
I find this behavior undesirable, but obviously we can work around it, and now our workaround will be consistent, thanks to making the `@context` values, and specifically these parts, normative:
As I said on the call, this issue predates the working group's intelligent decision to make the context normative, and this issue can be closed when this PR is merged:
This issue can be closed when #1158 is merged
The PR is merged, I presume this issue is now moot and can be set as pending close. @brentzundel @Sakurann @OR13 ?
Indeed!
Consumers of verifiable credentials as RDF are now assured of a specific graph structure, by the application of our normative context.
This makes reliable extension or translation possible.
This issue should be closed.
This issue has been addressed, closing.
I've been using Neo4j a lot lately.
One of my favorite features is the ability to preview (framed) JSON-LD.
For example:
For simple cases this works fine... but when I attempt to apply this to spec compliant verifiable credentials, I get a weird blank node issue with the proof block.
Here is a picture of what I mean:
Notice the 2 blank nodes that separate these disjoint subgraphs.
I believe this is caused by the way the `proof` block is defined in the v1 context: https://github.com/w3c/vc-data-model/blob/v1.1/contexts/credentials/v1#L45
This is a lot of complexity... for one of the most important term definitions the standard provides.
I believe this is also the cause of the "double blank node" issue I observed above.
I think what happens is that a first blank node is created for the proof, and since that node has `@container: @graph`, instead of being able to trace the relationships directly from credential to proof to verification method, each proof is being treated as a disjoint subgraph, and the relationship is not being preserved during preview / import...
This is really not ideal, since I am interested in querying changes in these proofs over time for credentials, and that relationship is not being imported.
I suspect this is solvable with a more complicated graph config: https://neo4j.com/labs/neosemantics/4.0/config/
But I wonder if we might correct this behavior in VC Data Model 2.0, such that RDF representations don't have this odd behavior when imported as labeled property graphs.
Anyone know how to solve this?