w3c / vc-data-integrity

W3C Data Integrity Specification
https://w3c.github.io/vc-data-integrity/

Add explanation about proof graphs #270

Closed msporny closed 2 weeks ago

msporny commented 3 weeks ago

This PR is an attempt to address issue #190 by adding an explanation about why proof graphs are used.



msporny commented 3 weeks ago

@iherman I'll note that we're duplicating terminology between VCDM and this spec in this PR, but I have to get the exports working in this spec to re-direct the VCDM terminology references to this specification. All that to say, there will be some duplication until this PR is merged and the VCDM adjustments are made. Also note that the concept of a ProofGraph has been added, where we might need a reference added to the Security vocabulary?

iherman commented 3 weeks ago

> @iherman I'll note that we're duplicating terminology between VCDM and this spec in this PR, but I have to get the exports working in this spec to re-direct the VCDM terminology references to this specification. All that to say, there will be some duplication until this PR is merged and the VCDM adjustments are made.

Understood.

But the current text is, I believe, wrong 😒. It says:

> When securing a document, it is important to clearly delineate the data being protected, called the default graph,…

That is not true in general. It holds only for simple Credentials (or similar tree-shaped linked data that does not use named graphs of its own), where the set of claims to be secured happens to be the default graph. It does not hold for a Verifiable Presentation: its default graph contains only the claims about the presentation as a whole, the credential within the presentation is a separate (named) graph, and the proof graphs of that credential must also be secured by the presentation's proof graph. See figure 9 in the VC spec. The relevant section in the VCDM describes the mechanism more precisely.
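To make the distinction concrete, here is a toy sketch of the dataset shape described above. It is only an illustration: the graph names, the triples, the `DEFAULT` key, and the `secured_by` helper are all hypothetical, and it assumes the simple case of a single proof on the presentation.

```python
# Toy model of an RDF dataset: one default graph plus named graphs.
# Every name and triple here is illustrative, not taken from any spec.
DEFAULT = None  # key used in this sketch for the default graph

presentation_dataset = {
    DEFAULT: [
        ("_:vp", "rdf:type", "VerifiablePresentation"),
        ("_:vp", "verifiableCredential", "_:credentialGraph"),
        ("_:vp", "proof", "_:vpProofGraph"),
    ],
    "_:credentialGraph": [
        ("_:vc", "rdf:type", "VerifiableCredential"),
        ("_:vc", "credentialSubject", "did:example:alice"),
        ("_:vc", "proof", "_:vcProofGraph"),
    ],
    "_:vcProofGraph": [("_:p1", "rdf:type", "DataIntegrityProof")],
    "_:vpProofGraph": [("_:p2", "rdf:type", "DataIntegrityProof")],
}


def secured_by(dataset, proof_graph_name):
    """Return the names of all graphs other than the given proof graph.

    Assumption for this sketch: a single proof covers everything in the
    dataset except its own proof graph.
    """
    return [name for name in dataset if name != proof_graph_name]


covered = secured_by(presentation_dataset, "_:vpProofGraph")
```

In this sketch `covered` includes the default graph, the credential graph, and the credential's own proof graph, so what the presentation's proof secures is a set of graphs (a dataset) and not the default graph alone.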

If we want to formulate it in more general terms in the DI spec, we may be on a slippery slope. The DI spec, so far, does not say how one designates the "subject dataset" to be secured by a proof. It is formally defined for VCs and VPs in the aforementioned section but, in general, we do not have a standard mechanism. (This was the essence of the discussion with the late Henry Story.)

I am not sure how you want to play this at this point. One approach is to say in the DI spec that each application area has to provide its own definition of what exactly is secured by a ProofGraph. We can refer back (probably in a note, so as not to make it normative) to the aforementioned section for the VC and VP cases.

Alternatively, we move the whole definition here from the VCDM, but this then becomes a major surgery, because most of the diagrams (and their accompanying texts) in §3 would become more difficult to understand, and may have to be moved to the DI spec...

> Also note that the concept of a ProofGraph has been added, where we might need a reference added to the Security vocabulary?

I am not sure what you mean. The security vocabulary already has the notion of a ProofGraph. Do you mean that the definition of that vocabulary (which is very terse) should refer to this new section? Then yes, this is to be done, possibly as part of this PR. I am happy to do it.

msporny commented 2 weeks ago

@iherman wrote:

> The security vocabulary already has the notion of a ProofGraph.

Excellent! I forgot that existed. :) -- (I'm sure you're seeing a pattern for today, Ivan -- my head is clearly not screwed on straight today)

> But the current text is, I believe, wrong 😒.

Yes, you're right... let me try to see if I can do something in DI that doesn't require the major surgery you mention above.

msporny commented 2 weeks ago

> we do not have a standard mechanism. (This was the essence of the discussion with the late Henry Story.) One approach is to say in the DI spec that each application area has to provide its own definition of what exactly is secured by a ProofGraph.

@iherman, I have made an attempt at a fix in 8544fa74b77552e5f31a2f46b1f8967656c935e0 that defines the "default graph" (as far as DI is concerned) as "For an unsecured document, the information contained in the document before a [=data integrity proof=] is added to the document". So, we don't run afoul of where the boundary is (the boundary is the document and goes no further than that, avoiding Henry and Dan's concerns, IIRC)... and what is contained (it's everything in the document before it was secured, which might include references to other named graphs, which are also secured).

Does that formulation work for you? If not, please suggest some text that might fix it. For now, ignore what we say in VCDM and see if this text can stand on its own... if it can, perhaps we can add text that says "other specs using DI can more accurately define what the default graph entails and what the proof graph secures". Thoughts?
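As a rough illustration of the "document before a proof is added" idea, the sketch below recovers an unsecured document from a secured one by dropping only the top-level `proof` member. The function name `unsecured_document` and the sample document are hypothetical and are not quoted from the spec text or the commit.

```python
import copy


def unsecured_document(secured: dict) -> dict:
    """Return a deep copy of the document with its top-level "proof" removed."""
    doc = copy.deepcopy(secured)
    doc.pop("proof", None)
    return doc


# Hypothetical secured presentation that embeds an already-secured credential.
secured = {
    "@context": ["https://www.w3.org/ns/credentials/v2"],
    "type": ["VerifiablePresentation"],
    "verifiableCredential": [
        {
            "type": ["VerifiableCredential"],
            "credentialSubject": {"id": "did:example:alice"},
            "proof": {"type": "DataIntegrityProof"},  # nested proof is kept
        }
    ],
    "proof": {"type": "DataIntegrityProof"},
}

plain = unsecured_document(secured)
```

Under this reading the boundary is the document itself: the embedded credential and its nested proof remain part of what the outer proof covers.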

iherman commented 2 weeks ago

> @iherman, I have made an attempt at a fix in 8544fa7 that defines the "default graph" (as far as DI is concerned) as "For an unsecured document, the information contained in the document before a [=data integrity proof=] is added to the document". So, we don't run afoul of where the boundary is (the boundary is the document and goes no further than that, avoiding Henry and Dan's concerns, IIRC)... and what is contained (it's everything in the document before it was secured, which might include references to other named graphs, which are also secured).
>
> Does that formulation work for you?

Almost, but not fully. The whole section could work, except for the term "default graph". It does not, because:

The term "default graph" is formally defined in the RDF specification. We should not redefine that term, and we should refer to it if and only if the reference is correct. (We were very cautious in the VCDM spec in this respect!) And the problem is that, in the case of a Verifiable Presentation, that "thing" is not the default graph but, rather, an (RDF) dataset.

So the text works for me, but only if we use a different term.

I must admit, I am not sure what the right term could be. AFAIK, the crypto people sometimes use the term "plaintext" for the "thing" that is being signed, so maybe something like the "plain graph" may work (I am holding my nose over the fact that it is not a graph...)? Or, if I forget about holding my nose, "plain graphs" (knowing that the plural is not necessarily justified)? But that is just an idea, I trust you will find a better term...

iherman commented 2 weeks ago

To be clear, the proposed changes by @dlongley (in https://github.com/w3c/vc-data-integrity/pull/270#discussion_r1659102575 and https://github.com/w3c/vc-data-integrity/pull/270#discussion_r1659107254) solve my problem.

Thanks @dlongley!

msporny commented 2 weeks ago

Editorial, multiple reviews, changes requested and made, no objections, merging.