w3c / vc-data-integrity

W3C Data Integrity Specification
https://w3c.github.io/vc-data-integrity/

Reconsider `@context` injection feature based on implementer feedback #281

Open msporny opened 3 days ago

msporny commented 3 days ago

After performing an implementation, and working through a PR, @rflechtner has proposed that we remove the @context injection feature from the Data Integrity specification in this comment:

https://github.com/w3c/vc-data-integrity/issues/231#issuecomment-2209299341

The feature was added based on feedback that "implementers should be able to inject a context". The current text doesn't quite do that; it's a bit of a half-measure, and after some discussion, there does not seem to be support for keeping the injection feature in its current form (or at all).

This issue is meant to track that concern and decide what to do. Among the options to consider are:

  1. Normatively, continue to support context injection, and provide much clearer rules on how to support it across all cryptosuites.
  2. Normatively, do not support context injection at all.
  3. Normatively, do not prohibit context injection, but informatively note that it's up to applications to do it after they have carefully considered the ramifications.
jandrieu commented 3 days ago

Can we get clarification on what "context injection" is?

rflechtner commented 3 days ago

Can we get clarification on what "context injection" is?

Context injection is described in https://www.w3.org/TR/vc-data-integrity/#context-injection and refers to amending the @context property of the secured document with a context value that maps the terms contained in a proof. In practice, this means that the document is altered during proof creation, or even during proof verification; the specification is not entirely clear on this point. My assumption is that this practice was originally introduced on the premise that the @context property is not secured by the proof (see https://github.com/w3c/vc-data-integrity/issues/272, where this has been discussed) and can therefore be manipulated without negative consequences. That premise, however, does not hold for cryptosuites that treat the document as regular JSON, e.g., those using the JCS transformation (such as eddsa-jcs-2022).
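To illustrate why the premise fails for JSON-based cryptosuites, here is a minimal Python sketch. It approximates the JCS step with `json.dumps` using sorted keys and compact separators (an assumption: real RFC 8785 also fixes number serialization and sorts keys by UTF-16 code units, which this sketch does not implement) and hashes the result with SHA-256. The document contents are made-up example data. Appending the Data Integrity context after the hash has been computed yields different bytes, so a proof computed over the original would no longer verify.

```python
import hashlib
import json

DI_V2 = "https://w3id.org/security/data-integrity/v2"


def jcs_like(doc: dict) -> bytes:
    """Rough stand-in for RFC 8785 (JCS): sorted keys, no whitespace, UTF-8.

    Assumption: only adequate for simple ASCII keys and non-float values.
    """
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def digest(doc: dict) -> str:
    """Hash of the canonical form, as a JSON-based cryptosuite would sign it."""
    return hashlib.sha256(jcs_like(doc)).hexdigest()


# Hypothetical unsecured document as produced by the issuer.
unsecured = {
    "@context": ["https://www.w3.org/ns/credentials/examples/v2"],
    "type": ["ExampleDocument"],
    "name": "Alice",
}
signed_digest = digest(unsecured)

# Context injection after signing: append the DI v2 context to a copy.
injected = dict(unsecured)
injected["@context"] = unsecured["@context"] + [DI_V2]

print(digest(injected) == signed_digest)  # False: the signed bytes changed
```

With an RDF-canonicalizing cryptosuite, adding a context that only defines otherwise unused terms may leave the canonical dataset unchanged, which is presumably where the original premise came from.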

We've tried addressing the issue by means of a workaround for JCS cryptosuites (https://github.com/w3c/vc-data-integrity/issues/225), but as I outlined in https://github.com/w3c/vc-data-integrity/issues/231, more accommodations would be needed to make sure that JCS-based proofs can be verified reliably. These accommodations, if they are even possible, would need to be designed very carefully to guarantee not only reliable verification but also effective protection of the semantics expressed in the @context values originally added by the issuer.

These problems reveal that the practice of context injection leads to an unnecessarily tight coupling to JSON-LD. As I tried to describe in https://github.com/w3c/vc-data-integrity/issues/231#issuecomment-2211027752, abandoning context injection (whose practical value is unclear) means that Linked Data becomes an additional semantic layer that can be added to a document, but is not strictly required when using cryptosuites such as eddsa-jcs-2022.

rflechtner commented 3 days ago

I'll try to briefly summarise the arguments for and against abandoning context injection as a feature of this specification as far as I see them.

Reasons to abandon context injection:

Reasons against abandoning context injection:

msporny commented 3 days ago

@rflechtner, some rough thoughts on potential normative language that we could use/add -- I haven't thought through each one in detail (some of them might be unworkable):

  1. Injection MUST happen at the application layer, that's the only place it makes sense to do it.
  2. The application layer MAY inject stuff BEFORE it adds ANY proof.
  3. The application layer MUST/SHOULD NOT inject stuff before it verifies a proof.
  4. If the application layer injects a context, it MUST ensure that at least the data-integrity/v2 terms are a part of that injection (and maybe this doesn't apply to JCS)?
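A rough Python sketch of what rules 1, 2 and 4 could mean for application code. The helper name `inject_context` is hypothetical, and treating the VC v2.0 context as already covering the data-integrity/v2 terms is an assumption taken from the example later in this thread, not spec text. Rule 3 is reflected only in the docstring: a verifier should not call this on a document it is about to verify.

```python
import copy

DI_V2 = "https://w3id.org/security/data-integrity/v2"
VC_V2 = "https://www.w3.org/ns/credentials/v2"  # assumed to already map DI v2 terms


def inject_context(document: dict) -> dict:
    """Hypothetical application-layer helper, run before signing.

    Rule 2: only allowed before ANY proof is added.
    Rule 3: verifiers should not call this before verifying a proof.
    Rule 4: the result must map the data-integrity/v2 terms.
    """
    if "proof" in document:
        raise ValueError("context injection is only allowed before any proof is added")
    result = copy.deepcopy(document)  # never mutate the caller's document
    ctx = result.get("@context", [])
    contexts = list(ctx) if isinstance(ctx, list) else [ctx]
    # Embedded (dict) contexts never compare equal to the URLs, so they are kept as-is.
    if not any(c in (DI_V2, VC_V2) for c in contexts):
        contexts.append(DI_V2)
    result["@context"] = contexts
    return result


doc = {"@context": ["https://www.w3.org/ns/credentials/examples/v2"], "name": "Alice"}
prepared = inject_context(doc)
print(prepared["@context"])
```

Note that the sketch normalizes a string @context into an array; whether that is acceptable would itself need to be specified.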
rflechtner commented 2 days ago

some rough thoughts on potential normative language that we could use/add

1-3 make perfect sense to me.

If the application layer injects a context, it MUST ensure that at least the data-integrity/v2 terms are a part of that injection

This moves into the territory of what I referred to as 'alternative measures to enforce that proof terms can be mapped to Linked Data', and thus depends on the preferred solution. What you sketched out is one possible solution that does seem workable, but requires that the application logic that puts together the document to be secured considers the semantics of the proof that's added by a DI implementation.

I currently see roughly 3 possible types of solutions of which we may pick one:

Personally, I'm currently leaning towards the second solution, where the context mapping proof-related terms is added to the proof object instead, simply for the separation of concerns this affords: let the application logic worry about the semantics of the document (if it wants to deal with JSON-LD at all) and let the DI implementation worry about enforcing semantics for its own terms, without the two interacting. So far this seems workable to me, but it's also not 100% thought out.

In normative language, this could read as follows: During proof creation, implementations MUST ensure the correct mapping of all proof terms by injecting the context value https://w3id.org/security/data-integrity/v2 into the @context property of the proof object. This injection MAY be omitted if the @context value of the document already provides an equivalent mapping, e.g., when it uses the Verifiable Credential Data Model v2.0 context (https://www.w3.org/ns/credentials/v2).
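The proposed normative text could translate into something like the following Python sketch. `add_proof` is a hypothetical helper, and signing is deliberately left out: in a real cryptosuite the proof's @context would have to be set before the proof configuration is hashed and signed. The set of document contexts treated as "equivalent" is an assumption limited to the two URLs mentioned in this thread.

```python
import copy

DI_V2 = "https://w3id.org/security/data-integrity/v2"
VC_V2 = "https://www.w3.org/ns/credentials/v2"
# Assumption: document contexts that already map every proof term.
EQUIVALENT_CONTEXTS = {DI_V2, VC_V2}


def _as_list(value) -> list:
    """Normalize an absent, single, or array @context value to a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def add_proof(document: dict, proof: dict) -> dict:
    """Attach `proof`, injecting DI v2 into the proof's @context only if needed."""
    secured = copy.deepcopy(document)  # the document's own @context is never touched
    proof = copy.deepcopy(proof)
    doc_contexts = _as_list(document.get("@context"))
    already_mapped = any(isinstance(c, str) and c in EQUIVALENT_CONTEXTS for c in doc_contexts)
    if not already_mapped:
        proof_contexts = _as_list(proof.get("@context"))
        if DI_V2 not in proof_contexts:
            proof_contexts.insert(0, DI_V2)
        # Use the compact string form when DI v2 is the only entry.
        proof["@context"] = proof_contexts[0] if len(proof_contexts) == 1 else proof_contexts
    # Omitted: computing proofValue over the document and this proof configuration.
    secured["proof"] = proof
    return secured


EXAMPLES = "https://www.w3.org/ns/credentials/examples/v2"
vc = add_proof({"@context": [VC_V2], "type": ["VerifiableCredential"]}, {"type": "DataIntegrityProof"})
other = add_proof({"@context": [EXAMPLES], "name": "Alice"}, {"type": "DataIntegrityProof"})
```

Here `vc` gets a proof without its own @context, while `other` gets a proof whose @context is the DI v2 URL; in both cases the application's @context is left exactly as it was.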

veikkoeeva commented 2 days ago

Let DI implementations enforce semantics of proof terms explicitly by including a @context property on the proof object that includes the value https://w3id.org/security/data-integrity/v2. This could be made conditional of course, such that this @context value is only added if the document doesn't already map all the terms correctly.

From an implementation perspective, and considering the discussion this links to, this probably requires a more detailed step-by-step algorithm, e.g., with an example starting document that shows each transformation and which parts of it are signed. I mention this partly because I implemented a version in .NET, and it took some time to arrive at one that seemed to work, even though some points remained confusing (such as the injection discussed here). Some of the implementation issues I believe I have in .NET are quite recent, such as https://github.com/dotnetrdf/dotnetrdf/issues/615 (the official release that includes it came out only a few weeks ago; see also https://github.com/dotnetrdf/dotnetrdf/issues/631). So the simpler the algorithm, the better: it is easier to review, and easier to verify each step performed by whichever library is used. .NET also lacks native JCS support, but that is easier to implement and understand, I think.

rflechtner commented 16 hours ago

considering the discussion this links to, this probably requires a bit more beefed up step-by-step algorithm

This is certainly the end goal, but there are still quite a few different approaches we could choose from, and they would lead to different algorithms. The simplest approach, even if the most verbose, is to solve it at the level of the proof data model: define that a Data Integrity proof MUST have a @context property whose value MUST be either the string https://w3id.org/security/data-integrity/v2 or an array of strings whose first item is https://w3id.org/security/data-integrity/v2.

In this case there's no conditional logic and no fancy algorithms needed; simply by requiring a context that defines all relevant terms to always be present on the proof we've made sure that all its terms can be mapped. When I said that this could be made conditional, I was referring to this trade-off between verbosity and algorithmic complexity.
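Under that unconditional rule, the data-model check on the verifier side reduces to a few lines. A minimal Python sketch (the function name is hypothetical):

```python
DI_V2 = "https://w3id.org/security/data-integrity/v2"


def proof_context_is_valid(proof: dict) -> bool:
    """Check the proposed rule: @context is the DI v2 URL, or an array of
    strings whose first item is the DI v2 URL."""
    ctx = proof.get("@context")
    if isinstance(ctx, str):
        return ctx == DI_V2
    if isinstance(ctx, list):
        return len(ctx) > 0 and all(isinstance(c, str) for c in ctx) and ctx[0] == DI_V2
    return False  # missing, or neither a string nor an array
```

Because the check never looks at the document's own @context, it is the same for JCS-based and RDF-based cryptosuites; the price is repeating the context URL on every proof.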

As with the choice between the three general approaches that I outlined, the decision which path to take depends a lot on preference, so before I draft any concrete algorithm I would want to see some agreement among the Editors and stakeholders of the specification concerning if and how this change is desired.