Reconsider `@context` injection feature based on implementer feedback

After performing an implementation and working through a PR, @rflechtner has proposed that we remove the `@context` injection feature from the Data Integrity specification in this comment:

https://github.com/w3c/vc-data-integrity/issues/231#issuecomment-2209299341

The feature was added based on feedback that "implementers should be able to inject a context". The current text doesn't quite do that; it's a bit of a half-measure, and after some discussion, there does not seem to be support for keeping the injection feature in its current form (or at all).

This issue is meant to track that concern and decide what to do. Among the options to consider are:

Normatively, continue to support context injection, and provide much clearer rules on how to support it across all cryptosuites.
Normatively, do not support context injection at all.
Normatively, do not prohibit context injection, but informatively note that it's up to applications to do it after they have carefully considered the ramifications.

Can we get clarification on what "context injection" is?

> Can we get clarification on what "context injection" is?

Context injection is described in https://www.w3.org/TR/vc-data-integrity/#context-injection and refers to amending the @context property of the secured document with a context value that maps the terms contained in a proof. In practice, this means that the document is altered in the process of proof creation, or even of proof verification; the specification is not entirely clear here. I assume this practice was originally introduced in the belief that the @context property is not secured by the proof (see https://github.com/w3c/vc-data-integrity/issues/272 where this has been discussed) and can thus be manipulated without negative consequences. That belief does not hold for cryptosuites that treat the document as regular JSON, e.g. those using the JCS transformation (such as eddsa-jcs-2022).
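To illustrate why this matters for JSON-based cryptosuites, here is a minimal sketch (not taken from the specification). It assumes a simplified stand-in for JCS: `json.dumps` with sorted keys and no whitespace, which only matches RFC 8785 output for simple documents such as this one. The helper names and the `https://example.org/contexts/v1` URL are hypothetical.

```python
import copy
import hashlib
import json

DI_V2 = "https://w3id.org/security/data-integrity/v2"


def canonicalize(doc: dict) -> bytes:
    # Simplified stand-in for JCS (RFC 8785): sorted keys, no whitespace.
    # Only equivalent to real JCS for simple documents like the one below.
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def inject_context(doc: dict) -> dict:
    # Hypothetical injection step: append the DI v2 context if it is missing.
    out = copy.deepcopy(doc)
    ctx = out.get("@context", [])
    if not isinstance(ctx, list):
        ctx = [ctx]
    if DI_V2 not in ctx:
        ctx = ctx + [DI_V2]
    out["@context"] = ctx
    return out


def digest(doc: dict) -> str:
    return hashlib.sha256(canonicalize(doc)).hexdigest()


original = {"@context": ["https://example.org/contexts/v1"], "name": "Alice"}
injected = inject_context(original)

# The canonical bytes differ, so a proof computed over one form
# cannot be verified against the other.
print(digest(original) == digest(injected))
```

Under these assumptions the two digests differ, which is the core of the problem: with a JSON-based transformation, the @context value is part of the secured bytes.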

We've tried addressing the issue by means of a workaround on JCS cryptosuites (https://github.com/w3c/vc-data-integrity/issues/225), but as I outlined in https://github.com/w3c/vc-data-integrity/issues/231 there are more accommodations that would need to be made to make sure that JCS-based proofs can reliably be verified. These accommodations, if they are even possible, need to be very carefully designed to make sure that not only is reliable verification guaranteed, but also effective protection of the semantics expressed in the @context values originally added by the issuer.

These problems reveal that the practice of context injection leads to an unnecessarily tight coupling to JSON-LD. As I tried to describe in https://github.com/w3c/vc-data-integrity/issues/231#issuecomment-2211027752, abandoning context injection (the practical value of which is unclear) means that Linked Data becomes an additional semantic layer that can be added to a document, but is not strictly required when using cryptosuites such as eddsa-jcs-2022.

I'll try to briefly summarise the arguments for and against abandoning context injection as a feature of this specification as far as I see them.

Reasons to abandon context injection:

Context injection is the only feature of the DI specs as currently proposed that violates the reasonable assumption that the document to be secured is not altered through the process of issuing a proof for it; or, in other words, that information relating to cryptographic authentication should be distinguishable and expressed separately from the data it is meant to secure.
Violating the above assumption is at odds with the concept of proof sets and proof chains, where multiple proofs secure the same document, often applied consecutively. If adding a proof can change the document, then later proofs may effectively secure a different document than the earlier ones did.
The manipulation of the document's @context property as part of adding a proof always carries the possibility of edge cases where the semantics of the document are changed simply by adding a proof, especially where JCS-based cryptosuites are used.
Abandoning context injection allows equal support for JCS- and LD/RDF-based cryptosuites and removes the need for a number of semi-functional workarounds we've had to apply to JCS-based cryptosuite specifications.
This also means that the understanding and use of Linked Data semantics remains an option, but is no longer a hard requirement when using JCS-based cryptosuites. Not only does that extend the range of use cases for this specification, it will likely also make it much easier to win over Linked Data sceptics - not only to support and use this specification, but potentially even to transition to Linked Data semantics and features eventually, as there is a simple and straightforward 'upgrade path' available.
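
The proof set concern above can be sketched as follows. This is a toy model, not a Data Integrity implementation: an HMAC stands in for a real signature, the proof options are not hashed, canonicalization is a simplified approximation of JCS, and all names, keys, and the example context URL are hypothetical.

```python
import copy
import hashlib
import hmac
import json

DI_V2 = "https://w3id.org/security/data-integrity/v2"


def canonical_bytes(doc: dict) -> bytes:
    # Secure everything except the proofs themselves (simplified JCS stand-in).
    unsecured = {k: v for k, v in doc.items() if k != "proof"}
    return json.dumps(unsecured, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(doc: dict, key: bytes) -> str:
    # HMAC is only a stand-in for a real signature scheme.
    return hmac.new(key, canonical_bytes(doc), hashlib.sha256).hexdigest()


def add_proof(doc: dict, key: bytes, inject: bool) -> dict:
    out = copy.deepcopy(doc)
    if inject and DI_V2 not in out["@context"]:
        out["@context"] = out["@context"] + [DI_V2]  # alters the secured document
    proof = {"type": "HypotheticalProof", "proofValue": sign(out, key)}
    out.setdefault("proof", []).append(proof)
    return out


def verify(doc: dict, index: int, key: bytes) -> bool:
    return hmac.compare_digest(doc["proof"][index]["proofValue"], sign(doc, key))


doc = {"@context": ["https://example.org/contexts/v1"], "name": "Alice"}
first = add_proof(doc, b"key-1", inject=False)
second = add_proof(first, b"key-2", inject=True)    # second proof injects a context
clean = add_proof(first, b"key-2", inject=False)    # second proof leaves the document alone

print(verify(second, 0, b"key-1"))  # earlier proof no longer matches the altered document
print(verify(clean, 0, b"key-1"), verify(clean, 1, b"key-2"))
```

In this toy model, the first proof stops verifying once the second proof's creation has altered @context, while both proofs verify when the document is left untouched.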

Reasons against abandoning context injection:

Interoperability / Legacy Support: Given that it has been part of this specification for the longest time, and that similar proof methods have preceded it, context injection may be performed as part of existing implementations. Designing the DI specifications on the assumption that the document is not altered by adding proofs would mean that JCS-based proofs may be invalidated when legacy proof methods are added to a document later on - at least if we also reverted the workaround developed in https://github.com/w3c/vc-data-integrity/issues/225
Context injection also serves the purpose of guaranteeing that the terms used in DI proofs can be reliably mapped to Linked Data and cannot be redefined. New measures would have to be introduced to enforce this; but solutions are readily available, for example in the form of adding a @context property to the proof object itself that provides the relevant mappings.

@rflechtner, some rough thoughts on potential normative language that we could use/add -- I haven't thought through each one in detail (some of them might be unworkable):

1. Injection MUST happen at the application layer; that's the only place it makes sense to do it.
2. The application layer MAY inject stuff BEFORE it adds ANY proof.
3. The application layer MUST/SHOULD NOT inject stuff before it verifies a proof.
4. If the application layer injects a context, it MUST ensure that at least the data-integrity/v2 terms are a part of that injection (and maybe this doesn't apply to JCS)?
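
One possible reading of these rules, as a rough application-layer sketch: injection is only allowed while the document carries no proof, and any injection also includes the data-integrity/v2 context. The function name and its behaviour are assumptions for illustration, not proposed specification text.

```python
DI_V2 = "https://w3id.org/security/data-integrity/v2"


def inject_before_proofs(doc: dict, extra_contexts: list) -> dict:
    """Hypothetical application-layer helper that applies the rules sketched above."""
    # Rules 2 and 3: only inject while no proof exists, never on a secured document.
    if "proof" in doc:
        raise ValueError("refusing to alter @context of a document that already has a proof")

    ctx = doc.get("@context", [])
    if not isinstance(ctx, list):
        ctx = [ctx]

    merged = list(ctx)
    # Rule 4: whatever else is injected, make sure the DI v2 terms are included.
    for value in [*extra_contexts, DI_V2]:
        if value not in merged:
            merged.append(value)

    # Return a new document; the caller's input is left unchanged.
    return {**doc, "@context": merged}


prepared = inject_before_proofs({"@context": "https://example.org/contexts/v1"}, [])
print(prepared["@context"])
```

The document returned here would then be handed to the DI implementation, which no longer needs to touch @context during proof creation or verification.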

> some rough thoughts on potential normative language that we could use/add

1-3 make perfect sense to me.

> If the application layer injects a context, it MUST ensure that at least the data-integrity/v2 terms are a part of that injection

This moves into the territory of what I referred to as 'alternative measures to enforce that proof terms can be mapped to Linked Data', and thus depends on the preferred solution. What you sketched out is one possible solution that does seem workable, but requires that the application logic that puts together the document to be secured considers the semantics of the proof that's added by a DI implementation.

I currently see roughly 3 possible types of solutions of which we may pick one:

1. Require application logic to ensure correct mapping of proof terms by including appropriate contexts in the document @context property before any proofs are added; this may be mandated always, or could be required only if the application wants to use proofs that require RDF canonicalization.
2. Let DI implementations enforce semantics of proof terms explicitly by including a @context property on the proof object that includes the value https://w3id.org/security/data-integrity/v2. This could be made conditional of course, such that this @context value is only added if the document doesn't already map all the terms correctly.
3. Make enforcing correct proof property mappings an implementation detail of RDFC-based proofs. The proof configuration algorithm of eddsa-rdfc-2022 and similar suites calls for "Setting proofConfig.@context to unsecuredDocument.@context" which could be amended to inject the value https://w3id.org/security/data-integrity/v2. This way of enforcing semantics is more implicit and does not strictly guarantee that the entire secured document (including proofs) can be properly understood by a JSON-LD processor.
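
As a rough illustration of the third option, the amended proof configuration step might look like the sketch below. It only models the @context handling; the function name is hypothetical and the rest of the proof configuration algorithm is omitted.

```python
DI_V2 = "https://w3id.org/security/data-integrity/v2"


def build_proof_config(options: dict, unsecured_document: dict) -> dict:
    # Copy the proof options; the unsecured document itself is never modified.
    proof_config = dict(options)

    ctx = unsecured_document.get("@context", [])
    ctx = list(ctx) if isinstance(ctx, list) else [ctx]

    # Amendment sketched in option 3: the DI v2 context is added to the proof
    # configuration only, not to the document that is being secured.
    if DI_V2 not in ctx:
        ctx.append(DI_V2)

    proof_config["@context"] = ctx
    return proof_config


document = {"@context": ["https://example.org/contexts/v1"], "name": "Alice"}
config = build_proof_config({"type": "DataIntegrityProof"}, document)
print(config["@context"])
```

Because the injected value lives only in the proof configuration, the stored document keeps its original @context, which is why this option does not by itself guarantee that the secured document can be processed as JSON-LD.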

Personally, I'm currently leaning towards the second solution, where the context that maps proof-related terms is added to the proof object instead, simply for the separation of concerns that this affords: let the application logic worry about the semantics of the document (if it wants to deal with JSON-LD at all) and the DI implementation worry about enforcing semantics for its terms, without the two interacting. So far this seems workable to me, but it's also not 100% thought out.

In normative language, this could read as follows: During proof creation, implementations MUST ensure the correct mapping of all proof terms by adding the context value https://w3id.org/security/data-integrity/v2 to the @context property of the proof object. This injection MAY be omitted if the @context value of the document already provides an equivalent mapping, e.g., when it uses the Verifiable Credential Data Model v2.0 context (https://www.w3.org/ns/credentials/v2).
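
A minimal sketch of that draft language, assuming that "equivalent mapping" can be approximated by checking for a small set of known context URLs (a real implementation would need a more careful definition). The function name and the `EQUIVALENT_CONTEXTS` set are hypothetical.

```python
DI_V2 = "https://w3id.org/security/data-integrity/v2"
VC_V2 = "https://www.w3.org/ns/credentials/v2"

# Assumption: contexts treated as already mapping all proof terms correctly.
EQUIVALENT_CONTEXTS = {DI_V2, VC_V2}


def with_proof_context(proof: dict, document: dict) -> dict:
    """Return a copy of the proof, adding @context unless the document already covers it."""
    ctx = document.get("@context", [])
    ctx = ctx if isinstance(ctx, list) else [ctx]

    # MAY omit: the document context already provides an equivalent mapping.
    if any(isinstance(c, str) and c in EQUIVALENT_CONTEXTS for c in ctx):
        return dict(proof)

    # MUST otherwise: the mapping goes on the proof object, not on the document.
    return {"@context": DI_V2, **proof}


proof = {"type": "DataIntegrityProof", "cryptosuite": "eddsa-jcs-2022"}
plain = with_proof_context(proof, {"@context": ["https://example.org/contexts/v1"]})
credential = with_proof_context(proof, {"@context": [VC_V2]})
print(plain.get("@context"), credential.get("@context"))
```

Note that the document passed in is only read, never modified, which is the separation of concerns argued for above.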

> Let DI implementations enforce semantics of proof terms explicitly by including a @context property on the proof object that includes the value https://w3id.org/security/data-integrity/v2. This could be made conditional of course, such that this @context value is only added if the document doesn't already map all the terms correctly.

From an implementation perspective, considering the discussion this links to, this probably requires a more fleshed-out step-by-step algorithm, e.g. with an example of a starting document showing the transformations, which pieces of it are signed, and so on. I mention this because I implemented one version in .NET, and it took some time to arrive at a version that seemed to work, even though I still had "matters of confusion" (such as the injection discussed here). Some of the implementation issues I had in .NET were only recently resolved, like https://github.com/dotnetrdf/dotnetrdf/issues/615 (the official version with the fix was released just some weeks ago, see also https://github.com/dotnetrdf/dotnetrdf/issues/631). So the simpler the algorithm, and the easier it is to review and verify the individual steps of a given library, the better. .NET also does not have native JCS support, but that's easier to do and understand, I think.

> considering the discussion this links to, this probably requires a bit more beefed up step-by-step algorithm

This is certainly the end goal, but there are still quite a few different approaches we could choose from that would take us to different algorithms. The simplest approach, even if the most verbose, is to just solve it on the level of the proof data model: defining that a Data Integrity proof MUST have a @context property and that its value MUST be the string https://w3id.org/security/data-integrity/v2 or an array of strings whose first item is https://w3id.org/security/data-integrity/v2.

In this case there's no conditional logic and no fancy algorithms needed; simply by requiring a context that defines all relevant terms to always be present on the proof we've made sure that all its terms can be mapped. When I said that this could be made conditional, I was referring to this trade-off between verbosity and algorithmic complexity.
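
The unconditional data model rule described above would reduce verification-side checking to something like the following sketch (the function name is hypothetical):

```python
DI_V2 = "https://w3id.org/security/data-integrity/v2"


def has_valid_proof_context(proof: dict) -> bool:
    """Check the proposed rule: @context is DI v2, or an array of strings starting with it."""
    ctx = proof.get("@context")
    if isinstance(ctx, str):
        return ctx == DI_V2
    if isinstance(ctx, list):
        return (
            len(ctx) > 0
            and ctx[0] == DI_V2
            and all(isinstance(item, str) for item in ctx)
        )
    # Missing @context, or a value of any other type, fails the check.
    return False


print(has_valid_proof_context({"@context": DI_V2, "type": "DataIntegrityProof"}))
print(has_valid_proof_context({"type": "DataIntegrityProof"}))
```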

As with the choice between the three general approaches that I outlined, the decision on which path to take depends a lot on preference, so before I draft any concrete algorithm I would want to see some agreement among the Editors and stakeholders of the specification on whether and how this change is desired.

w3c / vc-data-integrity

Reconsider `@context` injection feature based on implementer feedback #281