w3c / json-ld-api

JSON-LD 1.1 Processing Algorithms and API Specification
https://w3c.github.io/json-ld-api/

Term protection algorithm steps do not adequately protect `@vocab` #553

Open dlongley opened 1 year ago

dlongley commented 1 year ago

EDIT (2024) to provide some context (no pun intended): In the Verifiable Credentials work, an argument to add @vocab to the core VC context was made (notably, I recommended against it) which caused this question to come up. Some arguing for it were happy to be able to change the @vocab value in subsequent contexts as current processors allow and the group moved on. Recently, others have been surprised not necessarily that this can happen, but even that terms that would have been caught by a "catch all" @vocab can be defined in subsequent contexts. See my comment below on how, when followed to its logical conclusion, it seems that the current behavior is the only possible way for a "catch all" use of the @vocab feature to work.


When @vocab is used in an @protected context, the terms it auto-defines do not receive the same protection that explicitly defined terms receive. Any processor that only checks an @context value (i.e., a URL w/an immutable context value) cannot be guaranteed to have the same view of the generated term mappings as a processor that, for example, transforms the data to another @context. That guarantee is the value and purpose of @protected.

However, the steps currently expressed in the API spec do not cause @vocab (nor @base, because these are specially called-out keywords during processing) to run through the protection processing code. IMO, this should be considered errata. Any keyword values in an @protected context that establish term definitions should receive the same protection that the terms do -- changing the values of those keywords would change the term definitions, breaking the expected protection and unified processing model. EDIT: IMO, the combination of @vocab and @base with @protected does not make sense and would produce results that are even more unexpected than a naive guess at what would happen if they did work together.
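To make the gap concrete, here is a toy sketch (not the pyld API, and a drastic simplification of the Context Processing Algorithm) of how a term caught only by a protected context's @vocab can be silently redefined by a later context, while an explicitly protected term would raise an error:

```python
# Hypothetical sketch: context processing reduced to merging term maps
# and tracking @vocab. All function names here are illustrative.

def build_active_context(contexts):
    """Merge a list of local contexts into a toy active context."""
    active = {"terms": {}, "vocab": None}
    for ctx in contexts:
        protected = ctx.get("@protected", False)
        if "@vocab" in ctx:
            # Per the current algorithms, @vocab is set with no
            # protection check, even inside an @protected context.
            active["vocab"] = ctx["@vocab"]
        for term, iri in ctx.items():
            if term.startswith("@"):
                continue
            prev = active["terms"].get(term)
            if prev and prev["protected"] and prev["iri"] != iri:
                raise ValueError(f"protected term redefined: {term}")
            active["terms"][term] = {"iri": iri, "protected": protected}
    return active

def expand_term(active, term):
    """Expand a term via its definition, else fall back to @vocab."""
    if term in active["terms"]:
        return active["terms"][term]["iri"]
    if active["vocab"]:
        return active["vocab"] + term
    return term

A = {"@protected": True, "@vocab": "https://example.com#"}
B = {"foo": "https://other.com#foo"}

# Under [A] alone, foo is caught by the protected context's @vocab...
print(expand_term(build_active_context([A]), "foo"))
# -> https://example.com#foo
# ...but context B silently redefines it; no protection error is raised.
print(expand_term(build_active_context([A, B]), "foo"))
# -> https://other.com#foo
```

A processor that only recognizes context A would see `foo` as `https://example.com#foo`, while a full processor sees `https://other.com#foo` -- exactly the divergence @protected is meant to prevent.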

gkellogg commented 1 year ago

The syntax describes @protected as working on term definitions:

@protected Used to prevent term definitions of a context to be overridden by other contexts. This keyword is described in § 4.1.11 Protected Term Definitions.

IIRC, there was a discussion on protected contexts vs protected term definitions, and term definitions were important, as more contexts could be layered on that weren't protected, but the term definitions that were protected would remain.

Neither @vocab, @base, @language, @direction, nor @propagate are considered to be term definitions.

For @vocab, the implications to the Context Processing Algorithm would be in step 5.8: the "vocabulary mapping" would need to become a structure including some notion of protected, along with the map used, and steps 5.8.2 and 5.8.3 would need to consider whether @vocab was protected, using logic similar to that used for individual term definitions. This would need to be replicated for many/most of the other keywords directly associated with the context. I'm sure there are other cases related to, say, property-scoped contexts.
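A minimal sketch of that change might look as follows. The names (`VocabMapping`, `set_vocab`) are illustrative, not spec text; the point is that the vocabulary mapping would carry a protected flag and overrides would hit the same kind of error as protected terms:

```python
# Hypothetical sketch: the active context's vocabulary mapping becomes a
# small structure carrying a protected flag, and the @vocab step rejects
# overrides of a protected value, mirroring protected-term logic.

from dataclasses import dataclass

@dataclass
class VocabMapping:
    iri: str
    protected: bool

def set_vocab(active_vocab, new_iri, in_protected_context,
              override_protected=False):
    """Apply an @vocab entry from a local context to the active context."""
    if (active_vocab is not None and active_vocab.protected
            and not override_protected and active_vocab.iri != new_iri):
        # Analogous to the protected term redefinition error for terms.
        raise ValueError("attempt to override protected @vocab")
    return VocabMapping(new_iri, in_protected_context)

vocab = set_vocab(None, "https://example.com#", in_protected_context=True)
# A later context trying to change it would now fail:
#   set_vocab(vocab, "https://other.com#", in_protected_context=False)
```

As the comment notes, the same treatment would have to be replicated for @base and the other context-level keywords, which is part of why the change is larger than it first appears.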

Also, there would be quite a bit of new description needed in the syntax document, as most of the discussion is on protected term definitions.

dlongley commented 1 year ago

For @vocab, the implications to the Context Processing Algorithm would be in step 5.8: the "vocabulary mapping" would need to become a structure including some notion of protected, along with the map used, and steps 5.8.2 and 5.8.3 would need to consider whether @vocab was protected, using logic similar to that used for individual term definitions.

Yes, I expected that when @vocab (and other special-cased keywords) were added to the active context, they would have been marked protected (or not) just like term definitions, so that if encountered again in a subsequent context, the protection rules could be applied to either throw an error or allow the new definition.

EDIT: But new definitions would never work with a "catch all" @vocab, which is the common use case, so when actually thinking through how an implementation would work, I have decided that this is a bad idea.

dlongley commented 3 months ago

We may have decided that using @protected with @vocab was a non-starter years back and failed to document it. Notably, users of @vocab and users of @protected tend to run in different circles, so using them together is usually unexpected. Part of this is because @protected is for being very specific and rigid, whereas @vocab is usually a "catch all" -- and recommended against for documents that need securing, like Verifiable Credentials. I bumped into this again while thinking about how these features would play together while evaluating a misuse of JSON-LD contexts in the Verifiable Credentials work. Having considered it, I think that protecting @vocab doesn't make sense.

The @protected feature is primarily useful when you want to read a property that you are expecting to be defined according to a context value, that other subsequent context values then cannot change in unexpected ways.

This allows you to ignore unknown contexts that follow the context you do know, while still consuming properties from the known contexts (but never the unknown ones). However, if you had a known context that used @vocab -- and you wanted this to work with @protected -- then processors would have to prohibit subsequent contexts from defining any additional terms wherever @vocab was active. And for @vocab users, the point is to make @vocab active everywhere as a catch all. This means using it as a protected catch all while also allowing subsequent contexts to define terms is logically impossible.

To give an example:

  1. Suppose there is a context A with an @vocab URL of https://example.com#. The author of this context uses a new @protected feature of @vocab to ensure that consumers can know, for example, any term x will always map to https://example.com#x but to also allow subsequent contexts to be mixed in that define other terms.
  2. Suppose an author produces a document with contexts [A, B], where B defines foo to map to https://other.com#foo.
  3. Suppose a consumer only understands context A and attempts to rely on this new @protected feature of @vocab to ensure that they can assume that foo always maps to https://example.com#foo including when other (to-be-ignored) contexts are present. Well, this wouldn't work. If a processor allowed context B to map foo to https://other.com#foo, as in the previous step, then the consumer would see the wrong value. Therefore, the processor would have to prohibit the definition of foo in the previous step. But not only foo -- every other term where @vocab was active would also have to be prohibited in the same way. Here this means all terms.
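The dilemma in step 3 can be sketched in a few lines. A processor that enforced a protected "catch all" @vocab would have to reject every term definition in a subsequent context, since every term is already "caught" by @vocab (all names here are illustrative, not spec behavior):

```python
# Hypothetical sketch: enforcement of a protected catch-all @vocab.
# Every conceivable term already expands via @vocab, so any definition
# in a subsequent context either conflicts or merely restates it.

PROTECTED_VOCAB = "https://example.com#"  # from context A

def define_term(term, iri):
    """Reject any definition that would shadow the protected @vocab."""
    expected = PROTECTED_VOCAB + term
    if iri != expected:
        raise ValueError(
            f"cannot define {term}: protected @vocab already maps it "
            f"to {expected}")
    return iri

# Context B's definition of foo must be rejected...
try:
    define_term("foo", "https://other.com#foo")
except ValueError as e:
    print(e)

# ...and a "compatible" definition is redundant: it can only restate
# what @vocab already implies, so context B could add nothing anyway.
assert define_term("foo", "https://example.com#foo") == \
    "https://example.com#foo"
```

Since every definition is either forbidden or redundant, subsequent contexts become useless wherever a protected catch-all @vocab is active, which is the contradiction described above.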

So, if you want to use @vocab in this way, you do not need an @protected + @vocab feature, because any contexts that appear after context A cannot define any terms and can just be removed (or documents that include them can just be rejected). In fact, it would be an anti-pattern and an error to ever combine any other meaningful context with one such as A.

I recommend we close this issue because using @protected with @vocab doesn't seem to make very much sense. The only thing that might be of value is expressing this rationale and example somewhere in the spec for users who are attempting to do something that is not logically possible.

gkellogg commented 3 months ago

@dlongley I put it in the "Discuss Call" list, but as you're the creator, you can feel free to close the issue. But I presume some change is needed to at least the API document to note this, if not add more normative text.

BigBlueHat commented 3 months ago

The only thing that might be of value is expressing this rationale and example in the spec somewhere to users that are attempting to do something that is not logically possible.

There is at least this call to action. Some sort of note/warning on each of these two terms cautioning about the dangers of assuming they can be used together without confusion. We can certainly add a section in any future best practices content as well, but some sort of syntax spec level warning message seems justifiable.