w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

`@protected` creates unresolvable conflicts when the same term is defined in two contexts top-level #443

Open trwnh opened 1 day ago

trwnh commented 1 day ago

I've just encountered issue #424 (and the related #361 as well), and I'm in a similar situation: https://www.w3.org/ns/controller/v1 defines alsoKnownAs at the top level alongside @protected: true, while https://www.w3.org/ns/activitystreams defines alsoKnownAs in a different namespace (as: vs. sec:, loosely).

From controller/v1:

{
  "@context": {
    "@protected": true,
    "id": "@id",
    "type": "@type",

    "alsoKnownAs": {
      "@id": "https://w3id.org/security#alsoKnownAs",
      "@type": "@id",
      "@container": "@set"
    },
//...

From activitystreams:

{
  "@context": {
    "@vocab": "_:",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "as": "https://www.w3.org/ns/activitystreams#",
// ...
    "alsoKnownAs": {
      "@id": "as:alsoKnownAs",
      "@type": "@id"
    }
// ...

Putting activitystreams before controller/v1 causes the later definition to override the older one, as expected (but not as desired):

{
  "@context": ["https://www.w3.org/ns/activitystreams", "https://www.w3.org/ns/controller/v1"],
  "type": "Person",
  "id": "http://person.example",
  "alsoKnownAs": "https://person.example"  // sec:alsoKnownAs
}
[
  {
    "https://w3id.org/security#alsoKnownAs": [  // should be https://www.w3.org/ns/activitystreams#alsoKnownAs
      {
        "@id": "https://person.example"
      }
    ],
    "@id": "http://person.example",
    "@type": [
      "https://www.w3.org/ns/activitystreams#Person"
    ]
  }
]

But putting activitystreams after controller/v1 triggers the error due to @protected: true:

{
  "@context": [
    "https://www.w3.org/ns/controller/v1",  // uses @protected
    "https://www.w3.org/ns/activitystreams"  // will trigger the redefinition error
  ],
  "type": "Person",
  "id": "http://person.example",
  "alsoKnownAs": "https://person.example"
}
jsonld.SyntaxError: Invalid JSON-LD syntax; tried to redefine a protected term.

JSON-LD 1.1 4.1.11 Protected term definitions https://www.w3.org/TR/json-ld11/#protected-term-definitions describes two exceptions. The first exception is when the definition is the same, which is not applicable here. The second exception is for property-scoped context definitions, which is unworkable because in this case the singular top-level object is intended to be both an Actor as well as a Controller Document.
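As a rough illustration of the rule and its first exception, here is a minimal Python sketch. It is a simplified model and not the actual JSON-LD 1.1 context processing algorithm: the names `apply_context`, `resolve`, and `ProtectedTermError` are hypothetical, the two contexts are cut down to the single conflicting term, IRIs are written out in full instead of being expanded from prefixes, and per-term `@protected` flags and scoped contexts are ignored.

```python
# Simplified model of protected-term checking; NOT the real JSON-LD 1.1
# context processing algorithm. All names here are hypothetical.


class ProtectedTermError(Exception):
    """Raised when a protected term would get a different definition."""


def apply_context(active, local):
    """Return a new term map with `local` applied on top of `active`.

    `active` maps term -> (definition, is_protected).
    `local` is a context dict that may carry a context-wide "@protected" flag.
    """
    protect_all = local.get("@protected", False)
    result = dict(active)
    for term, definition in local.items():
        if term.startswith("@"):
            continue  # skip keywords such as @protected and @vocab
        previous = result.get(term)
        if previous is not None and previous[1]:
            if previous[0] != definition:
                raise ProtectedTermError(
                    f"tried to redefine protected term {term!r}"
                )
            continue  # exception 1: identical definition, keep the protected one
        result[term] = (definition, protect_all)
    return result


def resolve(contexts):
    """Apply a list of contexts in order, starting from an empty term map."""
    active = {}
    for ctx in contexts:
        active = apply_context(active, ctx)
    return active


# Cut-down stand-ins for the two remote contexts (assumption: only the
# conflicting term is modelled, with fully expanded IRIs).
CONTROLLER = {
    "@protected": True,
    "alsoKnownAs": {
        "@id": "https://w3id.org/security#alsoKnownAs",
        "@type": "@id",
        "@container": "@set",
    },
}
ACTIVITYSTREAMS = {
    "alsoKnownAs": {
        "@id": "https://www.w3.org/ns/activitystreams#alsoKnownAs",
        "@type": "@id",
    },
}
```

In this model, `resolve([ACTIVITYSTREAMS, CONTROLLER])` succeeds and ends with the sec: definition, `resolve([CONTROLLER, ACTIVITYSTREAMS])` raises `ProtectedTermError`, and `resolve([CONTROLLER, CONTROLLER])` succeeds because the repeated definition is identical.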

To verify, here's a type-scoped context definition that errors out:

{
  "@context": [
    "https://www.w3.org/ns/controller/v1",
     {
       "Person": {
         "@id": "https://www.w3.org/ns/activitystreams#Person",
         "@context": {
           "alsoKnownAs": {  // triggers the redefinition error
             "@id": "https://www.w3.org/ns/activitystreams#alsoKnownAs"
           }
         }
       }
     }],
  "type": "Person",
  "id": "http://person.example",
  "alsoKnownAs": "https://person.example"
}

And to reiterate, a property-scoped context definition can't be used because the alsoKnownAs property is top-level. So the way I see it, there's nothing that can be done to resolve this in a "plain JSON" compatible way except:

This leads me to think that @protected is a generally poorly-thought-out mechanism that greatly increases the likelihood of such conflicts. Without it, as a producer I could just redefine the term later, for example by putting the activitystreams context last, or by using a local context object that comes after both remote contexts:

{
  "@context": [
    "https://www.w3.org/ns/controller/v1",  // needs to remove @protected
    "https://www.w3.org/ns/activitystreams"  // as:alsoKnownAs will override controller/v1's sec:alsoKnownAs
  ],
  "type": "Person",
  "id": "http://person.example",
  "alsoKnownAs": "https://person.example"  // as:alsoKnownAs
}

or

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",  // defines as:alsoKnownAs
    "https://www.w3.org/ns/controller/v1",  // redefines sec:alsoKnownAs as @protected
    {
      "alsoKnownAs": {
        "@id": "https://www.w3.org/ns/activitystreams#alsoKnownAs",  // won't work unless controller/v1 removes @protected
        "@type": "@id"
      }
    }
  ],
  "type": "Person",
  "id": "http://person.example",
  "alsoKnownAs": "https://person.example"  // as:alsoKnownAs
}

I'm not sure the existence of @protected accomplishes its stated goal of "prevent[ing] this divergence of interpretation", nor that the rationale that '"plain JSON" implementations, relying on a given specification, will only traverse properties defined by that specification' sufficiently addresses the issue of conflicts (or that it is a valid assumption in the first place). The issue arises when two specifications define the same term, and both specifications apply to the current object or document. It effectively leads to a hard incompatibility where it is impossible to implement both specs fully; you have to pick between them.

If there's an option I'm not aware of I'd like to hear it.

dlongley commented 1 day ago

There's a typo in the controller document v1 context and it should instead use the activitystreams vocab for alsoKnownAs. A bug fix will address this particular case.

That being said, the whole point of protection is to enforce a particular term definition in a particular place when a particular context is present. So it is not a bug that it is doing this, but a feature. And it does require coordination to share terms across contexts in the same place (by ensuring the term definitions match). That's a requirement for the feature to work. You can only use other term definitions when you bring in a property-scoped context (as mentioned), because decentralized extensibility (in this case, reuse of the same term with a different definition) is only considered safe in different areas of the JSON tree in the same document.

Of course, if specs and / or implementations allow for JSON-LD compaction to be performed, then significantly more flexibility is possible. All of these designs are around finding a balance for different kinds of consumers in a sufficiently large decentralized ecosystem, some who will only accept static documents and others who might use compaction prior to consumption. This of course creates constraints.

trwnh commented 1 day ago

the whole point of protection is to enforce a particular term definition in a particular place when a particular context is present. So it is not a bug that it is doing this, but a feature. And it does require coordination to share terms across contexts in the same place (by ensuring the term definitions match). That's a requirement for the feature to work.

If I'm reading this correctly, are you saying that two context authors are required to coordinate whenever there is a term conflict? This seems unworkable given the open-world assumption. If any single context author decides to make their term definition(s) @protected, then this creates problems for anyone else who defines the term differently. Essentially, one author doing it means that this author gets supremacy over the "plain JSON" and that their context declaration needs to come last or else the JSON-LD parser will throw a redefinition error. Two authors doing it will create an unresolvable error.

It seems to me like this unnecessarily makes things way more complicated for polyglots or anyone wanting to implement multiple overlapping specs. If for example schema.org decided to protect their context, it would become impossible to use both activitystreams and schema.org, because numerous top-level properties like name are shared across both contexts. A developer producing documents with "@context": ["https://schema.org", "https://www.w3.org/ns/activitystreams"] in this example would be creating irreconcilably unprocessable JSON-LD documents, because as:name is seen as a redefinition of schema:name. This means that either the developer will be forced to write their own context document (even if they don't understand JSON-LD), or that some downstream consumer will have to postprocess the unprocessable JSON-LD to replace the context with their own corrected one.

I don't see a situation that can possibly work smoothly so long as anyone uses @protected. If the aim is to ensure that terms don't get redefined, then this feels like a backfire because the actual result is that the entire document becomes unprocessable; instead of not understanding some number of redefined terms and having them appear to be missing ("I can't find schema:name, I only have as:name, but all the other schema: properties are as expected"), you end up not understanding the entire document ("my parser is giving me an error, I can't do anything with this unless I replace their context with what I am guessing they meant").

dlongley commented 9 hours ago

@trwnh,

Apologies, I would have written a shorter response if I had more time.

If I'm reading this correctly, are you saying that two context authors are required to coordinate whenever there is a term conflict?

No, I'm saying that the @protected feature was created for use by specifications that do require significant coordination to decide what the immutable definitions for certain terms in certain documents ought to be. This coordination may be done over the period of several years, in a standards working group. The @protected feature is to explicitly prohibit different definitions for the same terms in the same places in JSON documents. There is no way for two (or more) different context authors to coordinate to sort out a term conflict here, because using a definition different from what is written in the spec is prohibited. The coordination must happen prior to the spec becoming a standard.

This prohibition exists for a good reason: to enable both rigid and flexible implementations to interoperate.

It is used when there is a spec that expresses, in detail, a data model and JSON format, such that implementers who read the spec can write rigid implementations "in the context of" the data as expressed in the specification. In other words, from this perspective, these specs are no different from any other specification designed around information expressed in JSON (with no capability to transform conforming documents into some other expression).

These rigid implementations treat the URLs in the @context field as simple document type + version identifiers. No JSON-LD library or API calls are needed to work with conforming documents, as conformance requires that these fields match specific values and that the documents have an expected structure.
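A minimal Python sketch of what such a rigid consumer could look like, using only the standard `json` module. The function name `read_also_known_as` and the exact `EXPECTED_CONTEXT` value are assumptions made for illustration; a real spec would state which `@context` values and document structure conformance requires.

```python
import json

# Hypothetical: the exact @context value a spec might require.
EXPECTED_CONTEXT = ["https://www.w3.org/ns/controller/v1"]


def read_also_known_as(raw):
    """Parse a document as plain JSON, treating @context as a type/version tag."""
    doc = json.loads(raw)
    if doc.get("@context") != EXPECTED_CONTEXT:
        # No JSON-LD processing happens here: an unknown @context is simply
        # treated as an unknown document type and rejected.
        raise ValueError("unexpected @context; refusing to interpret document")
    value = doc.get("alsoKnownAs", [])
    # Accept either a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)
```

A consumer written this way never looks at the term definitions behind the URL, which is why it cannot notice a redefinition introduced by an additional context.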

However, behind these @context values are actual JSON-LD context documents that are processable by more flexible implementations. These flexible implementations are able to use JSON-LD libraries to understand the data (potentially even without the spec, through "follow your nose") or to transform the data into a different expression that their code is expecting. By using the @protected keyword in these contexts, an enforcement process is introduced by which the same interpretation is guaranteed to be used across these different implementation approaches (or a protected term error will be thrown).

Of course, enabling these two approaches at once has trade-offs. Nothing is for free. Coordination is required while creating the spec and, as is always required when using a JSON spec, a conforming document must not deviate from what's in the spec or reuse terms (JSON keys) to mean something other than what is in the spec. Simply put: the use of a spec and the @protected feature to increase interoperability across implementations of differing complexity reduces some decentralized extensibility in exchange for allowing less complex (but interoperable) consumers.

This seems unworkable given the open-world assumption.

It's workable, and only slightly more constrained, i.e., you can't "just use whatever term definitions you want" in your documents and expect them to be consumable by simpler implementations that are unable to understand your changes. The most basic and commonly reused term definitions from a spec are immutable.

If it helps, this can be thought of as extending the set of JSON keys that JSON-LD already doesn't allow redefinition of, i.e., all keywords (e.g., @context, @id, @type). I don't think this constraint makes JSON-LD "unworkable given the open-world assumption", as you say. By using the @protected feature, a context author just extends the set of immutable JSON keys a little further beyond what JSON-LD already restricts in its own spec.

Specs that use this feature require the more complex implementations to express their documents in a more rigid way (really, in a specific context) in order to enable simpler implementations to exist. However, you can, of course, express all the information you want using other terms that the spec doesn't mark as @protected. The more complex implementations can then transform incoming documents into whatever contexts they want to (using whatever terms they want to) for consumption.

It is true that when a spec uses this feature it might become incompatible with another spec that also tries to enable these two types of implementations: you can't have a single document be expressed using two contexts that are in conflict with one another. Note that the Activity Streams work tried to enable simpler consumers too, it just didn't use the @protected feature (IIRC, it wasn't available at the time). A consequence of this is that anyone can change the definition of a term defined by the Activity Streams context (by using the @context field), but the simpler implementations do not detect it. This creates semantic confusion which can lead to a variety of serious problems. Newer specifications can avoid this by using @protected in their contexts to actually surface these errors -- so that no valid implementation can use such a document (as you say, the document becomes "unprocessable").

This means that either the developer will be forced to write their own context document (even if they don't understand JSON-LD), or that some downstream consumer will have to postprocess the unprocessable JSON-LD to replace the context with their own corrected one.

...

If the aim is to ensure that terms don't get redefined, then this feels like a backfire because the actual result is that the entire document becomes unprocessable; instead of not understanding some number of redefined terms and having them appear to be missing ("I can't find schema:name, I only have as:name, but all the other schema: properties are as expected")

Your concerns are certainly heard -- but it's important to remember that one of the constraints is that the simplest implementations do not use a JSON-LD library at all. To enable these implementations, document authors have to work within the constraints in the specification: you can't change certain term definitions in certain places. Simply allowing any definition to be used without throwing any errors won't solve this problem, it will just create semantic confusion. As always, I (and many others) am all ears for a better solution to this problem (given the constraints), but allowing semantic confusion to happen isn't an acceptable outcome -- so this is the best solution we've landed on for now.

msporny commented 8 hours ago

@trwnh wrote:

b) convince whoever is responsible for controller/v1 to redefine alsoKnownAs with the activitystreams-namespaced @id instead of the security-namespaced one

Hi, that's me ("whoever is responsible for controller/v1") :)

It's a bug, thanks for catching it; that context is fairly new and hasn't been put through its paces yet.

Feel free to raise a PR on controller/v1 to fix the issue, or I will do it when I get around to addressing the issue you raised in that repository.

trwnh commented 3 hours ago

Your concerns are certainly heard -- but it's important to remember that one of the constraints is that the simplest implementations do not use a JSON-LD library at all. To enable these implementations, document authors have to work within the constraints in the specification: you can't change certain term definitions in certain places. Simply allowing any definition to be used without throwing any errors won't solve this problem, it will just create semantic confusion.

This is part of my concern, though: consider a document producer who does not use JSON-LD but declares two well-known remote context documents, either because the specs tell them to or because they think that's what they need to do.

What this producer has just done is completely invisible to "plain JSON" consumers (who aren't aware of any term definitions let alone the possibility of redefining one or that this might conflict). But even the most basic of JSON-LD processors now has to deal with the mess that was created by this incompatible context declaration. I'm not entirely convinced of the fail-fast-and-hard approach here; maybe the JSON-LD processing algorithm could use an additional flag that converts these errors to warnings? This would allow the processor to at least have something to process, provided that they are willing to accept the semantic confusion. (Any errors in schema would be caught "further down the chain", so the document may be discarded later if it is unsuitable for further processing.)
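To make the suggestion concrete, here is a minimal Python sketch of such a flag. Everything in it is hypothetical: no `on_conflict` option exists in the JSON-LD processing algorithms or in any library that I know of, the contexts are reduced to term-to-IRI maps, and keeping the protected definition (rather than the later one) in lenient mode is an arbitrary choice made for illustration.

```python
import warnings


def merge_contexts(contexts, on_conflict="error"):
    """Merge simplified contexts (term -> IRI string) in order.

    on_conflict="error" mimics today's fail-fast behaviour;
    on_conflict="warn" is the hypothetical lenient mode.
    """
    terms, protected = {}, set()
    for ctx in contexts:
        for term, definition in ctx.items():
            if term.startswith("@"):
                continue  # skip keywords such as @protected
            if term in protected and terms[term] != definition:
                message = f"tried to redefine protected term {term!r}"
                if on_conflict == "error":
                    raise ValueError(message)
                warnings.warn(message)
                continue  # assumption: the protected definition is kept
            terms[term] = definition
            if ctx.get("@protected"):
                protected.add(term)
    return terms


# Cut-down stand-ins for the two remote contexts (assumption).
controller = {
    "@protected": True,
    "alsoKnownAs": "https://w3id.org/security#alsoKnownAs",
}
activitystreams = {
    "alsoKnownAs": "https://www.w3.org/ns/activitystreams#alsoKnownAs",
}
```

With `on_conflict="warn"`, `merge_contexts([controller, activitystreams], on_conflict="warn")` returns a term map and emits a warning instead of failing; the caller then has to decide whether the resulting interpretation is acceptable.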

Essentially, the use of @protected in any context document needs to come with a prominent disclaimer that it severely limits compatibility. "Be careful, this can prevent adaptation" feels like it's not making the consequences fully clear. There should probably be language added around using multiple context documents, and how the use of @protected in any one of them can create problems depending on the order you declare those contexts or on whether any of the others likewise declare @protected. It should be clearly called out that "warning, the JSON-LD document may become unprocessable" is even a possibility, so that context publishers can carefully consider this possible consequence before just slapping a @protected in there.