Introduce concept of "sealed" contexts

dlongley commented 6 years ago

A number of standards track specifications use JSON-LD for extensibility but intentionally place limitations on overriding the terms defined in a core context. For example:

Implementations may augment the provided @context with additional @context definitions but must not override or change the normative context.

https://www.w3.org/TR/activitystreams-core/#jsonld

Implementations MUST produce an error when an extension JSON-LD Context overrides the expanded URL for a term specified in the base JSON-LD Context (https://w3id.org/credentials/v1). To avoid the possibility of accidentally overriding terms, developers are urged to scope their extensions.

https://w3c.github.io/vc-data-model/#extensibility

It seems like it would be a nice feature to allow enforcement of this desire by JSON-LD processors. Then authors could be assured that the interpretation of a context would be proper and properly implemented JSON-LD processors would throw errors if rules were violated.

In short, I propose we add a keyword with boolean value such as "@sealed": true (that could appear in contexts) to JSON-LD 1.1 that enables processors to enforce a desire to prevent defined terms from being redefined in subsequent contexts, but allows for new terms (aka extensions) to be defined.

dlongley commented 6 years ago

Whether or not this keyword should appear in specific term definitions (more verbose but allows for granularity) or at the top of a context (simpler and fits the main use cases for now) is open to debate.

msporny commented 6 years ago

I fear that this would lead to just about every JSON-LD context author placing @sealed on every property which would prevent developers from overriding things when necessary (e.g. schema.org). So, I expect you to want to @seal everything or nothing, not some properties and not others.

I'm also concerned about the feature bloat that JSON-LD is undergoing. Just because we've seen this in a few places doesn't mean we need to standardize it; we should try really hard to support the feature w/o putting it into the JSON-LD spec.

I get the upside of this feature... I'm just not sure it warrants the added complexity.

cwebber commented 6 years ago

Here are some concerns:

If you already aren't paying attention to the @context field at all, isn't there a chance someone could just never include the relevant context, or include a different context instead? (I suppose protocols with implied contexts maybe apply to that case)
What about #547 (content addressable contexts / the dangers of mutable contexts)? Does this make that even more complex?

dlongley commented 6 years ago

@cwebber,

If you already aren't paying attention to the @context field at all...

With this feature, this becomes a simple thing to check for in naive JSON implementations. Require a check that the value of the @context property is either a string that matches the core context or an array whose first entry is the core context. It increases interop without requiring JSON implementations to do JSON-LD processing to properly check things -- or without strongly suggesting they run a processor to avoid interop problems ... when obviously they won't. It reduces the barrier to proper interop and increases its likelihood.
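
A minimal sketch of what such a naive check might look like in a JSON-only implementation. The core context URL and the function name are placeholders chosen for illustration, not part of any spec:

```python
# Hypothetical core context URL, used only for illustration.
CORE_CONTEXT = "https://w3id.org/some-standard/v1"


def uses_core_context(doc, core=CORE_CONTEXT):
    """Return True if the core context is the sole @context value
    or the first entry of an @context array."""
    ctx = doc.get("@context")
    if isinstance(ctx, str):
        # Single-string form: must be exactly the core context.
        return ctx == core
    if isinstance(ctx, list) and ctx:
        # Array form: the core context must come first.
        return ctx[0] == core
    # Missing @context, empty array, or any other shape fails the check.
    return False
```

No JSON-LD processing happens here; the check only inspects the shape of the @context value, which is the point of the argument above.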

Regarding #547 -- that's an open question.

dlongley commented 6 years ago

With this keyword, it will be clear how systems that don't check @context will have processed the data. They will have interpreted it according to the "core context". A system that then uses a JSON-LD processor (that understands @sealed) will interpret the data the same way. As it stands right now, the second system will interpret the data in a different way. Especially if that system is some third party that knows nothing about the standard and the data was simply passed on to it.

I think we can improve standards that tell implementers that if they are paying attention to X then they MUST not allow Y to happen ...but that they can also just not pay attention to X if it's too burdensome. Rather, if we want there to be standards that allow implementers to not pay attention to X, we should enable their interpretations of the data to be the same as the interpretation of those who are paying attention to X.

dlongley commented 6 years ago

I want to note that simply requiring the core context to appear last in the @context array would not address scoped context issues. I believe it could have been a solution, albeit kind of an aesthetically ugly one, prior to scoped contexts, but given their introduction, it no longer works AFAICT. That we've taken away this power in JSON-LD 1.1 is another reason to support this feature.

dlongley commented 6 years ago

Another option is to mark contexts as sealed in the data itself.

{
  "@context": [
    {"@sealed": "https://w3id.org/some-standard/v1"},
    "https://some-extension.com"
  ],
  "foo": "bar"
}

There may be other ways to make this look better and give more guidance:

{
  "@context": {
    "@core": "https://w3id.org/some-standard/v1",
    "@extension": ["https://some-extension.com/some-context/v1"]
  },
  "foo": "bar"
}

This approach would avoid concerns people have about popular contexts marking their terms as sealed. Then simple JSON implementations could check for this format without having to do processing.
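
As a rough sketch, a JSON-only implementation could detect either of the two shapes above without running a JSON-LD processor. Both shapes are only proposals in this thread, and the function name is made up for illustration:

```python
def sealed_context_url(doc):
    """Return the URL of the context marked as sealed in the data,
    or None if neither proposed shape is present."""
    ctx = doc.get("@context")
    # Proposed object form: {"@core": url, "@extension": [...]}
    if isinstance(ctx, dict):
        core = ctx.get("@core")
        return core if isinstance(core, str) else None
    # Proposed array form: first entry is {"@sealed": url}
    if isinstance(ctx, list) and ctx and isinstance(ctx[0], dict):
        sealed = ctx[0].get("@sealed")
        return sealed if isinstance(sealed, str) else None
    return None
```

An implementation of a given standard would then compare the returned URL against the core context it expects and reject the payload on a mismatch.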

gkellogg commented 6 years ago

A lot of room for bike-shedding here.

I tend to agree with @msporny about feature bloat, but as with every other thing that's come in, there's certainly reasonable motivation; it is an inevitable consequence of encouraging the use of the JSON syntax, rather than requiring re-compaction for reuse.

I could see using @sealed (or perhaps @frozen) at either the context level or at the term level.

Is it an error for a follow-on context or scoped context to attempt to re-define a term? Or is the redefinition just ignored?
What about @vocab? Is that frozen too?
Would using "@context": null still be allowed? Would it reset to the working context at that point?

It seems to me to be reasonable for others to mix-in schema.org with other vocabularies, and thus contexts; for example, we have examples in the spec of mixing schema.org with FOAF, both of which use name.

It could be that values of properties in a given context maintain that restriction, but values of properties defined in another context might not, as a schema.org processor would ignore such values anyway.

iherman commented 6 years ago

I tend to look at JSON-LD as “simply” an RDF serialization, and I am wary of adding features that have no counterpart in RDF or in other RDF serializations.

In RDF the owner of an ontology cannot “freeze” its content. There are social safeguards against, say, ontology hijacking, which have proven to be enough, and I have not heard of any real-life cases for such features.

Also, as @gkellogg says, there is ample room for bikeshedding here. E.g., am I allowed to add a new term or class in a namespace “owned” by somebody else? In RDF, the answer is yes. What about a frozen context file? Am I allowed to add a new term to a namespace (say, a @vocab) defined in a frozen @context file? Can I add a "@vocab":"null" in my own context? (E.g., the schema.org context, though it defines every term of the core vocabulary explicitly, also includes a @vocab for schema.org; I may want to nullify that if I want to add extensions).

At this point, I think I am more concerned about feature bloat, as @msporny says, and I am therefore not convinced about adding this feature.

dlongley commented 6 years ago

@iherman,

I tend to look at JSON-LD as “simply” an RDF serialization, and I am wary of adding features that have no counterpart in RDF or in other RDF serializations.

In RDF the owner of an ontology cannot “freeze” its content. There are social safeguards against, say, ontology hijacking, which have proven to be enough, and I have not heard of any real-life cases for such features.

I think JSON-LD has some very important unique features and it would be a mistake to consider it "simply" another RDF serialization. It is a bridge technology (between RDF and Web devs) in a way no other syntax is. It also has a more powerful "context" feature that helps it be that bridge technology.

JSON is far and away the most popular syntax for Web devs and I think that any features we can put into JSON-LD to remove barriers to making it extensible, machine readable, and more widely reusable across the Web ecosystem are strongly worth investigating.

I also want to say that we're not talking about freezing ontologies here. We're talking about freezing contexts and that's a huge difference. And we may be able to do it in such a way that the freeze applies only to data payloads that are using a particular context (in a way that is minimally testable without a JSON-LD processor) rather than having to freeze the context always.

The point is to enhance interoperability with JSON only implementations and to reduce complaints around "developer ergonomics". Making it easier for consumers of JSON-LD to treat it as JSON (without ill side effects) is an important goal, IMO -- and perhaps the most important one for the syntax as it is what differentiates it.

dlongley commented 6 years ago

@gkellogg,

I tend to agree with @msporny about feature bloat, but as with every other thing that's come in, there's certainly reasonable motivation; it is an inevitable consequence of encouraging the use of the JSON syntax, rather than requiring re-compaction for reuse.

Yes. And I believe we need to do a better job of not requiring re-compaction since we want to support this.

There are complaints that certain standards explicitly do not mandate JSON-LD re-compaction -- but that it is actually necessary for proper interop. Of course, if everyone follows the custom rule these standards put forth ("you MUST not redefine terms") then the data would only ever be interpreted one way. However, we currently have a situation where these standards simultaneously enable systems to avoid checking that these rules are followed and encourage the use of these technologies in order to promote the sharing and reuse of data across disparate systems. It seems, therefore, that we will inevitably end up with different interpretations of the same data, thus defeating the point.

Perhaps it could be argued that these will be corner cases and are not important, I don't know. But I think it would go a long way to have a mechanism by which we can prevent them from occurring and give stronger assurance that JSON-LD is a viable open world, machine-readable, unified extension mechanism for systems that consume it as merely JSON. JSON has no such mechanism today -- and this is a major reason why standards are attempting to use JSON-LD to fill that gap.

Is it an error for a follow-on context or scoped context to attempt to re-define a term? Or is the redefinition just ignored?

My initial view is that redefinitions should be ignored by default, with processors having an option to throw an error instead. In other words, you can try to redefine a term, but nothing will happen and it will remain the same. My reasoning for this is that it will mirror how JSON only processors will interpret the data. They will not fail (because they will not attempt any JSON-LD processing) and they will interpret terms from the frozen context as if they were not redefined.
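
To make that behavior concrete, here is a heavily simplified sketch. It assumes contexts are already-loaded flat maps from term to IRI, that a context opts in with a top-level "@sealed": true as proposed earlier in this thread, and it ignores scoped contexts, @vocab, and "@context": null entirely. The names merge_contexts, SealedTermError, and strict are hypothetical:

```python
class SealedTermError(ValueError):
    """Raised in strict mode when a sealed term is redefined."""


def merge_contexts(contexts, strict=False):
    """Merge flat term->IRI contexts in order, protecting sealed terms.

    Redefinitions of sealed terms are ignored by default; with
    strict=True a conflicting redefinition raises SealedTermError.
    """
    active = {}
    sealed_terms = set()
    for ctx in contexts:
        is_sealed = ctx.get("@sealed") is True
        for term, definition in ctx.items():
            if term.startswith("@"):
                continue  # keywords are not terms in this sketch
            if term in sealed_terms:
                if strict and active[term] != definition:
                    raise SealedTermError(f"cannot redefine sealed term {term!r}")
                continue  # keep the sealed definition, ignore the new one
            active[term] = definition
            if is_sealed:
                sealed_terms.add(term)
    return active
```

With the default strict=False, the result matches what a JSON-only consumer would assume: terms from the sealed context keep their original meaning, while new extension terms are still added.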

What about @vocab? Is that frozen too?

I think it makes no sense to freeze @vocab. This is about defining specific terms that cannot be overwritten for interoperability purposes with JSON only implementations. JSON only implementations do not "see" vocab terms because they are "undefined". I think we should always see it in that light when trying to make these determinations.

Would using "@context": null still be allowed? Would it reset to the working context at that point?

No, it would not be allowed because it would allow JSON only implementations to interpret the data in a different way from those that use a JSON-LD processor to do recompaction. Again, this, to me, is the guiding principle behind how this feature should work. Hopefully it will help cut through any other bikeshedding. The purpose of the feature is to force users that do employ a JSON-LD processor to interpret the data the same way as those who do not. This is the chief complaint filed against standards that use JSON-LD for extensibility: the standards claim that JSON-LD processing is not needed but "it is because the JSON and JSON-LD systems will interpret it differently."

I'm open to other features/mechanisms that we could add to JSON-LD 1.1 to achieve the same goal of addressing this complaint.

gkellogg commented 6 years ago

Closed in favor of https://github.com/w3c/json-ld-syntax/issues/20.

json-ld / json-ld.org

Introduce concept of "sealed" contexts #656