Content addressable contexts

gkellogg commented 6 years ago

In the Verifiable Claims (and other) groups, a concern came up about things such as JSON-LD contexts changing meaning over time. One thought would be to create a convention, or rely upon some other standard, to allow URIs to be "content addressable", in the sense that the content obtained by dereferencing a remote resource can be verified to be the same content as was originally intended.

There's also a desire to be able to pre-load contexts, without requiring that they be downloaded when accessed, which could be addressed using such a mechanism.

There are also some proposed RFCs that attempt to address this problem (thanks to @mesinter):

'duri' URI takes the form:
```
duri:<timestamp>:<embeddedURI>
```
'tdb' URI takes a similar form:
```
tdb:<timestamp>:<embeddedURI>
```

See https://github.com/w3c-ccg/did-spec/issues/32

davidlehn commented 6 years ago

There's also the tag URI scheme: https://tools.ietf.org/html/rfc4151

You don't really need special URIs. Can just as easily publish a context with a http URI with a policy that it won't ever change. And make that more explicit with proper caching headers.

Some of our initial w3id.org contexts are versioned like https://w3id.org/foobar/v1. Idea was to have a policy to not break old usages of that id. In practice we've tried to just add to those contexts. Gets to be a challenge when you do need to change things. This isn't a fun issue to deal with.

Is this something to address outside of the core specs? Maybe in a best practices doc? Seems like we'd just want to make sure core specs don't mandate any particular scheme or force dereferencing, and leave it up to implementations to have fancy document loaders that know what to do.

gkellogg commented 6 years ago

Also brought up by @sandhawke was the use of [Sub-Resource Integrity](https://www.w3.org/TR/SRI/), mostly as used in HTML. In HTML, this looks like the following:

<script src="https://example.com/example-framework.js"
        integrity="sha384-Li9vy3DqF8tnTXuiaAJuML3ky+er10rcgNR/VqsVpcw+ThHmYcwiB1pbOxEbzJr7"
        crossorigin="anonymous"></script>

For use in JSON-LD, we would likely need to accept another object format as a value of @context. For example:

{
  "@context": {
    "@id": "http://example.org/context-v0.jsonld",
    "@integrity": "sha384-Li9vy3DqF8tnTXuiaAJuML3ky+er10rcgNR/VqsVpcw+ThHmYcwiB1pbOxEbzJr7"
  }
}

This could be detected because of the use of the @integrity keyword. A client could verify that a previously downloaded or separately provided JSON-LD file associated with the specified URI can be used as a context (or, potentially, as any JSON-LD file) if its integrity matches the value specified via @integrity. Logic could be described in the documentLoader to maintain a cache of such retrieved documents. The API can also provide a means of pre-populating this cache, and perhaps allow for pre-distribution of documents not using sub-resource integrity.
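
A minimal Python sketch of that check, not a proposed API: `sri_digest`, `matches_integrity`, and `ContextCache` are hypothetical names, the digest is taken over the raw bytes of the document as served (no canonicalization), and only a single `<algorithm>-<base64>` value is handled, not the space-separated lists that SRI also allows.

```python
import base64
import hashlib

# Hash algorithms the SRI specification allows in integrity metadata.
SRI_ALGORITHMS = {"sha256", "sha384", "sha512"}


def sri_digest(data: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI-style string: '<algorithm>-<base64 digest>'."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Check raw document bytes against an '@integrity'-style value."""
    algorithm, _, _ = integrity.partition("-")
    if algorithm not in SRI_ALGORITHMS:
        return False
    return sri_digest(data, algorithm) == integrity


class ContextCache:
    """Hypothetical cache of context documents, keyed by URL."""

    def __init__(self):
        self._documents = {}

    def preload(self, url: str, data: bytes) -> None:
        """Store a bundled or previously downloaded document."""
        self._documents[url] = data

    def get(self, url: str, integrity: str):
        """Return cached bytes only if they match the expected integrity."""
        data = self._documents.get(url)
        if data is not None and matches_integrity(data, integrity):
            return data
        return None
```

A loader built on this would call `cache.get(url, integrity)` first and fall back to the network only when it returns `None`, verifying the downloaded bytes the same way before using them.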

msporny commented 6 years ago

Two other options:

"@context": "https://ipfs.io/QmTkzDwWqPbnAh5YiV5VwcTLnGdwSNsNTn2aDxdXBFca7D"
"@context": "ni:///sha-256;UyaQV-Ev4rdLoHyJJWCi11OHfrYv9E1aGQAlMO2X_-Q"

I think I'd prefer that we use RFC6920 https://tools.ietf.org/html/rfc6920
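
For illustration, a rough Python sketch of how an `ni` URI could be derived from a context's bytes and checked later. The helper names `ni_uri` and `matches_ni` are made up here, only `sha-256` is handled, and the optional authority is ignored when checking.

```python
import base64
import hashlib


def ni_uri(data: bytes, authority: str = "") -> str:
    """Build an RFC 6920 'ni' URI for data using sha-256."""
    digest = hashlib.sha256(data).digest()
    # RFC 6920 uses base64url with the trailing '=' padding removed.
    value = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"ni://{authority}/sha-256;{value}"


def matches_ni(data: bytes, uri: str) -> bool:
    """Check data against the digest in a sha-256 'ni' URI."""
    # Drop any query string, then take the last path segment.
    path = uri.split("?", 1)[0]
    _, _, tail = path.rpartition("/")
    algorithm, _, value = tail.partition(";")
    if algorithm != "sha-256":
        return False
    expected = ni_uri(data).rpartition(";")[2]
    return expected == value
```

With the example input from RFC 6920, `ni_uri(b"Hello World!")` yields `ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk`.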

It has a nice bridge to the Web via things like (see domain name):

"@context": "ni://example.com/sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk"

gkellogg commented 6 years ago

My concern with "ipfs" is that there's nothing inherent in the protocol that would allow an implementation to trust a bundled context file; an implementation shouldn't need to know anything about special rules for a specific host.

The "ni" scheme handles this, but doesn't have a provision for retrieving a context that wasn't bundled.

Sub-Resource Integrity seems like it addresses both the bundle/side-load and retrieve use cases, at the expense of ugliness and new syntax.

Another possible solution is to simply point out the hooks that a custom documentLoader provides and leave it at that. The use of the other schemes could be an application note.
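
As a sketch of what such a hook could look like, here is a loader factory in plain Python. `make_document_loader` is a hypothetical name, not the API of any JSON-LD library, and the loader signature (URL in, parsed document out) is a simplifying assumption.

```python
from typing import Callable, Dict, Optional


def make_document_loader(
    bundled: Dict[str, dict],
    fallback: Optional[Callable[[str], dict]] = None,
) -> Callable[[str], dict]:
    """Return a loader that prefers bundled contexts over the network."""

    def load(url: str) -> dict:
        # Bundled (side-loaded) contexts are served without dereferencing.
        if url in bundled:
            return bundled[url]
        # With no fallback configured, unknown contexts are rejected.
        if fallback is None:
            raise LookupError(f"refusing to dereference unbundled context: {url}")
        return fallback(url)

    return load
```

An application that must never hit the network would omit `fallback`; one that only wants to pin a few well-known contexts would pass its usual fetch function.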

cwebber commented 6 years ago

What about magnet URIs? You could specify both the "official web location" and the hash in the same magnet URI:

magnet:?as=https://mycontext.example/&xt=urn:sha256:b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c
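
A small Python sketch of how a client might pull both parts out of such a URI and check downloaded bytes against the hash. It assumes the `as` and `xt=urn:sha256:<hex>` parameters exactly as written in the example above; `parse_magnet` and `matches_magnet` are hypothetical helper names.

```python
import hashlib
from urllib.parse import parse_qs, urlsplit


def parse_magnet(uri: str):
    """Return (location, hex_digest) from a magnet URI; either may be None."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parts.query)
    # 'as' carries the "official web location" of the resource.
    location = params.get("as", [None])[0]
    digest = None
    for topic in params.get("xt", []):
        if topic.startswith("urn:sha256:"):
            digest = topic[len("urn:sha256:"):].lower()
    return location, digest


def matches_magnet(data: bytes, uri: str) -> bool:
    """Check raw bytes against the sha256 hash carried in the magnet URI."""
    _, digest = parse_magnet(uri)
    return digest is not None and hashlib.sha256(data).hexdigest() == digest
```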

gkellogg commented 6 years ago

> What about magnet URIs?

Cool, I was unaware of these. Of course, it's a "de facto" standard, so not really referenceable.

My thought is to create a section on best practices for managing remote contexts which can describe these various options, and point out that the use of a custom documentLoader is encouraged to provide client-specific support.

We should discuss on an upcoming telecon.

BigBlueHat commented 6 years ago

👍 to discussing this more. The Web Publishing WG has not dissimilar needs for content integrity (see https://github.com/w3c/wpub/issues/125).

cwebber commented 6 years ago

As a fun source of inspiration (thanks to @csarven for pointing it out) https://hash-archive.org/history/https://dustycloud.org is neat

gkellogg commented 6 years ago

Deferred to WG due to https://json-ld.org/minutes/2018-04-10/#resolution-3.

gkellogg commented 6 years ago

Closed in favor of https://github.com/w3c/json-ld-syntax/issues/9.

json-ld / json-ld.org

Content addressable contexts #547