Closed msporny closed 5 years ago
Rename "full Processor" to "HTML Processor".
I agree that the original naming may give the wrong impression (that other processors are somehow incomplete), and discourage some people from adopting JSON-LD. "HTML Processor" is a little misleading, but a better name could indeed be found.
Remove the ability to use text/html files as JSON-LD Contexts as pure JSON Processors are not capable of processing them, which will lead to a variety of issues related to developer ergonomics.
The argument was raised that JSON-LD Contexts are bona fide JSON-LD documents, and so it would be difficult to argue that a Full "Extended" processor could sometimes load JSON-LD from HTML, and sometimes not... I think this is a valid argument.
That being said, we could address your concern by replacing the Note, at the beginning of section 7, by a Warning, stating "not available in a Pure JSON-LD Processor" rather than "available in a Full Processor". And possibly hinting that content-negotiation is a more "portable" solution?...
@pchampin,
And possibly hinting that content-negotiation is a more "portable" solution?...
I don't think we should merely "possibly hint" at this; my preference would be to make it a requirement that you MUST make your @context available as JSON. But, short of my own preferences, we should be very clear that you SHOULD do so and that if you don't, your @context won't work with every JSON-LD processor, only those that add the extra HTML feature set. I think we should be strongly encouraging JSON over HTML, but allow HTML for documentation purposes.
And possibly hinting that content-negotiation is a more "portable" solution?...
I feel stronger about this than @dlongley does... don't open up the Pandora's box of reading JSON-LD Contexts from HTML. Remove the feature. The only argument that I can see for it is that it's a "neat feature" in the academic completeness sense... but JSON-LD was never meant to be an academically complete mechanism... it was supposed to help developers publish JSON-LD, but not become so complex that it blows your foot off when you try to use it. Having this feature means that developers will inevitably publish their JSON-LD Context as HTML only, which will cause a split in the ecosystem between "We expect you to publish via HTML" and "We expect you to publish not via HTML".
"HTML Processor" is a little misleading, but a better name could indeed be found.
Isn't the only feature that the "full" processor has over the JSON-only one the fact that it parses stuff from HTML?
Isn't the only feature that the "full" processor has over the JSON-only one the fact that it parses stuff from HTML?
Yes, but "HTML Processor" makes it sound like it can only process HTML...
And possibly hinting that content-negotiation is a more "portable" solution?...
I feel stronger about this than @dlongley does... don't open up the Pandora's box of reading JSON-LD Contexts from HTML. Remove the feature. The only argument that I can see for it is that it's a "neat feature" in the academic completeness sense... but JSON-LD was never meant to be an academically complete mechanism... it was supposed to help developers publish JSON-LD, but not become so complex that it blows your foot off when you try to use it. Having this feature means that developers will inevitably publish their JSON-LD Context as HTML only, which will cause a split in the ecosystem between "We expect you to publish via HTML" and "We expect you to publish not via HTML".
This was not added because it's a "neat feature", but as a response to concerns raised in #43. If JSON had a built-in commenting feature, it would likely not be necessary.
Because of this, and the need to normatively describe the in-the-wild JSON-LD in HTML scenarios, we provided a mechanism to do this. Once you describe JSON-LD in HTML, then allowing that for contexts and frames is a logical progression, particularly when the extraction is described in the document loader, which is the standard way to fetch all remote content.
The fact that it came up in w3c/vc-data-model#585 just goes to show a general need to be able to document contexts, and containing the context in the documenting HTML is likely a better way to keep them from diverging than using different resource formats.
I agree with @dlongley that we should better describe the potential for splitting the eco-system by recommending (SHOULD) that publishers provide an application/ld+json version via content-negotiation and not depend on a processor's conformance with HTML processing.
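To make the content-negotiation recommendation concrete, here is a minimal sketch of what a client relying on it might do: ask for the JSON-LD representation explicitly, and check the response media type before treating it as a context. This is an illustration only, not something taken from the spec; the helper names `build_context_request` and `is_json_context` and the q-value are assumptions.

```python
from urllib.request import Request

JSONLD = "application/ld+json"

def build_context_request(url: str) -> Request:
    """Build a GET request that prefers JSON-LD, then plain JSON."""
    # The q-value is an illustrative choice, not mandated anywhere.
    return Request(url, headers={"Accept": f"{JSONLD}, application/json;q=0.9"})

def is_json_context(content_type: str) -> bool:
    """True if the response media type is usable by a pure JSON processor."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in (JSONLD, "application/json") or media_type.endswith("+json")

# Nothing is sent over the network here; the request is only constructed.
req = build_context_request("https://schema.org/")
```

A processor without the HTML feature set would presumably reject a `text/html` response at the `is_json_context` check rather than try to parse it.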
With chair hat on...
I also note that expressing JSON-LD Contexts in HTML was not contemplated in any of the input documents to the JSON-LD WG and as such, the group is skirting very close to being in violation of their charter
Could you point out where in the charter it says that we can only introduce features described in input documents to the WG? Because that would also preclude features like @protected, as far as I'm aware. I don't think that's, thus, relevant here unless you can find somewhere that says we're constrained in this way?
And with chair hat off ...
I agree with @gkellogg that if we say that a context is JSON-LD, and that JSON-LD can be expressed in a script element of an HTML page, then the implication is that a context can be expressed in a script element of an HTML page. If I recall correctly, @danbri has brought up his issue as a frustration of web developers.
The possible routes forward seem to be:
I agree with @pchampin that "extended" is better than "full", along with a big warning about contexts in HTML being complicated in the spec.
JSON-LD in HTML already exists, even if only informatively when viewed from the HTML perspective: https://html.spec.whatwg.org/#the-script-element:attr-script-type-4
In the current spec-space, it's already possible to extract JSON-LD from HTML and use it as JSON(-LD)--because that's how data blocks work with any embedded format (CSV, YAML, etc.).
We have gone beyond simply echoing that fact in the syntax document and instead baked additional processing steps into the API.
Shifting things into the documentLoader space does help from an architectural layering concern, but this "context in HTML" usage raises a whole host of architectural and community concerns. It effectively moves us from the current world of extracting-then-using the embedded JSON-LD into one where HTML becomes a valid representation of JSON-LD itself.
We need to work to re-narrow our focus at this stage, and go back to the "simplest thing that could possibly work."
This issue was discussed in a meeting.
From @danbri, posted with permission, after discussion with @gkellogg:
1. We are uncomfortable that our site (by virtue of our context url) has implicitly become a software component in a system where we don't even really know the other software components. I am considering turning off the context serving at weekend to encourage caching and more robust clients.
1b. Aside: it could be interesting to have a best practice note about how software components fetching contexts might identify themselves incl versions in http requests (user agent)
2. We are unhappy that the expectation of content negotiation on our home page blocks us from moving to 100% static-served site.
3. If we could have a small snippet of jsonld in our homepage, pointing off to a separate url with our giant big context file, that would be great
4. We are not interested in putting the whole context into our homepage; it is way too big. Similar issues may hold for Wikidata at some point.
5. We appreciate the reluctance to entangle the pure json nature of json-ld with html, but note that the success of json-ld was achieved in large part through just such an entangling
1b. Aside: it could be interesting to have a best practice note about how software components fetching contexts might identify themselves incl versions in http requests (user agent)
:+1: Related to this, that best practice note should also talk about caching of contexts.
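For what it's worth, the self-identification idea could look something like the sketch below: the fetching software names itself and its version in the User-Agent header of every context request. The product name `example-jsonld-client`, the version string, and the helper `context_request` are all hypothetical; no best practice note defines a format for this.

```python
from urllib.request import Request

def context_request(url: str, product: str, version: str) -> Request:
    """Build a context request that names the fetching software and its version."""
    user_agent = f"{product}/{version}"  # conventional product/version token
    return Request(url, headers={
        "Accept": "application/ld+json",
        "User-Agent": user_agent,
    })

# Only constructs the request; nothing is sent.
req = context_request("https://schema.org/", "example-jsonld-client", "0.1.0")
```

With something like this in place, a context publisher could at least see in its logs which clients are fetching repeatedly instead of caching.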
We are unhappy that the expectation of content negotiation on our home page blocks us from moving to 100% static-served site.
One possible solution for this would be to allow a link header to be added to HTML documents that points towards contexts. (This may not solve all static site use cases though, as platforms like GitHub pages don't support custom link headers AFAIK)
We are uncomfortable that our site (by virtue of our context url) has implicitly become a software component in a system where we don't even really know the other software components. I am considering turning off the context serving at weekend to encourage caching and more robust clients.
I think that this would be a good thing to do. Provide guidance on aggressively caching the schema.org context (or packaging it with software implementations).
We are unhappy that the expectation of content negotiation on our home page blocks us from moving to 100% static-served site.
Then state that the new schema.org context will be served from: "https://schema.org/v1" -- make that the context, say that "https://schema.org/" is an alias for "https://schema.org/v1" and note that you will turn off content negotiation for "https://schema.org/" at the beginning of 2020.
If we could have a small snippet of jsonld in our homepage, pointing off to a separate url with our giant big context file, that would be great
Why? Seems like extra complexity... just say that the new schema.org context file is at: https://schema.org/v1 and be done with it. The schema.org context is so large that implementations will ship with it or aggressively cache it. Speaking from our implementation experience, at one point a bug caused us to go out to the web and fetch schema.org for every digital signature we did and our dev environment suffered horribly - massive performance hit. We now ship with static copies of schema.org... we never go out to the network to get the massive context (and that is the way it should be). The only issue, of course, is there is no versioning for schema.org... but we haven't had an issue w/ that yet. We may have an issue when people start digitally signing schema.org content and expecting those signatures to stay valid for 3-5 years while schema.org shifts underneath them.
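A rough sketch of the "ship with static copies or aggressively cache" approach, assuming a loader that checks bundled contexts first and only falls back to the network for unknown URLs. The class `CachingContextLoader` and the stand-in `fake_fetch` are hypothetical and are not the API of any particular JSON-LD library.

```python
import json
from typing import Callable, Dict

class CachingContextLoader:
    """Serve bundled contexts first, then cached ones, then fall back to a fetcher."""

    def __init__(self, fetch: Callable[[str], str], bundled: Dict[str, dict]):
        self._fetch = fetch            # hypothetical network fetcher: url -> JSON text
        self._cache = dict(bundled)    # static copies shipped with the software
        self.network_calls = 0         # counted so the behaviour is observable

    def load(self, url: str) -> dict:
        if url not in self._cache:
            self.network_calls += 1
            self._cache[url] = json.loads(self._fetch(url))
        return self._cache[url]

# Stand-in fetcher so the sketch runs offline.
def fake_fetch(url: str) -> str:
    return json.dumps({"@context": {"name": "http://example.org/name"}})

loader = CachingContextLoader(
    fake_fetch,
    bundled={"https://schema.org/": {"@context": {"@vocab": "http://schema.org/"}}},
)
loader.load("https://schema.org/")        # served from the bundled copy
loader.load("https://example.org/ctx")    # fetched once
loader.load("https://example.org/ctx")    # served from the cache
```

Note that a bundled copy pins whatever version was shipped, which is exactly where the versioning concern about signatures comes in.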
We are not interested in putting the whole context into our homepage; it is way too big. Similar issues may hold for Wikidata at some point.
Yes, correct, so we don't need the JSON-LD Context processing in HTML documents feature. No one is asking for that feature.
We appreciate the reluctance to entangle the pure json nature of json-ld with html, but note that the success of json-ld was achieved in large part through just such an entangling
I don't understand this statement. There are a number of us that are attempting to make JSON-LD work w/ pure JSON environments in a more harmonious way and have made great strides towards that with the help of JSON-LD 1.1's @protected feature. @danbri, could you please explain what you meant by the comment above?
The only issue, of course, is there is no versioning for schema.org... but we haven't had an issue w/ that yet.
There sort of is...but it could be better. For instance, all the versions are in a directory on GitHub: https://github.com/schemaorg/schemaorg/tree/master/data/releases
The 3.7 context file (for instance) lives at https://github.com/schemaorg/schemaorg/blob/104238766458b465e6a60cc7d049f887c542563a/data/releases/3.7/schemaorgcontext.jsonld
That's versioned--via git sha's--but not tagged in git (which would help) nor made available as "the 3.7 context file" from the release history page. All of that would help certainly.
From @danbri, posted with permission, after discussion with @gkellogg:
@azaroth42 it would be helpful (if possible) to see more of that thread, or to make this an actual conversation/call (again, if possible). Without it, it's not clear we're all talking about the same thing(s).
@BigBlueHat this was from hallway conversations at the Web Conference, so no thread to refer to. @danbri should clarify his position, but IIRC, they could turn off content-negotiation for http(s)://schema.org and return a stub context in a script tag which references the actual JSON-LD version of the context, which could help their usage. So, for example, the schema.org web page might look something like the following:
<!DOCTYPE html>
<html lang="en">
<head>
<!-- Generated from headtags.tpl -->
<meta charset="utf-8" >
<link rel="shortcut icon" type="image/png" href="docs/favicon.ico"/>
<link rel="stylesheet" type="text/css" href="docs/schemaorg.css" />
<link rel="stylesheet" type="text/css" href="docs/prettify.css" />
...
<script type="application/ld+json">{"@context": "https://schema.org/docs/jsonldcontext.jsonld"}</script>
...
</head>
</html>
Presently, content-negotiation does a redirect to https://schema.org/docs/jsonldcontext.jsonld, so this would simplify their hosting infrastructure.
Right, but it would vastly increase the amount of work a JSON-LD processor must do.
Given this as a data document:
{"@context": "https://schema.org/",
"@type": "Person",
"name": "me"}
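The processor (without a cached context it says is valid for https://schema.org/) would need to...
- GET the default (HTML) response from https://schema.org/
- Parse that looking for data blocks (i.e. <script type="application/ld+json">)
- with the added requirement that one of them says it's a context file?
- Extract that JSON-LD datablock
- Parse it.
- If valid, GET the @context value(s).
- Parse those to create a single active context for the data document.
The processing requirements go from "use an HTTP(S) client" to "use an HTTP(s) client and HTML parser (which possibly supports JavaScript).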
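To give a feel for the extra machinery, the middle steps (parse the HTML, find the data block, extract and parse it) might be sketched as follows using only Python's standard library. This is an assumption-laden illustration, not the algorithm from the spec: the names `JsonLdScriptExtractor` and `extract_context_reference` are hypothetical, it takes the first data block that has an @context, and it ignores any script elements that would only exist after running JavaScript.

```python
import json
from html.parser import HTMLParser

class JsonLdScriptExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_block = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            raw_type = dict(attrs).get("type") or ""
            media_type = raw_type.split(";", 1)[0].strip().lower()
            self._in_block = media_type == "application/ld+json"
            self._buffer = []

    def handle_data(self, data):
        if self._in_block:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_block:
            self.blocks.append("".join(self._buffer))
            self._in_block = False

def extract_context_reference(html: str):
    """Return the @context value of the first JSON-LD data block that has one, or None."""
    parser = JsonLdScriptExtractor()
    parser.feed(html)
    parser.close()
    for text in parser.blocks:
        doc = json.loads(text)
        if isinstance(doc, dict) and "@context" in doc:
            return doc["@context"]
    return None

page = """<!DOCTYPE html>
<html><head>
<script type="application/ld+json">{"@context": "https://schema.org/docs/jsonldcontext.jsonld"}</script>
</head></html>"""

next_url = extract_context_reference(page)
```

Even in this simplified form, the processor still has to issue a second GET for `next_url` and parse that response before it has an active context.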
There is a massive amount of json-ld embedded within html. Tools without the capability to extract it are ignoring one of the biggest applications of json-ld. So perhaps the burden is not quite so huge?
On Fri, 14 Jun 2019 at 16:54, BigBlueHat notifications@github.com wrote:
Right, but it would vastly increase the amount of work a JSON-LD processor must do.
Given this as a data document:
{"@context": "https://schema.org/", "@type": "Person", "name": "me"}
The processor (without a cached context it says is valid for https://schema.org/) would need to...
- GET the default (HTML) response from https://schema.org/
- Parse that looking for data blocks (i.e. <script type="application/ld+json">)
- with the added requirement that one of them says it's a context file?
- Extract that JSON-LD datablock
- Parse it.
- If valid, GET the @context value(s).
- Parse those to create a single active context for the data document.
The processing requirements go from "use an HTTP(S) client" to "use an HTTP(s) client and HTML parser (which possibly supports JavaScript).
@danbri certainly if you're already in that space doing that thing, you're all set. 😃 But if you're in a "pure" JSON-LD environment (database, IoT, etc), you'd very much want to avoid having higher processing requirements.
Right, but it would vastly increase the amount of work a JSON-LD processor must do.
Given this as a data document:
{"@context": "https://schema.org/", "@type": "Person", "name": "me"}
The processor (without a cached context it says is valid for https://schema.org/) would need to...
- GET the default (HTML) response from https://schema.org/
- Parse that looking for data blocks (i.e. <script type="application/ld+json">)
- with the added requirement that one of them says it's a context file?
- Extract that JSON-LD datablock
- Parse it.
- If valid, GET the @context value(s).
- Parse those to create a single active context for the data document.
The processing requirements go from "use an HTTP(S) client" to "use an HTTP(s) client and HTML parser (which possibly supports JavaScript).
Tools really need to cache contexts, anyway, so this might serve as an added incentive to do so.
This issue was discussed in a meeting.
RESOLVED: close #172 as addressed by #204
From this issue in the Verifiable Claims Working Group with regard to the new "full Processor" conformance class: https://github.com/w3c/vc-data-model/issues/585
@gkellogg wrote:
@msporny wrote:
@gkellogg wrote:
I agree that processing JSON-LD content in HTML is a primary use case and the WG should support it.
I disagree that people are publishing JSON-LD Contexts in HTML, that came out of nowhere. I can see what the WG is trying to do, but this issue is an example of my concern: https://github.com/w3c/vc-data-model/issues/585
You have someone suggesting that we pull in a JSON-LD Context file via an HTML document without understanding the technical burden in doing so. They don't understand that publishing a JSON-LD Context as an HTML document will now require full processors.
I also note that expressing JSON-LD Contexts in HTML was not contemplated in any of the input documents to the JSON-LD WG and as such, the group is skirting very close to being in violation of their charter by adding this feature:
https://www.w3.org/2018/03/jsonld-wg-charter.html
https://github.com/json-ld/json-ld.org/wiki/Changes-in-Community-Group-Drafts-Targeted-for-1.1
https://json-ld.org/presentations/JSON-LD-Update-TPAC-2017/assets/player/KeynoteDHTMLPlayer.html
There are two major issues with this new set of features:
Making the following changes to the specification would be an improvement: