Closed gkellogg closed 5 years ago
:+1: This is a better (IMO) solution for native JSON values such as GeoJSON than requiring every community to map all of their constructs into -LD.
To quote (with slight edits) the example from the original issue:
{
"@context": {
"@vocab": "http://example/",
"@base": "http://example/",
"json-value": {"@type": "@json"}
},
"@id": "foo",
"json-value": {"native": "json"}
}
This seems very sensible, and fits with our charter. We can later make @json an alias for whatever literal type a future RDF WG might assign for JSON.
I would prefer a more LD friendly solution for GeoJSON. #7 ?
@akuckartz I didn't mean to imply that GeoJSON-LD was a bad thing to do, just that if the requirement is "support native JSON data structures in the JSON-LD context", then GeoJSON could be managed that way without then layering on GeoJSON-LD. GeoJSON-LD is great ... but if you don't need to interact with the -LD part of it, just record the JSON structure, there's overhead that could be minimized.
There's a separate issue for the list of lists feature beyond #7 that was already accepted to be part of 1.1. #7 would additionally let the semantics of the list of lists be expressed.
The key is the expanded form; my thought was that the previous example might expand to something like the following:
[{
"@id": "http://example/foo",
"http://example/json-value": [{
"@value": {"native": "json"},
"@type": "@json"
}]
}]
Regarding #7, this is not in conflict with a potentially more semweb-y mapping for GeoJSON, but there are other reasons why you might want to preserve raw JSON within JSON-LD.
When turned into RDF, we would need a datatype to describe the value, so that you would get something like the following:
@base <http://example/foo> .
@prefix jsonld: <https://www.w3.org/ns/json-ld#> .
<foo> <json-value> '{"native": "json"}'^^jsonld:json .
Where the JSON is normalized to use minimal whitespace.
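The minimal-whitespace form mentioned above can be produced with a stock JSON library. A minimal sketch using Python's json module (sorting keys is an extra assumed step toward a deterministic lexical form, not something the comment above requires):

```python
import json

def minimal_json(value):
    # Serialize with no whitespace between tokens; sorting keys is an
    # assumed extra step toward a deterministic lexical form.
    return json.dumps(value, separators=(",", ":"), sort_keys=True)

print(minimal_json({"native": "json"}))  # → {"native":"json"}
```

Note that this only addresses whitespace and key order; as discussed further down, number representation is a separate problem.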
I think defining a jsonld:json datatype would make a lot of sense in this day and age... and would offer a clean solution.
Will need to note that the whole feature is somewhat implementation dependent. Native JSON serialization/deserialization issues may have some effect on key ordering, float representation, etc.
Should perhaps be jsonld:JSON, to better align with https://www.w3.org/TR/rdf11-concepts/#section-html
WG resolved to add an @json keyword, mapped to jsonld:JSON, to identify the JSON data type.
I'm concerned this opens a Pandora's box...or maybe several. Sadly, I wasn't here for the call and had overlooked this issue earlier, so I fear I'm only just now raising these concerns...
We're (rather passively) introducing a namespace specific to JSON-LD: https://www.w3.org/ns/json-ld#
We're inviting developers to avoid/ignore the graph model JSON-LD encodes:
{
"@context": {
"data": {"@type": "@json"}
},
"data": {"everything": "imaginable"}
}
I fear providing this as a "solution for native JSON values such as GeoJSON" sends the wrong message...and it begins to invalidate the reason to have JSON-LD at all (see the example above).
Are we also planning to do this for YAML? Because the use cases would be identical...
Having implemented the RDF canonicalization spec with a minor headache, this sounds like a full on migraine.
yo dawg I heard you like canonicalization so I put a tree data serialization canonicalization algorithm in your graph data serialization canonicalization algorithm so you can normalize while you normalize
@BigBlueHat I appreciate (and to some extent share) that concern, but I wonder if there's a historical analogy: I've not seen the kind of problem you are describing using XML literals within RDF/XML. That may not be a valid analogy, but it's a bit suggestive...
Re YAML, I don't think we would do that, because (a) no one has asked for it and (b) YAML is a non-normative deliverable describing how the patterns of JSON-LD could be used in YAML to accomplish the same ends. The charter says: "JSON-LD 1.1 examples specified in YAML", not a normative YAML-LD Rec.
We would be introducing a namespace, yes. We could also (as discussed on the call) add the data type to the RDF namespace, but we at least would need to document it. The consensus was that the creation of a new namespace was less work than putting it into an existing one, and a future RDF WG could take it over down the line.
I agree with @ajs6f about the use of XML literals in RDF/XML. Yes, you can create pointless RDF that simply wraps a single literal in XML or JSON ... but why would you bother to do that? It seems like an enormous waste of your time other than to meet some badly worded RFP.
As @ajs6f points out, other RDF syntaxes that leverage languages have a similar mechanism for including raw XML or HTML, this is really no different.
For RDF canonicalization, such values would be treated just as other datatyped literals. Part of the RDF serialization aspects should include whitespace normalization, which is fairly standard in JSON, so I don't really appreciate why things such as RDF Dataset Normalization and signatures would be at any disadvantage.
@BigBlueHat worries about introducing a new namespace:
We're (rather passively) introducing a namespace specific to JSON-LD: https://www.w3.org/ns/json-ld#
In fact, this namespace already exists for URIs such as http://www.w3.org/ns/json-ld#expanded, used in HTTP headers.
However, we don't need to use this namespace, and @iherman suggested that we could probably use the RDF namespace http://www.w3.org/1999/02/22-rdf-syntax-ns# and use http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON as the datatype, making it first-class with the XMLLiteral and HTML datatypes. Updating the RDF namespace document is something we can do, apparently.
I agree that this no longer serves for GeoJSON, and we should consider some other example, but such examples doubtless exist, which is why this is a compelling feature.
I guess we can all agree that this is (a) technically doable (b) it may require normalization of the literal (at least optionally) and (c) it is not fundamentally different from the XML and HTML datatypes. (E.g., if we do have a standard for RDF canonicalization at some point, that standard must address the issue of literals and their normalization (or not), and the issues raised by @cwebber are also genuine problems for HTML literals.)
However. I guess we are back to our design principles set out at the beginning of the WG's life. We should not do this just because we can; we should have proper use cases, see relevant section. I cannot judge whether GeoJSON is a use case or not.
There's a separate issue for the list of lists feature beyond #7 that was already accepted to be part of 1.1.
@azaroth42 is there a github issue for the list of lists support? If not, may I create one?
@Fak3 The lists of lists issue is #36, and it was closed as support was added for recursive lists.
For RDF canonicalization, such values would be treated just as other datatyped literals. Part of the RDF serialization aspects should include whitespace normalization, which is fairly standard in JSON, so I don't really appreciate why things such as RDF Dataset Normalization and signatures would be at any disadvantage.
It isn't that simple. Whitespace is not the only issue. We will probably have to support something like this json canonicalization spec or something. That's a lot of extra work.
There's also a huge risk that people will open this loophole much, much wider than is anticipated, marking giant swaths of content as json-only. Yeah, I guess that's true for XML too, but to be honest no sane person could operate on RDF/XML as if it were real XML and have things survive... it was an RDF serialization format and little more. Here people are actually working with json-ld as if it were normal json and getting reasonable RDF interop. There are pain points occasionally, and we should try to remedy those, but I think this is opening an escape hatch that a good number of people will jump straight through.
Careful about rubbing this lamp... I think fulfilling this wish will have more side effects than anticipated and may undo a lot of the goals of json-ld. -1 from me.
I share the same concerns as @cwebber and @BigBlueHat.
Re canonicalization (or even just whitespace normalization) ... can someone describe the issue and the risk here? If one implementation serializes to the string "{\"foo\": 1}" and another serializes to the string "{ \"foo\" : 1 }" ... what's the problem? They're not identity-providing such that they need to be compared; they're just values.
@azaroth42 Those would end up being two different signatures with linked data signatures. Without canonicalizing the json exactly the same way every time, LDS will break.
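The point about diverging signatures can be shown in a few lines: the two serializations above denote the same JSON value, but hash (and therefore sign) differently. A minimal illustration using SHA-256 as a stand-in for whatever digest a signature scheme uses:

```python
import hashlib
import json

def digest(s):
    # Hash the raw bytes of a serialization, as a signature scheme would.
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

a = '{"foo": 1}'
b = '{ "foo" : 1 }'

# Same JSON value after parsing...
assert json.loads(a) == json.loads(b)
# ...but different bytes, hence different digests/signatures.
assert digest(a) != digest(b)
```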
@cwebber said:
It isn't that simple. Whitespace is not the only issue. We will probably have to support something like this json canonicalization spec or something. That's a lot of extra work.
Good point; by the time we see the data, it's in a parsed form, and we can't depend on a specific representation of numbers, for example.
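The number-representation point is easy to demonstrate: once parsed, the original spelling of a number is gone, and re-serialization picks its own. A small sketch:

```python
import json

# "1E2" and "100.0" both parse to the same float value...
assert json.loads("1E2") == json.loads("100.0") == 100.0

# ...and re-serializing picks one representation, which may not match
# what the author originally wrote.
print(json.dumps(json.loads('{"n": 1E2}')))  # → {"n": 100.0}
```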
At this point, I'd say that the work should be put on hold, certainly pending an important use case.
We can defer, but I would like to note that canonicalization and LDS are explicitly out of scope of the WG, per the charter: https://www.w3.org/2018/03/jsonld-wg-charter.html
PROPOSAL: Defer work on JSON literals until specific use cases have been described.
We can defer, but I would like to note that canonicalization and LDS are explicitly out of scope of the WG, per the charter: https://www.w3.org/2018/03/jsonld-wg-charter.html
They are out of scope, but we should still be careful not to break them.
Thanks @cwebber for the json canonicalization spec link. I did not know about that.
However, that also means that the signature issue with a JSON literal becomes solvable if and when RDF canonicalization is formally defined as a standard; that standard should say that literals must be canonicalized. Actually, by defining the JSON datatype we reinforce interoperability of signatures, because there is then a clear follow-your-nose approach to what this means in the standard (provided the aforementioned IETF document becomes final). Indeed, if there are use cases that lead to all kinds of ad-hoc inclusion of JSON data in RDF graphs without a clear spec, a canonicalization spec may not be able to function properly.
Ie, while I agree that we need more use cases per our process, I think a future RDF canonicalization spec can take care of this problem for us, and this WG does not really have to deal with this.
(B.t.w., does the XML Canonicalization spec apply to HTML content? Because if not, there is already a problem with the HTML RDF datatype...)
This issue was discussed in a meeting.
RESOLVED: Defer work on JSON literals until specific use cases have been described
Use Case: In the CIDOC-CRM vocabulary, there is a property that takes a Literal for the description of a Place. This can thus take a lat,long pair as a string, or an XMLLiteral ... but not GeoJSON as a literal, as there is no JSON datatype. With a JSON data type, and support in JSON-LD for @type: @json in a term definition, we could embed the GeoJSON directly in the document, rather than unnecessarily serializing it to a string.
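The use case above might look something like the following sketch: a Place whose description term is declared with "@type": "@json", so the GeoJSON value stays a single opaque JSON literal instead of being forced into the graph model. (The property IRI and identifiers here are illustrative placeholders, not actual CIDOC-CRM terms; the document is built as a Python dict for convenience.)

```python
import json

# Hypothetical JSON-LD 1.1 document embedding GeoJSON via a @json-typed
# term. "P3_has_note" and the example.org IRIs are illustrative only.
place = {
    "@context": {
        "description": {
            "@id": "http://example.org/P3_has_note",
            "@type": "@json",
        }
    },
    "@id": "http://example.org/place/1",
    "description": {
        "type": "Point",
        "coordinates": [4.35, 50.85],
    },
}
print(json.dumps(place, indent=2))
```

A 1.1 processor would carry the entire "description" value through expansion as one @value with @type: @json, rather than treating "type" and "coordinates" as graph properties.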
This issue was discussed in a meeting.
RESOLVED: Add JSON data type to RDF, with support in JSON-LD processors for managing parsed JSON in the internal form. We will seek feedback via blogpost, and in Berlin.
For basic interoperability, and to allow implementations to pass tests, I believe we will need to tackle canonicalization. The current version of the draft is https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-05; I'm not sure when it's likely to become an RFC, so I don't think we can normatively cite it (although it looks like it will become an RFC, and the core bits aren't too likely to change). We might just need to re-describe the normalization points and informatively reference the draft:
If the Unicode value falls within the traditional ASCII control character range (U+0000 through U+001F), it MUST be serialized using lowercase hexadecimal Unicode notation (\uhhhh) unless it is in the set of predefined JSON control characters U+0008, U+0009, U+000A, U+000C or U+000D, which MUST be serialized as \b, \t, \n, \f and \r respectively. If the Unicode value is outside of the ASCII control character range, it MUST be serialized "as is" unless it is equivalent to U+005C (\) or U+0022 (") which MUST be serialized as \\ and \" respectively.
There are implementations available for several languages (not Ruby, which I'll work on and contribute).
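The escaping rule quoted above can be sketched directly (a minimal illustration of just that rule, assumed to match the draft; not a full JCS implementation, which also covers number formatting and key sorting):

```python
# Predefined short escapes named in the quoted rule.
_SHORT = {"\b": "\\b", "\t": "\\t", "\n": "\\n", "\f": "\\f", "\r": "\\r"}

def escape_char(c):
    if c in _SHORT:
        return _SHORT[c]
    if ord(c) <= 0x1F:                 # remaining ASCII control characters
        return "\\u%04x" % ord(c)      # lowercase hexadecimal notation
    if c == "\\":
        return "\\\\"
    if c == '"':
        return '\\"'
    return c                           # everything else "as is"

def escape_string(s):
    # Produce the quoted, escaped lexical form of a JSON string.
    return '"' + "".join(escape_char(c) for c in s) + '"'

print(escape_string('tab\there "quoted" \x01'))
```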
On https://github.com/w3c/json-ld-syntax/issues/4#issuecomment-474030501: I am not sure, @gkellogg.
As referred to in the comments, the fact of introducing an RDF datatype does not require defining a canonical format. Per the RDF 1.1 spec:
A datatype consists of a lexical space, a value space and a lexical-to-value mapping, and is denoted by one or more IRIs.
More importantly, there is an example for a datatype (i.e., rdf:HTML) that does not have a canonical version for now. I understand this creates problems and, I presume, this means that an RDF graph containing an HTML (or JSON) literal cannot be properly signed, but the question is whether this creates a problem for the use cases that we have for JSON literals. Similarly, I do not see why you claim
For basic interoperability, [...], I believe we will need to tackle canonicalization.
@iherman JSON literals in JSON-LD might be more tricky than other literals (including HTML) in RDF. The difference is that, in other RDF serializations, literals have a fixed lexical representation. Unless some form of inference on literals is performed, this lexical representation is kept as is when processing the graph.
In JSON-LD, the JSON literal we get (e.g. in the expansion algorithms) is not a lexical representation, it is the value itself (a JSON object). When converting to triples (and possibly to the expanded form?), a lexical representation has to be built, and there is no unique way to do that. I think that's what @gkellogg means by "basic interoperability problems".
@pchampin
In JSON-LD, the JSON literal we get (e.g. in the expansion algorithms) is not a lexical representation, it is the value itself (a JSON object).
I am not sure that is a given. Going back to the RDF datatype definition: what exactly would be the value space of a JSON Literal? Maybe even more specifically: what is the definition of equality in such a space? The former is something we must define if we define an RDF Datatype. Referring to the canonical representation might indeed be used for the definition of the value space, but that is not necessarily the only way to do it. We could (just off the top of my head) say that a JSON object (or an array thereof) is parsed into a JavaScript object using the JSON parsing rules, and the result is the 'value space'. Which means that two JSON objects are equal if their JavaScript objects are equal per the rules defined by JavaScript. I am not saying that the JavaScript approach is the right one, but it has the value of relying on a standard.
If we have such a clear value space (with equality on it) then
When converting to triples (and possibly to the expanded form?), a lexical representation has to be built, and there is no unique way to do that
is not a problem anymore.
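The "compare in the value space" idea above can be sketched with a stock JSON parser (a minimal illustration using Python's json module rather than JavaScript's rules, which is itself one of the choices under discussion):

```python
import json

# Two lexically different serializations of the same JSON value:
# whitespace, key order, and number spelling all differ.
a = json.loads('{"a": 1E2, "b": [true, null]}')
b = json.loads('{ "b": [true, null], "a": 100.0 }')

# Equal in the value space, despite different lexical forms.
assert a == b
```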
To be clear, I do not have a clear answer. But I am worried about copying an IETF document and making it part of a recommendation. That is what I would prefer to avoid...
@cyberphone, can you comment on the status of https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-05? If we are to support JSON literals, it would be best to canonicalize them. When is this expected to become an RFC? How stable is the document? Are there other specs which are normatively referencing the spec?
@iherman part of testing requires an RDF transform and using dataset isomorphism. At that point, the precise lexical representation of JSON literals becomes important. Certainly, this could be left out of the spec, and used in test-suite instructions, but for many reasons, settling on a canonical form for JSON literals is going to be important, if we can overcome the normative citation issues.
My Ruby version for JSON canonicalization: https://github.com/dryruby/json-canonicalization.
@gkellogg It is great to see a sixth incarnation of the proposal!
Regarding progress, the technical issues have (AFAICT...) been properly identified; the problem is rather that a bunch of people still consider canonicalization pure stupidity. OTOH, it seems that none of the current Open Banking APIs has bought into the Base64Url concept either.
FWIW, I will do a short presentation https://cyberphone.github.io/ietf-signed-http-requests/hotrfc-shreq.pdf at IETF-104 in Prague which shows how you can apply JCS on a mainstream application.
This issue was discussed in a meeting.
RESOLVED: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized
Agree done, closing :)
Original issue is Support JSON values that aren't mapped #333