w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

Support JSON values that aren’t mapped #4

Closed gkellogg closed 5 years ago

gkellogg commented 6 years ago

Original issue is Support JSON values that aren't mapped #333

azaroth42 commented 6 years ago

:+1: This is a better (IMO) solution for native JSON values such as GeoJSON than requiring every community to map all of their constructs into -LD.

To quote (with slight edits) the example from the original issue:

{
  "@context": {
    "@vocab": "http://example/",
    "@base": "http://example/",
    "json-value": {"@type": "@json"}
  },
  "@id": "foo",
  "json-value": {"native": "json"}
}

This seems very sensible, and fits with our charter. We can later make @json an alias for whatever literal type a future RDF WG might assign for JSON.

akuckartz commented 6 years ago

I would prefer a more LD friendly solution for GeoJSON. #7 ?

azaroth42 commented 6 years ago

@akuckartz I didn't mean to imply that GeoJSON-LD was a bad thing to do, just that if the requirement is "support native JSON data structures in the JSON-LD context", then GeoJSON could be managed that way without then layering on GeoJSON-LD. GeoJSON-LD is great ... but if you don't need to interact with the -LD part of it, just record the JSON structure, there's overhead that could be minimized.

There's a separate issue for the list of lists feature beyond #7 that was already accepted to be part of 1.1. #7 would additionally let the semantics of the list of lists be expressed.

gkellogg commented 6 years ago

The key is the expanded form; my thought was that the previous example might expand to something like the following:

[{
  "@id": "http://example/foo",
  "http://example/json-value": [{
    "@value": {"native": "json"},
    "@type": "@json"
  }]
}]
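The proposed expansion can be sketched roughly in Python. This is a toy under stated assumptions, not the JSON-LD expansion algorithm: the function name `expand_json_terms` is hypothetical, and it handles only `@base`, `@vocab`, `@id`, and terms typed as `@json`.

```python
from urllib.parse import urljoin


def expand_json_terms(doc):
    """Toy expansion: resolves @id against @base and expands @json-typed terms only."""
    ctx = doc["@context"]
    base, vocab = ctx["@base"], ctx["@vocab"]
    node = {"@id": urljoin(base, doc["@id"])}
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # keywords were handled above
        term_def = ctx.get(key)
        if isinstance(term_def, dict) and term_def.get("@type") == "@json":
            # The native JSON value is carried through as-is, not expanded further.
            node[vocab + key] = [{"@value": value, "@type": "@json"}]
    return [node]


compact = {
    "@context": {
        "@vocab": "http://example/",
        "@base": "http://example/",
        "json-value": {"@type": "@json"},
    },
    "@id": "foo",
    "json-value": {"native": "json"},
}

expanded = expand_json_terms(compact)
print(expanded)
```

The point of the sketch is that the value under `@value` stays a JSON object rather than being interpreted as a node.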

Regarding #7, this is not in conflict with a potentially more semweb-y mapping for GeoJSON, but there are other reasons why you might want to preserve raw JSON within JSON-LD.

When turned into RDF, we would need a datatype to describe the value, so that you would get something like the following:

@base <http://example/foo> .
@prefix jsonld: <https://www.w3.org/ns/json-ld#> .

<foo> <json-value> '{"native": "json"}'^^jsonld:json .

Where the JSON is normalized to use minimal whitespace.
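As a rough illustration of the minimal-whitespace step, the sketch below uses Python's standard `json` module. It assumes that "minimal whitespace" means no space after `,` or `:`, and it formats the triple as a plain string rather than through an RDF library.

```python
import json

value = {"native": "json"}

# separators=(",", ":") removes the spaces json.dumps inserts by default.
lexical = json.dumps(value, separators=(",", ":"))

# Plain string formatting; a real serializer would also escape quotes in the literal.
triple = f"<foo> <json-value> '{lexical}'^^jsonld:json ."
print(triple)
```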

iherman commented 6 years ago

I think defining a jsonld:json datatype would make a lot of sense in this day and age... and would offer a clean solution.

davidlehn commented 6 years ago

Will need to note that the whole feature is somewhat implementation dependent. Native JSON serialization/deserialization issues may have some effect on key ordering, float representation, etc.

davidlehn commented 6 years ago

Should perhaps be jsonld:JSON to better align with https://www.w3.org/TR/rdf11-concepts/#section-html

azaroth42 commented 6 years ago

WG resolved to add a @JSON keyword, mapped to jsonld:JSON to identify the JSON data type.

BigBlueHat commented 6 years ago

I'm concerned this opens a Pandora's box...or maybe several. Sadly, I wasn't here for the call and had overlooked this issue earlier, so I fear I'm only just now raising these concerns...

We're (rather passively) introducing a namespace specific to JSON-LD: https://www.w3.org/ns/json-ld#

We're inviting developers to avoid/ignore the graph model JSON-LD encodes:

{
  "@context": {
    "data": {"@type": "@json"}
  },
  "data": {"everything": "imaginable"}
}

I fear providing this as a "solution for native JSON values such as GeoJSON" sends the wrong message...and it begins to invalidate the reason to have JSON-LD at all (see the example above).

Are we also planning to do this for YAML? Because the use cases would be identical...

cwebber commented 6 years ago

Having implemented the RDF canonicalization spec with a minor headache, this sounds like a full on migraine.

yo dawg I heard you like canonicalization so I put a tree data serialization canonicalization algorithm in your graph data serialization canonicalization algorithm so you can normalize while you normalize

ajs6f commented 6 years ago

@BigBlueHat I appreciate (and to some extent share) that concern, but I wonder if there's a historical analogy: I've not seen the kind of problem you are describing using XML literals within RDF/XML. That may not be a valid analogy, but it's a bit suggestive...

azaroth42 commented 6 years ago

Re YAML, I don't think we would do that, because (a) no one has asked for it and (b) YAML is a non-normative deliverable of how the patterns of JSON-LD could be used in YAML to accomplish the same ends. The charter says: "JSON-LD 1.1 examples specified in YAML" not a normative YAML-LD Rec.

We would be introducing a namespace, yes. We could also (as discussed on the call) add the data type to the RDF namespace, but we at least would need to document it. The consensus was that the creation of a new namespace was less work than putting it into an existing one, and a future RDF WG could take it over down the line.

I agree with @ajs6f about the use of XML literals in RDF/XML. Yes, you can create pointless RDF that simply wraps a single literal in XML or JSON ... but why would you bother to do that? It seems like an enormous waste of your time other than to meet some badly worded RFP.

gkellogg commented 6 years ago

As @ajs6f points out, other RDF syntaxes that leverage languages have a similar mechanism for including raw XML or HTML; this is really no different.

For RDF canonicalization, such values would be treated just as other datatyped literals. Part of the RDF serialization aspects should include whitespace normalization, which is fairly standard in JSON, so I don't really appreciate why things such as RDF Dataset Normalization and signatures would be at any disadvantage.

gkellogg commented 6 years ago

@BigBlueHat worries about introducing a new namespace:

"We're (rather passively) introducing a namespace specific to JSON-LD: https://www.w3.org/ns/json-ld#."

In fact, this namespace already exists for URIs such as http://www.w3.org/ns/json-ld#expanded used in HTTP headers.

However, we don't need to use this namespace, and @iherman suggested that we could probably use the RDF namespace http://www.w3.org/1999/02/22-rdf-syntax-ns# and use http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON as the datatype, making it first-class with XMLLiteral and HTML datatypes. Updating the RDF namespace document is something we can do, apparently.

I agree that this no longer serves for GeoJSON, and we should consider some other example, but such examples doubtless exist, which is why this is a compelling feature.

iherman commented 6 years ago

I guess we can all agree that this is (a) technically doable (b) it may require normalization of the literal (at least optionally) and (c) it is not fundamentally different from the XML and HTML datatypes. (E.g., if we do have a standard for RDF canonicalization at some point, that standard must address the issue of literals and their normalization (or not), and the issues raised by @cwebber are also genuine problems for HTML literals.)

However, I guess we are back to our design principles set out at the beginning of the WG's life. We should not do this just because we can; we should have proper use cases, see relevant section. I cannot judge whether GeoJSON is a use case or not.

Fak3 commented 6 years ago

"There's a separate issue for the list of lists feature beyond #7 that was already accepted to be part of 1.1."

@azaroth42 is there a github issue for the list of lists support? If not, may I create one?

gkellogg commented 6 years ago

@Fak3 The lists of lists issue is #36, and it was closed as support was added for recursive lists.

cwebber commented 6 years ago

"For RDF canonicalization, such values would be treated just as other datatyped literals. Part of the RDF serialization aspects should include whitespace normalization, which is fairly standard in JSON, so I don't really appreciate why things such as RDF Dataset Normalization and signatures would be at any disadvantage."

It isn't that simple. Whitespace is not the only issue. We will probably have to support something like this json canonicalization spec or something. That's a lot of extra work.

There's also a huge risk that people will open this loophole much, much wider than is anticipated, marking giant swaths of content as json-only. Yeah, I guess that's true for XML too, but to be honest no sane person could operate on XML-RDF as if it were real XML and have things survive... it was an RDF serialization format and little more. Here people are actually working with json-ld as if it were normal json and getting reasonable RDF interop. There are pain points occasionally, and we should try to remedy those, but I think this is opening an escape hatch that a good number of people will jump straight through.

Careful about rubbing this lamp... I think fulfilling this wish will have more side effects than anticipated and may undo a lot of the goals of json-ld. -1 from me.

dlongley commented 6 years ago

I share the same concerns as @cwebber and @BigBlueHat.

azaroth42 commented 6 years ago

Re canonicalization (or even just whitespace normalization) ... can someone describe the issue and the risk here? If one implementation serializes to a string "{\"foo\": 1}" and another serializes to a string "{ \"foo\" : 1 }" ... what's the problem? They're not identity providing such that they need to be compared, they're just values.

cwebber commented 6 years ago

@azaroth42 Those would end up being two different signatures with linked data signatures. Without canonicalizing the json exactly the same way every time, LDS will break.
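A small sketch of why this matters, assuming a plain SHA-256 digest as a stand-in for the signing step (Linked Data Signatures involve considerably more than this): the two strings from the question above carry the same JSON value but hash differently.

```python
import hashlib
import json

a = '{"foo": 1}'
b = '{ "foo" : 1 }'

# Stand-in for signing: hash the exact bytes of each lexical form.
digest_a = hashlib.sha256(a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(b.encode("utf-8")).hexdigest()

# Parsed, the two strings are the same value.
same_value = json.loads(a) == json.loads(b)

print(same_value)            # True
print(digest_a == digest_b)  # False
```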

gkellogg commented 6 years ago

@cwebber said:

"It isn't that simple. Whitespace is not the only issue. We will probably have to support something like this json canonicalization spec or something. That's a lot of extra work."

Good point, as by the time we see the data, it's in a parsed form, and we can't depend on specific representation of numbers, for example.
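To make the number problem concrete, here is a short Python sketch. It only shows the behaviour of Python's `json` module; other parsers make different choices, which is exactly the interoperability concern.

```python
import json

# Four different lexical forms of the same number.
inputs = ["1.0", "1.00", "1e0", "10e-1"]

parsed = [json.loads(s) for s in inputs]

# Once parsed, the original spelling is gone; re-serializing yields a single form.
reserialized = {json.dumps(v) for v in parsed}
print(reserialized)  # {'1.0'}

# JavaScript's JSON.stringify(1.0) gives "1" instead, so two conforming
# implementations can already disagree on the output string.
```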

At this point, I'd say that the work should be put on hold, certainly pending an important use case.

azaroth42 commented 6 years ago

We can defer, but I would like to note that canonicalization and LDS are explicitly out of scope of the WG, per the charter: https://www.w3.org/2018/03/jsonld-wg-charter.html

azaroth42 commented 6 years ago

PROPOSAL: Defer work on JSON literals until specific use cases have been described.

BigBlueHat commented 6 years ago

"We can defer, but I would like to note that canonicalization and LDS are explicitly out of scope of the WG, per the charter: https://www.w3.org/2018/03/jsonld-wg-charter.html"

They are out of scope, but we should still be careful not to break them.

iherman commented 6 years ago

Thanks @cwebber for the json canonicalization spec link. I did not know about that.

However, that also means that the signature issue with a json literal becomes solvable if and when an RDF canonicalization is formally defined as a standard; that standard should say that literals must be canonicalized and, actually, by defining the json datatype we reinforce interoperability of signatures because there is then a clear follow-your-nose approach of what this means in the standard (provided the aforementioned IETF document becomes final). Indeed, if there are use cases that would make all kinds of ad-hoc inclusion of JSON data in RDF graphs but without a clear spec, a canonicalization spec may not be able to function properly.

I.e., while I agree that we need more use cases per our process, I think a future RDF canonicalization spec can take care of this problem for us, and this WG does not really have to deal with this.

(B.t.w., does the XML Canonicalization spec apply to HTML content? Because if not, there is already a problem with the HTML RDF datatype...)

iherman commented 6 years ago

This issue was discussed in a meeting.

azaroth42 commented 5 years ago

Use Case: In the CIDOC-CRM vocabulary, there is a property that takes a Literal for the description of a Place. This (thus) can take a lat,long pair as a string, or an XmlLiteral ... but not GeoJSON as a literal, as there is no JSON datatype. With a JSON data type, and support in JSON-LD for @type:@json in a term definition, we could thus embed the GeoJSON directly in the document, rather than unnecessarily serializing to a string.

iherman commented 5 years ago

This issue was discussed in a meeting.

gkellogg commented 5 years ago

For basic interoperability, and to allow implementations to pass tests, I believe we will need to tackle canonicalization. The current version of the draft is https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-05, not sure when it's likely to become an RFC, so don't think we can normatively cite it (although, it looks like it will become an RFC and the core bits aren't too likely to change). We might just need to re-describe the normalization points and informatively reference the draft:

There are implementations available for several languages (not Ruby, which I'll work on and contribute).
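For a feel of what such canonicalization involves, the following is an approximation in Python, not an implementation of the draft. It sorts keys and strips whitespace only. As far as I understand the draft, it additionally requires ECMAScript-style number serialization and key ordering by UTF-16 code units, neither of which this sketch attempts; the name `canonicalize_approx` is hypothetical.

```python
import json


def canonicalize_approx(value):
    """Approximate canonical form: sorted keys, no insignificant whitespace.

    Not JCS-conformant: numbers and non-BMP key ordering are left to Python's defaults.
    """
    return json.dumps(
        value,
        sort_keys=True,         # deterministic key order (by Unicode code point)
        separators=(",", ":"),  # no spaces after "," or ":"
        ensure_ascii=False,     # keep non-ASCII characters unescaped
        allow_nan=False,        # NaN and Infinity are not valid JSON
    )


first = canonicalize_approx(json.loads('{"foo": 1, "bar": [true, null]}'))
second = canonicalize_approx(json.loads('{ "bar" : [ true, null ], "foo" : 1 }'))
print(first)  # {"bar":[true,null],"foo":1}
```

Both inputs reduce to the same string, which is the property that dataset isomorphism tests and signatures would rely on.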

iherman commented 5 years ago

On https://github.com/w3c/json-ld-syntax/issues/4#issuecomment-474030501: I am not sure, @gkellogg.

pchampin commented 5 years ago

@iherman JSON literals in JSON-LD might be more tricky than other literals (including HTML) in RDF. The difference is that, in other RDF serializations, literals have a fixed lexical representation. Unless some form of inference on literals is performed, this lexical representation is kept as is when processing the graph.

In JSON-LD, the JSON literal we get (e.g. in the expansion algorithms) is not a lexical representation, it is the value itself (a JSON object). When converting to triples (and possibly to the expanded form?), a lexical representation has to be built, and there is no unique way to do that. I think that's what @gkellogg means by "basic interoperability problems".
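The "no unique way" point can be illustrated with Python's `json` module, assuming nothing beyond its standard options: one in-memory value yields several distinct lexical representations, each of which parses back to the same value.

```python
import json

# An in-memory value, as a processor would hold it after parsing.
value = {"b": "\u00e9", "a": 1}

# Four equally valid serializations of that one value.
forms = {
    json.dumps(value),                      # default: insertion order, \u escapes
    json.dumps(value, sort_keys=True),      # keys reordered
    json.dumps(value, ensure_ascii=False),  # non-ASCII kept literal
    json.dumps(value, indent=2),            # pretty-printed
}

print(len(forms))                                   # 4
print(all(json.loads(f) == value for f in forms))   # True
```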

iherman commented 5 years ago

@pchampin

"In JSON-LD, the JSON literal we get (e.g. in the expansion algorithms) is not a lexical representation, it is the value itself (a JSON object)."

I am not sure that is a given. Going back to the RDF datatype definition: what exactly would be the value space of a JSON Literal? Maybe even more specifically: what is the definition of equality in such a space? The former is something we must define if we define an RDF Datatype. Referring to the canonical representation might indeed be used for the definition of the value space, but that is not necessarily the only way to do that. We could (just off the top of my head) say that a JSON object (or an array thereof) is parsed into a JavaScript object using the JSON parsing rules, and the result is the 'value space'. Which means that two JSON objects are equal if their JavaScript objects are equal per the rules defined by JavaScript. I am not saying that the JavaScript approach is the right one, but it has the value of relying on a standard.

If we have such a clear value space (with equality on it) then

"When converting to triples (and possibly to the expanded form?), a lexical representation has to be built, and there is no unique way to do that"

is not a problem anymore.

To be clear, I do not have a clear answer. But I am worried about copying an IETF document and making it part of a recommendation. That is what I would prefer to avoid...
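A sketch of the value-space idea, with Python's structural equality standing in for whatever equality rule would actually be chosen (the comment above floats JavaScript's rules; this is only an analogy). The helper name `json_literals_equal` is hypothetical.

```python
import json


def json_literals_equal(lexical_a, lexical_b):
    """Compare two JSON literals by parsed value rather than by string."""
    return json.loads(lexical_a) == json.loads(lexical_b)


# Key order, whitespace, and number spelling differ; the values do not.
same = json_literals_equal('{"a": 1, "b": [1, 2]}', '{ "b":[1,2], "a":1.0 }')

# Array order is part of the value, so these differ.
different = json_literals_equal("[1, 2]", "[2, 1]")

print(same, different)  # True False
```

Under such a definition, differing lexical forms would not affect literal equality, though byte-level uses such as signatures would still need one agreed serialization.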

gkellogg commented 5 years ago

@cyberphone, can you comment on the status of https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-05? If we are to support JSON literals, it would be best to canonicalize them. When is this expected to become an RFC? How stable is the document? Are there other specs which are normatively referencing the spec?

@iherman part of testing requires an RDF transform and using dataset isomorphism. At that point, the precise lexical representation of JSON literals becomes important. Certainly, this could be left out of the spec, and used in test-suite instructions, but for many reasons, settling on a canonical form for JSON literals is going to be important, if we can overcome the normative citation issues.

gkellogg commented 5 years ago

My Ruby version for JSON canonicalization: https://github.com/dryruby/json-canonicalization.

cyberphone commented 5 years ago

@gkellogg It is great to see a sixth incarnation of the proposal!

Regarding progress, the technical issues have (AFAICT...) been properly identified; the problem is rather that a bunch of people still consider canonicalization as pure stupidity. OTOH, it seems that none of the current Open Banking APIs has bought into the Base64Url-concept either.

FWIW, I will do a short presentation https://cyberphone.github.io/ietf-signed-http-requests/hotrfc-shreq.pdf at IETF-104 in Prague which shows how you can apply JCS on a mainstream application.

iherman commented 5 years ago

This issue was discussed in a meeting.

azaroth42 commented 5 years ago

Agree done, closing :)