Support JSON values that aren’t mapped

gkellogg commented 6 years ago

Consider using `"@type": "@json"` to describe native values in the compact form.
Native values should include all JSON types: strings, booleans, numbers, and null as well as objects and arrays.
Expanded form can record these as values of @value.
- Does interfere with some use of [] and {} in framing

Original issue is Support JSON values that aren't mapped #333

azaroth42 commented 6 years ago

:+1: This is a better (IMO) solution for native JSON values such as GeoJSON than requiring every community to map all of their constructs into -LD.

To quote (with slight edits) the example from the original issue:

{
  "@context": {
    "@vocab": "http://example/",
    "@base": "http://example/",
    "json-value": {"@type": "@json"}
  },
  "@id": "foo",
  "json-value": {"native": "json"}
}

This seems very sensible, and fits with our charter. We can later make @json an alias for whatever literal type a future RDF WG might assign for JSON.

akuckartz commented 6 years ago

I would prefer a more LD friendly solution for GeoJSON. #7 ?

azaroth42 commented 6 years ago

@akuckartz I didn't mean to imply that GeoJSON-LD was a bad thing to do, just that if the requirement is "support native JSON data structures in the JSON-LD context", then GeoJSON could be managed that way without then layering on GeoJSON-LD. GeoJSON-LD is great ... but if you don't need to interact with the -LD part of it, just record the JSON structure, there's overhead that could be minimized.

There's a separate issue for the list of lists feature beyond #7 that was already accepted to be part of 1.1. #7 would additionally let the semantics of the list of lists be expressed.

gkellogg commented 6 years ago

The key is the expanded form; my thought was that the previous example might expand to something like the following:

[{
  "@id": "http://example/foo",
  "http://example/json-value": [{
    "@value": {"native": "json"},
    "@type": "@json"
  }]
}]

Regarding #7, this is not in conflict with a potentially more semweb-y mapping for GeoJSON, but there are other reasons why you might want to preserve raw JSON within JSON-LD.

When turned into RDF, we would need a datatype to describe the value, so that you would get something like the following:

@base <http://example/foo> .
@prefix jsonld: <https://www.w3.org/ns/json-ld#> .

<foo> <json-value> '{"native": "json"}'^^jsonld:json .

Where the JSON is normalized to use minimal whitespace.
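
A minimal sketch, in Python, of how a processor might build the lexical form of such a literal from the parsed value. The helper name `to_json_literal` is hypothetical, the datatype IRI is only the one floated in this comment, and the sketch handles whitespace alone (key order and number formatting are left untouched).

```python
import json

# Datatype IRI as proposed in this comment; an assumption, not a settled name.
JSON_DATATYPE = "https://www.w3.org/ns/json-ld#json"

def to_json_literal(value):
    """Hypothetical helper: serialize a parsed JSON value with no
    whitespace between tokens and pair it with the datatype IRI."""
    lexical = json.dumps(value, separators=(",", ":"), ensure_ascii=False)
    return lexical, JSON_DATATYPE

lexical, datatype = to_json_literal({"native": "json"})
print(lexical)   # {"native":"json"}
print(datatype)
```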

iherman commented 6 years ago

I think defining a jsonld:json datatype would make a lot of sense in this day and age... and would offer a clean solution.

davidlehn commented 6 years ago

Will need to note that the whole feature is somewhat implementation dependent. Native JSON serialization/deserialization issues may have some effect on key ordering, float representation, etc.

davidlehn commented 6 years ago

Should perhaps be jsonld:JSON to better align with https://www.w3.org/TR/rdf11-concepts/#section-html

azaroth42 commented 6 years ago

WG resolved to add a @JSON keyword, mapped to jsonld:JSON to identify the JSON data type.

BigBlueHat commented 6 years ago

I'm concerned this opens a Pandora's box...or maybe several. Sadly, I wasn't here for the call and had overlooked this issue earlier, so I fear I'm only just now raising these concerns...

We're (rather passively) introducing a namespace specific to JSON-LD: https://www.w3.org/ns/json-ld#

We're inviting developers to avoid/ignore the graph model JSON-LD encodes:

{
  "@context": {
    "data": {"@type": "@json"}
  },
  "data": {"everything": "imaginable"}
}

I fear providing this as a "solution for native JSON values such as GeoJSON" sends the wrong message...and it begins to invalidate the reason to have JSON-LD at all (see the example above).

Are we also planning to do this for YAML? Because the use cases would be identical...

cwebber commented 6 years ago

Having implemented the RDF canonicalization spec with a minor headache, this sounds like a full on migraine.

yo dawg I heard you like canonicalization so I put a tree data serialization canonicalization algorithm in your graph data serialization canonicalization algorithm so you can normalize while you normalize

ajs6f commented 6 years ago

@BigBlueHat I appreciate (and to some extent share) that concern, but I wonder if there's a historical analogy: I've not seen the kind of problem you are describing using XML literals within RDF/XML. That may not be a valid analogy, but it's a bit suggestive...

azaroth42 commented 6 years ago

Re YAML, I don't think we would do that, because (a) no one has asked for it and (b) YAML is a non-normative deliverable of how the patterns of JSON-LD could be used in YAML to accomplish the same ends. The charter says: "JSON-LD 1.1 examples specified in YAML" not a normative YAML-LD Rec.

We would be introducing a namespace, yes. We could also (as discussed on the call) add the data type to the RDF namespace, but we at least would need to document it. The consensus was that the creation of a new namespace was less work than putting it into an existing one, and a future RDF WG could take it over down the line.

I agree with @ajs6f about the use of XML literals in RDF/XML. Yes, you can create pointless RDF that simply wraps a single literal in XML or JSON ... but why would you bother to do that? It seems like an enormous waste of your time other than to meet some badly worded RFP.

gkellogg commented 6 years ago

As @ajs6f points out, other RDF syntaxes that leverage languages have a similar mechanism for including raw XML or HTML, this is really no different.

For RDF canonicalization, such values would be treated just as other datatyped literals. Part of the RDF serialization aspects should include whitespace normalization, which is fairly standard in JSON, so I don't really appreciate why things such as RDF Dataset Normalization and signatures would be at any disadvantage.

gkellogg commented 6 years ago

@BigBlueHat worries about introducing a new namespace:

We're (rather passively) introducing a namespace specific to JSON-LD: https://www.w3.org/ns/json-ld#.

In fact, this namespace already exists for URIs such as http://www.w3.org/ns/json-ld#expanded used in HTTP headers.

However, we don't need to use this namespace, and @iherman suggested that we could probably use the RDF namespace http://www.w3.org/1999/02/22-rdf-syntax-ns# and use http://www.w3.org/1999/02/22-rdf-syntax-ns#JSON as the datatype, making it first-class with XMLLiteral and HTML datatypes. Updating the RDF namespace document is something we can do, apparently.

I agree that this no longer serves for GeoJSON, and we should consider some other example, but such examples doubtless exist, which is why this is a compelling feature.

iherman commented 6 years ago

I guess we can all agree that this is (a) technically doable (b) it may require normalization of the literal (at least optionally) and (c) it is not fundamentally different from the XML and HTML datatypes. (E.g., if we do have a standard for RDF canonicalization at some point, that standard must address the issue of literals and their normalization (or not), and the issues raised by @cwebber are also genuine problems for HTML literals.)

However. I guess we are back to our design principles set out at the beginning of the WG's life. We should not do this just because we can; we should have proper use cases, see relevant section. I cannot judge whether GeoJSON is a use case or not.

Fak3 commented 6 years ago

There's a separate issue for the list of lists feature beyond #7 that was already accepted to be part of 1.1.

@azaroth42 is there a github issue for the list of lists support? If not, may I create one?

gkellogg commented 6 years ago

@Fak3 The lists of lists issue is #36, and it was closed as support was added for recursive lists.

cwebber commented 6 years ago

For RDF canonicalization, such values would be treated just as other datatyped literals. Part of the RDF serialization aspects should include whitespace normalization, which is fairly standard in JSON, so I don't really appreciate why things such as RDF Dataset Normalization and signatures would be at any disadvantage.

It isn't that simple. Whitespace is not the only issue. We will probably have to support something like this json canonicalization spec or something. That's a lot of extra work.

There's also a huge risk that people will open this loophole much, much wider than is anticipated, marking giant swaths of content as json-only. Yeah, I guess that's true for XML too, but to be honest no sane person could operate on XML-RDF as if it were real XML and have things survive... it was an RDF serialization format and little more. Here people are actually working with json-ld as if it were normal json and getting reasonable RDF interop. There are pain points occasionally, and we should try to remedy those, but I think this is opening an escape hatch that a good number of people will jump straight through.

Careful about rubbing this lamp... I think fulfilling this wish will have more side effects than anticipated and may undo a lot of the goals of json-ld. -1 from me.

dlongley commented 6 years ago

I share the same concerns as @cwebber and @BigBlueHat.

azaroth42 commented 6 years ago

Re canonicalization (or even just whitespace normalization) ... can someone describe the issue and the risk here? If one implementation serializes to a string "{\"foo\": 1}" and another serializes to a string "{ \"foo\" : 1 }" ... what's the problem? They're not identity providing such that they need to be compared, they're just values.

cwebber commented 6 years ago

@azaroth42 Those would end up being two different signatures with linked data signatures. Without canonicalizing the json exactly the same way every time, LDS will break.
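
To make the concern concrete, here is a small Python sketch. It is not the Linked Data Signatures algorithm: SHA-256 over the literal's string form simply stands in for whatever digest a signature suite would compute, and `normalize` is a hypothetical helper that re-serializes with sorted keys and no whitespace.

```python
import hashlib
import json

a = '{"foo": 1}'
b = '{ "foo" : 1 }'

def digest(text):
    # Stand-in for the hashing step of a signature suite (assumption).
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def normalize(text):
    # Hypothetical normalization: parse, then re-serialize deterministically.
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))

same_value = json.loads(a) == json.loads(b)
raw_digests_match = digest(a) == digest(b)
normalized_digests_match = digest(normalize(a)) == digest(normalize(b))

print(same_value, raw_digests_match, normalized_digests_match)
# True False True
```

The two strings carry the same value, yet their digests only agree once both sides have gone through the same serialization rules.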

gkellogg commented 6 years ago

@cwebber said:

It isn't that simple. Whitespace is not the only issue. We will probably have to support something like this json canonicalization spec or something. That's a lot of extra work.

Good point, as by the time we see the data, it's in a parsed form, and we can't depend on specific representation of numbers, for example.

At this point, I'd say that the work should be put on hold, certainly pending an important use case.

azaroth42 commented 6 years ago

We can defer, but I would like to note that canonicalization and LDS are explicitly out of scope of the WG, per the charter: https://www.w3.org/2018/03/jsonld-wg-charter.html

azaroth42 commented 6 years ago

PROPOSAL: Defer work on JSON literals until specific use cases have been described.

BigBlueHat commented 6 years ago

We can defer, but I would like to note that canonicalization and LDS are explicitly out of scope of the WG, per the charter: https://www.w3.org/2018/03/jsonld-wg-charter.html

They are out of scope, but we should still be careful not to break them.

iherman commented 6 years ago

Thanks @cwebber for the json canonicalization spec link. I did not know about that.

However, that also means that the signature issue with a json literal becomes solvable if and when an RDF canonicalization is formally defined as a standard; that standard should say that literals must be canonicalized and, actually, by defining the json datatype we reinforce interoperability of signatures because there is then a clear follow-your-nose approach of what this means in the standard (provided the aforementioned IETF document becomes final). Indeed, if there are use cases that would make all kinds of ad-hoc inclusion of JSON data in RDF graphs but without a clear spec, a canonicalization spec may not be able to function properly.

Ie, while I agree that we need more use cases per our process, I think a future RDF canonicalization spec can take care of this problem for us, and this WG does not really have to deal with this.

(B.t.w., does the XML Canonicalization spec apply to HTML content? Because if not, there is already a problem with the HTML RDF datatype...)

iherman commented 6 years ago

This issue was discussed in a meeting.

RESOLVED: Defer work on JSON literals until specific use cases have been described
Benjamin Young: https://github.com/w3c/json-ld-syntax/issues/4
Benjamin Young: https://github.com/w3c/json-ld-syntax/issues/4#issuecomment-418857569
Benjamin Young: first issue is number 4. there had been an agreement in past meeting to close but it was brought back up. Further discussion on GitHub and there is now a proposal from Rob to defer. Want to open it back up for further discussion
Proposed resolution: Defer work on JSON literals until specific use cases have been described (Benjamin Young)
Gregg Kellogg: Chris Webber brought up the issue of canonical ordering of the literal. There is a draft out for JSON canonicalization, but it is a burden until algorithms are there to do it. This issue is related to the RDF JSON canonicalization. We have this issue with HTML and XML literals.
… we are lacking a compelling use case for this
… part was to support geo json but we added list of list to support that
Adam Soroka: to clarify, geo json has no use case for this current issue?
Rob Sanderson: Related to #7 – https://github.com/w3c/json-ld-syntax/issues/7
Gregg Kellogg: list of list allows us to represent geo json but some of the proponents of geo json want to see us go further
Adam Soroka: happy with the proposal and to see conversation to move forward. We owe it to geo json to see that their issues are represented in our work
Rob Sanderson: while I supported the issue, given the lack of use cases, I support the deferral.
David Newbury: I am glad it is a deferral and not a close. but when you need to semanticalize and de-semanticalize json it can be a pain.
Ivan Herman: no strong opinion about literals in json but do want to make sure we do not close it for the wrong reason. understand the canonicalization issue but it is not our responsibility. should we have a more broad RDF canonicalization group.
Rob Sanderson: +1 to ivan.
Adam Soroka: +1
David I. Lehn: i’ll just note, that similar to empty terms, raw json would be very helpful for obfuscated json-ld
Benjamin Young: csvw:JSON
Benjamin Young: did more digging and there is another community that minted a json encoding type. the csv working group has this. The json is not native json but rather just a string but has way to say, this is a json string.
… need to make sure we have a compelling use case
Rob Sanderson: +1
Benjamin Young: does anyone want changes?
Proposed resolution: Defer work on JSON literals until specific use cases have been described (Benjamin Young)
David Newbury: +1
Jeff Mixter: +1
Ivan Herman: +1
Adam Soroka: +1
Benjamin Young: +1
Harold Solbrig: +1
Gregg Kellogg: +1
Simon Steyskal: +1
Resolution #2: Defer work on JSON literals until specific use cases have been described
David I. Lehn: +1

azaroth42 commented 5 years ago

Use Case: In the CIDOC-CRM vocabulary, there is a property that takes a Literal for the description of a Place. This (thus) can take a lat,long pair as a string, or an XmlLiteral ... but not GeoJSON as a literal, as there is no JSON datatype. With a JSON data type, and support in JSON-LD for @type:@json in a term definition, we could thus embed the GeoJSON directly in the document, rather than unnecessarily serializing to a string.
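
A rough Python sketch of the difference this use case is pointing at. The term name `defined_by`, the `http://example/` IRIs, and the coordinates are all made up for illustration; the real CIDOC-CRM property and any final `@json` syntax may differ.

```python
import json

# Illustrative GeoJSON geometry (coordinates are invented).
geojson = {"type": "Point", "coordinates": [12.5, 41.9]}

# Today: the GeoJSON has to be flattened into an opaque string literal.
as_string = {
    "@context": {"@vocab": "http://example/"},
    "@id": "http://example/place/1",
    "defined_by": json.dumps(geojson),
}

# With the proposed "@type": "@json" term definition, the structure
# could stay as native JSON inside the document.
as_json_literal = {
    "@context": {
        "@vocab": "http://example/",
        "defined_by": {"@type": "@json"},
    },
    "@id": "http://example/place/1",
    "defined_by": geojson,
}

print(json.dumps(as_json_literal, indent=2))
```

In the first form a consumer has to know to parse the string again; in the second, the value is readable JSON in place.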

iherman commented 5 years ago

This issue was discussed in a meeting.

RESOLVED: Add JSON data type to RDF, with support in JSON-LD processors for managing parsed JSON in the internal form. We will seek feedback via blogpost, and in Berlin.
JSON Literal
Harold Solbrig: https://github.com/w3c/json-ld-syntax/issues/4
Ivan Herman: to do this, we will have to define an RDF data type for JSON
Ivan Herman: look at RDF concepts document - 3 or 4 lines
Rob Sanderson: https://www.w3.org/TR/rdf11-concepts/#section-html
Gregg Kellogg: what we need to do that those didn’t is to deal with white space…
Ivan Herman: the only thing we have to do is to say there is a string, declared to be JSON
Ivan Herman: it makes canonicalization difficult so we’d have to find or reference it
… the fact that there are html data types that makes canonicalization difficult or impossible
… so if you want to do canonicalization, html is the big kahuna, and json is secondary
… you may want to answer question about json equality, but is that our responsibility.
Gregg Kellogg: when you parse html or xml, value elements preserve white space. As a result, indentation variations give different literals.
… in json, parsing is done after parser is completed and parsers don’t preserve white space. Serializers can be told no unnecessary white space
… which avoids html and xml problem and allows us to be relatively immune. Ordering may be different but …
… parsers aren’t required to be order preserving, so literals from 2 parsers might not compare. We could state that they appear
… equivalent as objects in testing infrastructure.
… so we just need to state that values are stored without extra white space.
Ivan Herman: not sure that specification should require that when not a question of testing.
Ivan Herman: question is whether it is up to us to define when two pieces of JSON are equal
Ivan Herman: we can be very pragmatic and say that someone else has to take care of this.
Gregg Kellogg: I think we need to say something about this. We need to say something about how that serialization is performed.
… we need to say how you create that string from the objects.
Adam Soroka: consider the case of GeoJSON …
David I. Lehn: is there another solution than complete serialization?
Rob Sanderson: Use case p168 property has range rdf literal with all sorts of possible values, but you can’t add a json literal because there is no id.
… if we want to make it easy for authors to do the right thing, then json as (readable) json is what is needed.
Harold Solbrig: http://build.fhir.org/medicationexample0301.json.html
Harold Solbrig: http://build.fhir.org/medicationexample0301.ttl.html
Rob Sanderson: spec above is completely silent about html canonicalization
… we aren’t adding a new problem, but just another type that doesn’t define canonicalization
Rob Sanderson: can we punt on it and say will be canonicalized when there is a canonicalization spec for json?
Ivan Herman: in publishing we put out jsonld not for any reason except it has been accepted by schema.org
… any author who cares about being found by schema.org will put it in jsonld vs. json literal
David Newbury: this will allow people to progressively add semantics vs. having to do everything up front
Benjamin Young: doing this without canonicalization is painfully naive and dangerous. If it isn’t consistent between python and js…
… we’re just asking for a world of pain.
Rob Sanderson: question is the extent to which … (if we don’t do complete canonicalization…) … are we doing a better or worse job than nothing?
Gregg Kellogg: coming around to saying it is undefined until spec exists.
Gregg Kellogg: signature specs will need to call out some canonicalization spec or define one that should be used by jsonld processors …
… suggest that canonicalization of jsonld literals is not supported, so can’t sign graphs that contain it…
David I. Lehn: is there a way to know which ones aren’t supported?
Ivan Herman: no because anyone can define a datatype in an RDF graph.
David Newbury: unless codepaths do unicode canonicalization, we’ve still got issues…
Ivan Herman: are there good reasons to do this in jsonld?
Ivan Herman: do we want to have json literals or not?
David I. Lehn: I’m worried about people misusing it but…
Gregg Kellogg: if there WAS a standard for it, we’d say it must be in that form …
Rob Sanderson: there are many terrible things people can do today. I’m highly reluctant to use “it could be misused” as a good reason to not do something.
David Newbury: when we talk about misuse, do we mean security or misuse of RDF in the world (decreasing the amount of semantics in the universe)
Ivan Herman: From a W3C perspective… what we could do is put this into the document w/ a note that says it could be done but we want to have…
… feedback from the community that we are not sure that it is a good idea and seek input.
Gregg Kellogg: we did something like this in spec w/ bnode identifiers and document base url …
Jeff Mixter: reason that WikiData chose JSON rather than JSONLD … they may have opinions about injecting into … document.
Proposed resolution: Add JSON data type to RDF, with support in JSON-LD processors for managing parsed JSON in the internal form. We will seek feedback via blogpost, and in Berlin. (Rob Sanderson)
Ivan Herman: +1
Gregg Kellogg: +1
Rob Sanderson: +1
David Newbury: +1
Jeff Mixter: +1
David I. Lehn: +0.9
Harold Solbrig: +1
Resolution #8: Add JSON data type to RDF, with support in JSON-LD processors for managing parsed JSON in the internal form. We will seek feedback via blogpost, and in Berlin.

gkellogg commented 5 years ago

For basic interoperability, and to allow implementations to pass tests, I believe we will need to tackle canonicalization. The current version of the draft is https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-05, not sure when it's likely to become an RFC, so don't think we can normatively cite it (although, it looks like it will become an RFC and the core bits aren't too likely to change). We might just need to re-describe the normalization points and informatively reference the draft:

No whitespace between JSON tokens.
Within strings, if a code point falls within the traditional ASCII control character range (U+0000 through U+001F), it MUST be serialized using lowercase hexadecimal Unicode notation (\uhhhh) unless it is in the set of predefined JSON control characters U+0008, U+0009, U+000A, U+000C or U+000D which MUST be serialized as \b, \t, \n, \f and \r respectively. If the Unicode value is outside of the ASCII control character range, it MUST be serialized "as is" unless it is equivalent to U+005C (\) or U+0022 (") which MUST be serialized as \\ and \" respectively.
JSON Number data MUST be serialized according to Section 7.1.12.1 of [ES6] including the "Note 2" enhancement.
JSON Object properties MUST be sorted in a recursive manner which means that possible JSON child Objects MUST have their properties sorted as well.

There are implementations available for several languages (not Ruby, which I'll work on and contribute).
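
As a rough illustration of the normalization points listed above, the following Python sketch gets close using only the standard library. It is an approximation, not an implementation of the draft: `canonicalize_approx` is a hypothetical name, Python formats numbers differently from ES6 (for example `42.0` stays `42.0` rather than becoming `42`), and `sort_keys` orders keys by Unicode code point rather than by UTF-16 code unit, which can differ for characters outside the Basic Multilingual Plane.

```python
import json

def canonicalize_approx(value):
    """Approximate the listed rules with the standard library only.

    - separators=(",", ":") removes whitespace between tokens
    - sort_keys=True sorts object members recursively
    - ensure_ascii=False emits non-ASCII characters as is, while control
      characters are still escaped (\\n, \\t, \\u0001, ...)
    Number formatting is NOT the ES6 algorithm (assumption called out above).
    """
    return json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )

doc = {"b": [1, 2, {"z": None, "a": True}], "a": "line\nbreak \u0001"}
print(canonicalize_approx(doc))
# {"a":"line\nbreak \u0001","b":[1,2,{"a":true,"z":null}]}
```

A conforming implementation would replace the number handling with the ES6 serialization rules and the key sort with a UTF-16 comparison.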

iherman commented 5 years ago

On https://github.com/w3c/json-ld-syntax/issues/4#issuecomment-474030501: I am not sure, @gkellogg.

As referred to in the comments, the fact of introducing an RDF datatype does not require to define a canonical format. Per RDF1.1 spec:

A datatype consists of a lexical space, a value space and a lexical-to-value mapping, and is denoted by one or more IRIs.

More importantly, there is an example for a datatype (i.e., rdf:HTML) that does not have a canonical version for now. I understand this creates problems and, I presume, this means that an RDF graph containing an HTML (or JSON) literal cannot be properly signed, but the question is whether this creates a problem for the use cases that we have for JSON literals. Similarly, I do not see why you claim

For basic interoperability, [...], I believe we will need to tackle canonicalization.
We may indeed need this to pass tests. But that is a fundamentally different problem, which does not require us to include a recommendation for a canonicalization. We can just refer to the IETF document.
If we really need something normative in our document, we should try to find out what exactly is the status of that document. If it is a stable document (and we can have some sort of a clear statement from somebody of authority at IETF), then we may be able to convince the director to allow us a normative reference. That would be much better than us defining a normative canonicalization that may become in conflict with others.

pchampin commented 5 years ago

@iherman JSON literals in JSON-LD might be more tricky than other literals (including HTML) in RDF. The difference is that, in other RDF serializations, literals have a fixed lexical representation. Unless some form of inference on literals is performed, this lexical representation is kept as is when processing the graph.

In JSON-LD, the JSON literal we get (e.g. in the expansion algorithms) is not a lexical representation, it is the value itself (a JSON object). When converting to triples (and possibly to the expanded form?), a lexical representation has to be built, and there is no unique way to do that. I think that's what @gkellogg means by "basic interoperability problems".

iherman commented 5 years ago

@pchampin

In JSON-LD, the JSON literal we get (e.g. in the expansion algorithms) is not a lexical representation, it is the value itself (a JSON object).

I am not sure that is a given. Going back to the RDF datatype definition: what would exactly be the value space of a JSON Literal? Maybe even more specifically: what is the definition of equality in such a space? The former is something we must define if we define an RDF Datatype. Referring to the canonical representation might be indeed used for the definition of the value space, but that is not necessarily the only way to do that. We could (just from the top of my head) say that a JSON object (or an array thereof) is parsed into a JavaScript object using the JSON parsing rules, and the result is the 'value space'. Which means that two JSON objects are equal if their JavaScript objects are equal per the rules defined by JavaScript. I am not saying that the JavaScript approach is the right one, but it has the value of relying on a standard.

If we have such a clear value space (with equality on it) then

When converting to triples (and possibly to the expanded form?), a lexical representation has to be built, and there is no unique way to do that

is not a problem anymore.

To be clear, I do not have a clear answer. But I am worried about copying an IETF document and making it part of a recommendation. That is what I would prefer to avoid...
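
One way to picture the "parsed object as value space" idea is the Python sketch below. Python's structural `==` stands in for the equality rules that would have to be chosen; this is an assumption, since JavaScript itself has no built-in deep equality, and Python's `==` also treats `true` and `1` as equal, which a real definition would presumably need to rule out. `json_value_equal` is a hypothetical helper.

```python
import json

def json_value_equal(lexical_a, lexical_b):
    """Hypothetical equality test: two lexical forms denote the same value
    if their parsed forms compare equal (key order and whitespace ignored)."""
    return json.loads(lexical_a) == json.loads(lexical_b)

# Different whitespace, key order, and number spelling; same parsed value.
print(json_value_equal('{"a": 1, "b": [42]}', '{ "b": [42.0], "a": 1 }'))  # True

# Array order is significant, so these differ.
print(json_value_equal('[1, 2]', '[2, 1]'))  # False
```

Under a definition like this, the particular string produced when converting to triples would not affect equality, though it would still affect byte-level comparisons such as signatures.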

gkellogg commented 5 years ago

@cyberphone, can you comment on the status of https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-05? If we are to support JSON literals, it would be best to canonicalize them. When is this expected to become an RFC? How stable is the document? Are there other specs which are normatively referencing the spec?

@iherman part of testing requires an RDF transform and using dataset isomorphism. At that point, the precise lexical representation of JSON literals becomes important. Certainly, this could be left out of the spec, and used in test-suite instructions, but for many reasons, setting on a canonical form for JSON literals is going to be important, if we can overcome the normative citation issues.

gkellogg commented 5 years ago

My Ruby version for JSON canonicalization: https://github.com/dryruby/json-canonicalization.

cyberphone commented 5 years ago

@gkellogg It is great to see a sixth incarnation of the proposal!

Regarding progress the technical issues have (AFAICT...) been properly identified; the problem is rather that a bunch of people still consider canonicalization as pure stupidity. OTOH, it seems that none of the current Open Banking APIs has bought into the Base64Url-concept either.

FWIW, I will do a short presentation https://cyberphone.github.io/ietf-signed-http-requests/hotrfc-shreq.pdf at IETF-104 in Prague which shows how you can apply JCS on a mainstream application.

iherman commented 5 years ago

This issue was discussed in a meeting.

RESOLVED: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized
JSON datatype
Rob Sanderson: link: https://github.com/w3c/json-ld-syntax/issues/4
Rob Sanderson: PR: https://github.com/w3c/json-ld-api/pull/72
Rob Sanderson: we also have discussed the JSON datatype on github
… Gregg, you’ve been the most involved (as always)
… could you summarize?
Gregg Kellogg: the issue comes down to representation
… if you are going to describe both the lexical and value space
… somewhat like HTML
… the lexical space cannot be guaranteed
… the JSON literal quality is lost when it’s turned into a native representation
… you lose the original key ordering, key escaping, and lexical numerical representations
… so it seems we will need to canonicalize
… which has been referenced in the issue
… it’s sadly not as close to done as I’d hoped
… and we can’t count on it being final in time
… so, do we care if two implementations use the same canonicalization
… so we have done some things about do we use Integer or Doubles for numbers
… so when you’d turn the JSON literal into RDF (in the toRDF space), we do need to say something about that at least
… and the elimination of whitespace
… and the ordering of keys
… I think that can be done
… there’s a lot of detail in that, but we should be able to reference ECMAScript for this
… or we could do it ourselves
Rob Sanderson: last time we talked about the canonicalization issue
… we also talked about HTML being not easily canonicalizable
Gregg Kellogg: HTML is a little different
… they will preserve order, and whitespace
… so you do have the opportunity return to that result
Ivan Herman: well, attribute order and things are not covered
… this would be a problem if you were to attempt to sign an HTML document
Gregg Kellogg: if we weren’t in an era when signatures are as important as they are now, then maybe we wouldn’t need to care about this so much
Rob Sanderson: so, is there a JSON-LD document that could include a JSON “native” data type that also needs to be signed
… so if the only use case is to import GeoJSON
… do we need to worry
Ivan Herman: I have spent time on this issue with others
… aside from the canonicalization problem
… if we do make a native JSON type, we will have to put it into some namespace–rdf: or jsonld:
Rob Sanderson: +1 to RDF namespace
Ivan Herman: if we do that, we’ll have to write the SWIG mailing list, to announce the new datatype, etc.
… we can do this as part of our document
… the other problem is
… I did put a reference in the issue for the rules we have to follow when we point to something normatively
… my first reading is that unfortunately, this JSON canonicalization specification cannot be referred to normatively
… the second problem is bringing our own canonicalization into our document
… if we do that, I can safely say the Director would say no to that
… so, we can’t just take an IETF spec and put it into a W3C spec
… all of these are admin problems
… But I am still not convinced that we need the canonicalization as a normative part of our spec
… we could say that someone else may do this and reference forthcoming work
… but when the issue is that we have a JSON portion we want to store in RDF
… we can state that the only expectation is that [the same processor will produce the same output]
… none of the arguments that I heard is that canonicalization needs to be normative
Pierre-Antoine Champin: http://tinyurl.com/y2gmzxf8
Pierre-Antoine Champin: I was wondering about this example
… there’s an Integer in the non-canonical form
… would that be canonicalized or not?
Gregg Kellogg: yes, that would be canonicalized
… I don’t know any processors that would properly serialize that with a leading zero
… if you’re going to the internal representation
… it is the number 42
… some might do 42.0
… or 42E+0
… that would be fine, but I don’t think most JSON serializers would do that
Pierre-Antoine Champin: for the moment, we know how to sign this thing
Dave Longley: I think this falls into the same category as HTML
… it’s a string in the JSON; it’s not native HTML
… or a native number in the example’s case
… if we’re storing stuff in a string, then store it as a string
… but people want a native JSON object in their JSON
Pierre-Antoine Champin: but if you remove the leading 0 you don’t get the same signature
… so I’m assuming that the signature is dealing with the order or absence of order in the object when signed
… so if the object was a native JSON object, then it would already benefit
… and regardless we already have this problem with other string-expressed literals
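
Editorial illustration of the string-versus-native distinction discussed above. This is a minimal Python sketch, not the linked example itself: the typed literal below is hypothetical and only assumes a lexical form with a leading zero carried inside a JSON string.

```python
import json

# Hypothetical typed literal: the lexical form "042" lives inside a JSON string,
# so a JSON parser hands it back unchanged.
doc = json.loads(
    '{"@value": "042", "@type": "http://www.w3.org/2001/XMLSchema#integer"}'
)
lexical = doc["@value"]

# Different strings (so anything computed over the text differs) ...
same_string = lexical == "42"
# ... but the same integer once the lexical form is interpreted.
same_integer = int(lexical) == int("42")

# A bare 042 is not a valid JSON number, so it cannot appear as a native value.
try:
    json.loads("042")
    bare_leading_zero_ok = True
except json.JSONDecodeError:
    bare_leading_zero_ok = False
```

The sketch only shows why a leading zero can survive in a string but not in a native JSON number; it says nothing about how any particular JSON-LD processor treats the literal.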
Rob Sanderson: if you instead make it value 42.0
… since no one really serializes as 042
… whatever you change here will change the signature
… even though it will canonicalize as something different
Dave Longley: I disagree
Rob Sanderson: what do you disagree with?
Ivan Herman: I think in these examples, the current JSON-LD specification doesn’t say anything about what you put in strings
… we don’t suggest any sort of mini-canonicalization for things like this
… having built-in canonicalization for the native JSON representation
… would be a departure from what we’ve done previously
Dave Longley: my response to all that is that we have very consistent rules about moving non-string data into strings
… so we do have those sorts of specifications
… from a native JSON value into a string
… this same thing would exist for native JSON objects
… for things that come in via a string, those will stay as whatever that string is
… so strings have no issue
… so if you take pchampin’s example, and change it to a real number: 42
Gregg Kellogg: 42, 42.0, 42.0E0, 4.2E+1 are all the same number
Dave Longley: and if you put that in the playground, check the nquads tab, you’ll find the same number
Ivan Herman: yep I acknowledge that
Rob Sanderson: maybe then it’s the playground which is at fault
… I put in several examples, and the signature changes for all of these different 42’s as an integer
Dave Longley: you’re looking at the RSA signature, so you’ll see it change constantly
… because that injects random data
… what you need to look at is the N-Quads or normalized tabs
… the data there stays the same
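
Editorial illustration of the point that 42, 42.0, 42.0E0 and 4.2E+1 are the same number. This is a small Python sketch using only the standard `json` module; other serializers may choose different output spellings.

```python
import json

# The four spellings mentioned in the discussion.
spellings = ["42", "42.0", "42.0E0", "4.2E+1"]

# Once parsed, every spelling compares equal to the number 42.
values = [json.loads(s) for s in spellings]
all_equal = all(v == 42 for v in values)

# Re-serializing shows that the output text depends on the serializer's
# internal representation (int vs. float here), not on the input spelling.
reserialized = [json.dumps(v) for v in values]
```

In Python the integer input comes back as `42` and the three float inputs as `42.0`, which matches the earlier remark that some serializers might emit 42.0 for the same value.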
Gregg Kellogg: this is in the data round tripping section
Gregg Kellogg: so, imo, if we create a datatype for JSON
… before there is a canonicalization for it
… then we’re in danger of doing things too early
… ultimately we need to deal with a canonicalized JSON
Pierre-Antoine Champin: +1
Gregg Kellogg: so the best thing we can do right now is nothing
… and defer this until there is a canonicalized form
… otherwise whitespace, object ordering, etc are all variable
… and the literals really won’t be worth doing if any lexical representation is important
… better not to do anything until a canonicalization spec exists
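
Editorial illustration of why whitespace and object ordering matter for a JSON literal's lexical form. This is a Python sketch under stated assumptions: `stable_form` is a hypothetical helper that sorts keys and drops insignificant whitespace. It is not the canonicalization draft discussed in the meeting and does not address number formatting.

```python
import json

# Two serializations of the same JSON value, differing in whitespace and key order.
a = '{"native": "json", "n": 1}'
b = '{ "n" : 1,\n  "native" : "json" }'

same_value = json.loads(a) == json.loads(b)  # equal as parsed values
same_text = a == b                           # different as lexical forms


def stable_form(text: str) -> str:
    """Hypothetical helper: re-serialize with sorted keys and no extra whitespace."""
    return json.dumps(
        json.loads(text), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )


# With an agreed serialization rule, both inputs yield the same string.
stable_equal = stable_form(a) == stable_form(b)
```

Without some agreed rule of this kind, two processors can emit different strings for the same value, which is the variability the comment above refers to.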
Ivan Herman: my take would be milder
… the GeoJSON example doesn’t care about canonicalization
Rob Sanderson: +1 to ivan
Ivan Herman: with the canonicalization things deferred
… and state that this feature is not recommended
… so we defer it, and if/when the canonicalization becomes standard or whatever, then we at that point suggest that that spec gets used
Rob Sanderson: it would be better to have a JSON datatype and state that later we’ll do canonicalization
Dave Longley: let’s provide rules for how to produce the JSON string that match the draft, say that you can do something else, and be very clear that it’s preferred that everyone do the same thing
Rob Sanderson: so we should start with JSON datatypes, and just suggest that you can’t sign these
Jeff Mixter: +1 to ivan and azaroth
Gregg Kellogg: if we don’t do canonicalization now, we don’t seem to be prevented from doing it later
… if we end up as a living spec, then we could do it that way
… and we could also suggest that for testing purposes it is always canonicalized
Rob Sanderson: a warning or a note?
Proposed resolution: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized (Rob Sanderson)
Rob Sanderson: I’d suggest a warning
Gregg Kellogg: +1
Jeff Mixter: +1
Ivan Herman: +1
Rob Sanderson: +1
Simon Steyskal: +1
Pierre-Antoine Champin: +1
Tim Cole: +1
Dave Longley: +0
Benjamin Young: +0 still have concerns about eager misuse
David I. Lehn: +0.5
Jeff Mixter: I echo bigbluehat’s concerns but I also have very valid reasons to add JSON to RDF data.
Dave Longley: +1 to everything Benjamin is saying … but that we should really also have JSON literals … but they should also all be converted to the same strings in processors :)
David Newbury: +1
Resolution #3: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized
Dave Longley: JSON literals can be an escape hatch but ONLY an escape hatch.

azaroth42 commented 5 years ago

Agree done, closing :)

w3c / json-ld-syntax

Support JSON values that aren’t mapped #4