w3c / rdf-concepts

https://w3c.github.io/rdf-concepts/
Other
16 stars 2 forks source link

Moves datatype definitions to an appendix and adds rdf:JSON datatype #62

Closed hartig closed 1 year ago

hartig commented 1 year ago

As agreed during the FSF meeting at TPAC, this PR

  1. adds a new appendix,
  2. moves the sections that define the rdf:HTML and rdf:XMLLiteral datatypes into that appendix, and
  3. copies the definition of the rdf:JSON datatype from Section 10.2 of [JSON_LD11] into this new appendix as well.

Regarding the latter, I have indeed just copied the definition of the rdf:JSON datatype. When doing so, there were two things that I was wondering:

  1. The sentence that defines the lexical space contains two different links for "JSON Grammar". The first one points to Sec.2 of RFC4627, and the second one points to Sec.2 of RFC8259. I was wondering, why these two different links?
  2. There is an issue box about the JSON Canonicalization Scheme (JCS) which I also copied (note, in this copy I had to make a slight change because JSON-LD11 explicitly defines JSON Literal, which we don't have in RDF Concepts). However, I have to admit that I do not fully understand the last sentence in this issue box ("Users are cautioned from depending on the JSON literal lexical representation as an RDF literal, as the specifics of serialization may change in a future revision of this document."). In particular, I wonder:
    1. What "serialization" does the last part of his sentence refer to?
    2. What is meant by "this document"? (the JSON-LD11 spec or the JCS spec?)
    3. Also, is this whole comment something specific to JSON-LD?

(GitHack preview)


Preview | Diff

domel commented 1 year ago

The sentence that defines the lexical space contains two different links for "JSON Grammar". The first one points to Sec.2 of RFC4627, and the second one points to Sec.2 of RFC8259. I was wondering, why these two different links?

I think that's an oversight. RFC8259 obsoleted RFC7159 obsoleted RFC4627. So I see no reason to refer to the old document when the new one is valid.

hartig commented 1 year ago

Looking at @TallTed's change suggestions, I see only now that the diff of this PR contains the complete Section 6 (Fragment Identifiers) and Section 7 (Generalized RDF Triples, Graphs, and Datasets) as new text, and then, later, as removed text snippets. I have not touched these sections at all. It seems that the Github diff functionality did not correctly recognize the actual changes. This is strange in particular because, when I did git diff locally before committing the changes, the diff looked correct. Hence, the diff functionality of git recognized the changes correctly, in contrast to the one of Github.

Long story short, at least half of @TallTed's change suggestion in this PR now are for text that was not supposed to be changed at all by this PR. Therefore, I am a bit unsure what to do with these suggestions. My idea is the following. I will merge right away the suggestions for the parts of the text that this PR was meant to change. Regarding the suggestions for the other parts (i.e., in Sections 6 and 7), I will keep them open for a few days and, unless I hear otherwise from any of the other editors, will then merge them later (mid next week).

TallTed commented 1 year ago

[@hartig] at least half of @TallTed's change suggestion in this PR now are for text that was not supposed to be changed at all by this PR

[@gkellogg] @TallTed's comments should not be lost, but probably belong in their own PR.

I have no objection to those comments being moved to a different PR (with or without an issue, as I don't think those comments are controversial). It would be super if you could create that PR, as I'm not certain which comments you mean, and I don't want to lose any.

hartig commented 1 year ago

[@hartig] at least half of @TallTed's change suggestion in this PR now are for text that was not supposed to be changed at all by this PR

[@gkellogg] @TallTed's comments should not be lost, but probably belong in their own PR.

[@TallTed] I have no objection to those comments being moved to a different PR (with or without an issue, as I don't think those comments are controversial). It would be super if you could create that PR, as I'm not certain which comments you mean, and I don't want to lose any.

Since all of these small changes have been minor (mostly missing commas, correct use of plural vs singular, etc), I don't think it is worth the effort creating a separate PR and copying them all over. Therefore, I have merged them into this PR now.

pfps commented 1 year ago

The rdf:JSON datatype being added here appears to be very different from the rdf:JSON datatype in https://www.w3.org/TR/json-ld/#the-rdf-json-datatype

hartig commented 1 year ago

The rdf:JSON datatype being added here appears to be very different from the rdf:JSON datatype in https://www.w3.org/TR/json-ld/#the-rdf-json-datatype

I see that too now. Just to clarify: my original changes regarding the rdf:JSON datatype in this PR were only to insert an almost verbatim copy of the text of https://www.w3.org/TR/json-ld/#the-rdf-json-datatype into the rdf:JSON section in this document (rdf-concepts). Yet, it seems that @gkellogg's commits to this PR have edited the section quite a bit.

pfps commented 1 year ago

There is no section 24.5 in the current ECMASCRIPT document.

afs commented 1 year ago

it seems that @gkellogg's commits to this PR have edited the section quite a bit.

For reference - commit 8b5d09c56d and later commits. The <a>string</a> reference changes meaning.

That also makes other changes e.g. i18n - probably inconsequential but mixed in.

afs commented 1 year ago

Why is the value space not the RFC8785 canonical mapping?

gkellogg commented 1 year ago

We discussed at TPAC that the value space of rdf:JSON being the canonical string representation was ill-considered by the JSON-LD WG, and the value space really should be a JSON value (reference may need correcting, number, string, boolean, null, array, or map). That is the motivation behind these changes.

Indeed, "string" should probably reference https://infra.spec.whatwg.org/#string rather than our own RDF string.

afs commented 1 year ago

We discussed at TPAC that the value space of rdf:JSON being the canonical string representation was ill-considered by the JSON-LD WG and the value space really should be a JSON value (

Whose "we" because the RDF-star minutes don't reflect that.

Don't we then deviate from the canonical form algorithm? (is there variance in the RFC canonical form for the same JSON structure?)

gkellogg commented 1 year ago

We discussed at TPAC that the value space of rdf:JSON being the canonical string representation was ill-considered by the JSON-LD WG and the value space really should be a JSON value (

Whose "we" because the RDF-star minutes don't reflect that.

I recall the discussion, both at TPAC, and from @pfps comments before. I had suggested that we "correct" the value space based on this notion. Of course, we can revert these changes and go back to the JSON-LD wording if there's not consensus. However, it was my understanding that using the JSON Canonicalization Scheme as the value space was wrong.

Don't we then deviate from the canonical form algorithm? (is there variance in the RFC canonical form for the same JSON structure?)

We don't deviate (intentionally) from the canonical form defined by JCS/RFC8785.

afs commented 1 year ago

from @pfps comments before

Links please. I don't see anything on #14.

Let's execute on this PR as originally given - which is editorial because the adoption of text from JSON-LD11 was already signallled.

That's "editorial". It can have all the non-rdf:JSON items thet have also been added.

Then a second PR which is "substantive", "needs discussion".

I am finding it very hard to get a detailed understanding of the substantive change on a fragmented PR.

While I'm not that enthusiastic about defining the a datatype value space based on canonical forms generally, it does work, uses the JSON community RFC 8785, and, for rdf:JSON, has been published.

I think that a new data model in RDF concepts for JSON has long term risks of differences to definitions elsewhere. I would like to see evidence that it does match the RFC 8785 canonicalization.

RFC 8259 (defn of JSON) does define an abstract syntax (parse tree) for JSON which could be considered a data model.

gkellogg commented 1 year ago

@afs said:

I think that a new data model in RDF concepts for JSON has long term risks of differences to definitions elsewhere. I would like to see evidence that it does match the RFC 8785 canonicalization.

I reverted the changes to the rdf:JSON value space (although it does need to be updated to address the issue on using JCS, at least), and preserved the original commits for another PR. See 39cdf497f0fe9c03fcdaea49e2f3f9b3bf0b9448.

As I said in the commit, the value space could be re-defined (without going to an INFRA representation) by being an RDF string representation conforming to the formatting requirements of the JSON Canonicalization Scheme [RFC8785].

Given differences in recollection of discussions, and likely more pressing matters, the WG should consider next steps. See #65.

pfps commented 1 year ago

Some minor comments on rdf:JSON datatype (definition mostly copied from JSON-LD). This is not an endorsement of the value space, as opposed to either having the value space just be the lexical space or having the value space being some sort of JSON values.

The value space should not have SHOULD. The value space is what it is.

The value space is defined as a set of RDF strings and then later claimed to be distinct from the value space of xsd:string. It would be better to state at the beginning that the value space is separate from the value space of xsd:string. In any case, JSON texts can include surrogate code points so the value space cannot be just RDF strings (assuming that surrogate code points stay out of RDF strings).

Lexicographic order needs to be very carefully defined. What is the underlying ordering?

What happens if there are duplicate keys in an array?

The lexical-to-value mapping uses ECMASCRIPT and thus coerces JSON numbers to ECMASCRIPT floating point. Why impose this limitation?

JSON has its own string escapes so it is possible to have double escaping in RDF serializations, as in '"\xDEAD"'. I think this is going to cause problems.

hartig commented 1 year ago

Just for the record I am happy with his PR now (I mention this because, as creator of it, I cannot mark it as approved). After merging it, there can be another PR to edit the section about the rdf:JSON datatype.

hartig commented 1 year ago

@domel I think all (or almost all) of your recent comments are for an outdated version of this PR. In particular, they are for changes that @gkellogg had made in this PR which were later reverted (to be proposed in a new PR after this one here is merged).

domel commented 1 year ago

@domel I think all (or almost all) of your recent comments are for an outdated version of this PR. In particular, they are for changes that @gkellogg had made in this PR which were later reverted (to be proposed in a new PR after this one here is merged).

Yes, the last two I removed in the last commit.

hartig commented 1 year ago

I will merge this one now to make way for @gkellogg to create the follow-up PR with his proposed changes to the section about the rdf:JSON datatype.