json-ld / json-ld.org

JSON for Linked Data's documentation and playground site
https://json-ld.org/

Charles Greer's JSON-LD syntax spec review #230

Closed lanthaler closed 11 years ago

lanthaler commented 11 years ago

_@grechaw JSON-LD syntax spec review:_

Hi all,

This email is a review of JSON-LD-SYNTAX as of 3/13/2013

https://dvcs.w3.org/hg/json-ld/raw-file/default/spec/latest/json-ld-syntax/index.html#data-indexing

Overall:

The document presents the syntax in a reasonably clear way. The one exception to this is the intersection of terms, absolute IRIs, compact IRIs and relative IRIs. In particular one might wish to rethink the use of relative IRIs in here at all, they seem to be confusing or problematic every time they come up, and don't seem to add to JSON-LD in any significant way. I've noted these places below.

Until I came to flattening, I thought that JSON-LD was subject to a lot of the same problems as RDF/XML. My concern had to do with manipulating structures as JSON - if there are a lot of ways to represent something, then one gets into a lot of issues with finding data within the structure. Flattening seems to get rid of most of those concerns - it should probably be foregrounded as a good canonical representation if you can go that far.

Otherwise this review is mainly editorial nits:

The Nits:

"a way to disambiguate the keys used between multiple JSON documents by mapping them to IRIs via a context,"

1.1 I think this characterization of JSON-LD is incorrect: "a serialization of Linked Data in JSON."

6 Advanced Concepts

" property which explicitly represents an ordered list (with the @container key)"

Section 6.6

Appendix A

Appendix B

I appreciate all of the examples in Appendix D a lot.

That wraps up what I've to say overall. It was a pleasure to review this document.

Charles

lanthaler commented 11 years ago

_My response explaining the various changes in 0111ecb:_

On Wednesday, March 13, 2013 7:38 PM, Charles Greer wrote:

Overall:

The document presents the syntax in a reasonably clear way. The one exception to this is the intersection of terms, absolute IRIs, compact IRIs and relative IRIs.

Yes, that's tricky to explain properly. I've clarified the grammar section (before I got your review) which now clearly states where terms/abs./rel./comp. IRIs are allowed.

Until I came to flattening, I thought that JSON-LD was subject to a lot of the same problems as RDF/XML. My concern had to do with manipulating structures as JSON - if there are a lot of ways to represent something, then one gets into a lot of issues with finding data within the structure. Flattening seems to get rid of most of those concerns - it should probably be foregrounded as a good canonical representation if you can go that far.

I completely agree. I've added a section similar to expanded/compacted document form.

Otherwise this review is mainly editorial nits:

The Nits:

"a way to disambiguate the keys used between multiple JSON documents by mapping them to IRIs via a context,"

"keys used between" sounds awkward to me (conflates identify with reference); how about "shared among"?

Thanks, fixed.

1.1 I think this characterization of JSON-LD is incorrect: "a serialization of Linked Data in JSON." From what I'm reading, JSON-LD is a method for encoding linked data within JSON documents and generating RDF from them. While it's possible to create JSON-LD documents that are serializations of linked data, the focus of this document presents JSON-LD as a superset of RDF. Many things about JSON-LD rely on document scope, and a JSON-LD document can contain much more than just the RDF within. You've probably gone over this point many times before, but JSON-LD seems to be much more about authoring or incrementally creating Linked-Data-ready JSON than it is about writing out Linked Data as JSON.

Not sure what to do with this. Do you have something concrete in mind I could use instead?

  1. Design Goals Expressiveness: Repetitive use of 'to be able to express.' You'll want to reword one of those. My sense is that syntax expresses a graph, but graphs don't express a data model.

Changed to:

"The syntax must be able to serialize directed graphs. This ensures that almost every real world data model can be expressed."

Zero-edits You have a missing reference "(see )."

Fixed.

  1. Basic concepts A note on 'serialization' above -- dereferencing contexts make JSON-LD really different from other serializations of RDF. Perhaps that's why you've shied away from the term "RDF." Maybe only documents that are fully expanded/dereferenced actually conform to RDF. It means that without the ability to dereference a context, the JSON-LD document has different data in it than it would were the context fully realized.

Obviously, if the context changes, the data changes as well. I wouldn't go as far as saying that only expanded JSON-LD conforms to RDF. The situation is similar to RDFa which has some predefined prefixes.

5.2 I find the introduction of relative IRIs disorienting here. It's taken up later in the document, but not completely; this paragraph has the only mention of "base IRI" in the document, and the reference to 'directory path' seems to just muddy the issue further. In general the interaction between relative IRIs and other terms seems to be a difficult part of this document to understand. As an example, it would seem that using @vocab would rid a document of relative IRIs -- you might want to state that explicitly as a # 5 at the end of this section "unmatched terms are relative IRIs"

I removed the "directory path" fragment and there's also a new example showing how a relative IRI might be used. The grammar section makes it clear where relative IRIs can be used. Furthermore, there's now a section Base IRI which references RFC3986 and explains the @base keyword and a section Default Vocabulary explaining @vocab.

Does this address your concerns?
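To make the @base / @vocab distinction discussed above concrete, here is a minimal Python sketch. It assumes the behavior described in the reply: relative IRIs are resolved against a base IRI following RFC 3986, while unmatched terms are expanded by prepending the default vocabulary. The function names and the example IRIs are hypothetical and are not part of any JSON-LD API.

```python
from urllib.parse import urljoin


def resolve_relative_iri(value, base):
    """Resolve a (possibly relative) IRI against the base IRI, per RFC 3986."""
    return urljoin(base, value)


def expand_with_vocab(term, vocab):
    """Expand an unmatched term by plain concatenation with the default vocabulary."""
    return vocab + term


# Hypothetical values, chosen only for illustration.
base = "http://example.com/people/"
vocab = "http://schema.org/"

print(resolve_relative_iri("markus", base))    # relative IRI used as an identifier
print(resolve_relative_iri("../orgs/w3c", base))
print(expand_with_vocab("name", vocab))        # unmatched term used as a property
```

The two mechanisms are deliberately kept as separate functions here: one is URL resolution, the other is simple string concatenation.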

6 Advanced Concepts

On Compact IRIs, it surprises me that this is part of the normative section. I can see why it is, but nonetheless it might be useful to point out why a separate syntax is part of this document, as opposed to an updated version of CURIE. (Please disregard this comment if I'm being silly).

Simply speaking, in JSON-LD there are no restrictions at all except that, by definition, the prefix cannot contain a colon (terms can but they will never be selected as prefixes as they won't match anything).

If a prefix:suffix pattern is not matched in the context, is it a relative IRI? (in 6.3 this is prohibited - we have a hole)

No, an absolute IRI -- that's also what the current text says btw. :-)
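The rule stated in this answer can be sketched as follows. This is a simplified illustration, not the algorithm from the API spec: the context is assumed to be a flat dictionary from terms/prefixes to IRIs, and `expand_iri` is a hypothetical helper name.

```python
def expand_iri(value, context):
    """Simplified IRI expansion: term, then compact IRI, else absolute IRI."""
    if value in context:
        # The whole value is a term defined in the context.
        return context[value]
    if ":" in value:
        prefix, suffix = value.split(":", 1)
        if prefix in context:
            # prefix:suffix with a known prefix is a compact IRI.
            return context[prefix] + suffix
        # No matching prefix: the value is treated as an absolute IRI.
        return value
    # Values without a colon are left untouched in this sketch.
    return value


context = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(expand_iri("foaf:name", context))
print(expand_iri("mailto:someone@example.org", context))
```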

6.2 "native JSON type such as number, true, or false." Shouldn't this read "number or boolean"? true and false aren't types but values.

In JSON there's no boolean type, just the two values true and false. Don't ask me why. We just reused the language used in RFC 4627.

"A value type specifies the unit of measurement" This wording seems wrong. A date isn't a unit of measurement but it's still a range. I can't think of a better way of putting this though. Also, I've never thought of 'meters' as a value type. I'd use a decimal-typed number to represent meters. Something is wrong with this notion.

Changed to:

"A value type specifies the data type of a particular value, such as an integer, a floating point number, or a date."

6.3 You mention correctly that the homepage property is ordered in example 21. It reads strangely because there's no mention yet in the doc about how to order items. Just parenthetically mentioning @list would help:

"property which explicitly represents an ordered list (with the @container key)"

Good spot. I removed @list from this example. As you noted, it is introduced in detail later.

6.4 "last-defined-wins mechanism." This looks more like a "most recently defined" mechanism, because of nested scopes. I could be misinterpreting "last-defined-wins" though.

I, as a non-native speaker, can't really see a difference. It's not the temporally last (which most recently would suggest to me) but the "closest" one if you look from the current element towards the tree's root.
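The "closest one towards the tree's root" rule can be sketched like this. It is a toy model under stated assumptions: each nesting level contributes at most one flat context dictionary, and `active_definition` is a hypothetical helper, not a function from the JSON-LD API.

```python
def active_definition(term, contexts_root_to_node):
    """Return the definition of `term` that is closest to the current node.

    `contexts_root_to_node` lists the contexts encountered on the path from
    the document root down to the current node, outermost first.
    """
    for context in reversed(contexts_root_to_node):
        if term in context:
            return context[term]
    return None  # term is not defined anywhere on the path


outer = {"name": "http://xmlns.com/foaf/0.1/name"}
inner = {"name": "http://schema.org/name"}

print(active_definition("name", [outer]))         # only the outer scope applies
print(active_definition("name", [outer, inner]))  # the nested scope wins
```

Note that a context appearing later in the document but outside the path to the current node never enters the list, which is why "last in the document" is the wrong reading.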

6.5 application/ld+json is introduced in a slightly jarring way. Moreover, there's a MUST stipulation attached to its usage, but later in the document its usage is MAY identify a node. I'm just confused by this paragraph.

You are referring to this sentence:

"Please note that JSON-LD documents served with the application/ld+json media type MUST have all context information, including references to external contexts, within the body of the document. Contexts linked via a http://www.w3.org/ns/json-ld#context HTTP Link Header MUST be ignored for such documents."

I don't understand what you mean by "later in the document its usage is MAY identify a node". The intention of this paragraph is to say that, if a document is served as application/ld+json, the context must be referenced from within the document and not via an HTTP Link header. In other words, if you want to use the Link header, you must serve the document as application/json.
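A consumer-side check for this rule might look like the sketch below. It only encodes what is stated above (a Link header context is honored for application/json and ignored for application/ld+json); the function name is hypothetical and the handling of other media types is an assumption.

```python
def link_header_context_applies(content_type):
    """Decide whether a context supplied via an HTTP Link header may be used.

    Assumption of this sketch: only plain application/json documents may take
    their context from the Link header; everything else ignores it.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json"


print(link_header_context_applies("application/json"))
print(link_header_context_applies("application/ld+json; charset=utf-8"))
```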

Does use of @language in the context mean that it will be applied to ALL strings in the document? It looks like yes. I'd put a big warning on this; it's risky to assume.

Yes. I've added the following sentence:

"The default language applies to all string values that are not type coerced."

6.6 Example 29 provides a method for identifying languages within key names. I see why this works, but you might consider removing it to encourage more uniform language-tagging practice. In other words, I'd prefer to see just "occupation" as a key with the @container method. I'm uncomfortable with so many ways to handle language tags, even though what you've got is internally consistent.

This is how most multi-lingual JSON is currently expressed. The advantage of doing it this way is that you can access the desired language directly (doc.occupation.en for the English string) instead of having to filter the occupation array. It's always a tradeoff, but we believe that the API is powerful enough to deal with this. If you prefer to have just the occupation as key, just expand the document and re-compact it with a context that doesn't use the container.

See http://bit.ly/ZAKzLm for a live example.
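The tradeoff described above can be illustrated with plain dictionaries. Both documents below are hypothetical, hand-written shapes (a language map versus an array of language-tagged values); neither was produced by a JSON-LD processor.

```python
# Shape when the context uses a language container for "occupation".
with_language_map = {
    "occupation": {"en": "Ninja", "ja": "忍者"},
}

# Shape when the values are kept as an array of language-tagged strings.
with_value_array = {
    "occupation": [
        {"@value": "Ninja", "@language": "en"},
        {"@value": "忍者", "@language": "ja"},
    ],
}


def occupation_by_lookup(doc, lang):
    """Direct access, the equivalent of doc.occupation.en."""
    return doc["occupation"][lang]


def occupation_by_filter(doc, lang):
    """Scan the array for the first value tagged with the requested language."""
    return next(v["@value"] for v in doc["occupation"] if v.get("@language") == lang)


print(occupation_by_lookup(with_language_map, "en"))
print(occupation_by_filter(with_value_array, "en"))
```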

Note -- "Language associations can only be applied to plain literal strings. Typed values or values that are subject to 6.3 Type Coercion cannot be language tagged." Does this mean that these invalid language keys are ignored or raise an error?

Clarified as follows:

"Language associations are only applied to plain strings. Typed values or values that are subject to 6.5 Type Coercion are not language tagged."

6.14 Expanded Document Form and 6.15 compact form. in api doc these are non normative. Perhaps you don't mean that the API doc defines them, just refers to them?

The text says:

"The JSON-LD Processing Algorithms and API specification [JSON-LD-API] defines a method for expanding"

The API spec defines (normatively) the algorithms to expand/compact documents. The results of those algorithms are documents in expanded/compacted document form.

Appendix A I don't think a JSON-LD document serializes a collection of graphs. Maybe you can define a subset of JSON-LD that does, however.

Well, it serializes an RDF Dataset which is defined as "a collection of RDF graphs" in RDF Concepts. Why do you think it doesn't?

Restrictions on JSON-LD that make it serialized RDF might also help with document identity/signing (no references to external contexts, no blank node identifiers as graph names)

Just for my own edification, why MUST NOT? "A JSON-LD graph must not contain unconnected nodes, i.e., nodes which are not connected by an edge to any other node."

That was added due to feedback from the RDF WG (sorry, can't remember who exactly it was). RDF doesn't allow free-floating nodes, they are not expressing anything (except that the node exists which isn't really informative given the OWA), and so we added a MUST NOT to the data model. In fact, free-floating nodes are dropped during processing.

"A blank node is a node... neither, nor, or." There's some unclear parallelism among these prepositions.

Changed it to: neither, nor, nor -- not sure however if that's correct. Native speakers?

In Issue 217 box, please remove 'controversial' in favor of a less controversial word.

:-) Changed to:

"Thus, some data that is valid JSON-LD cannot be converted to RDF. This feature may be removed in the future."

"JSON-LD documents may contain data that cannot be represented by the data model defined above. Unless otherwise specified, such data is ignored when a JSON-LD document is being processed. This means, e.g., that properties which are not mapped to an IRI or blank node will be ignored." This statement seems to allow for nodes without edges, but I guess the point is you won't know they're nodes in that case?

This statement means that you can put data in your JSON-LD document that can't be represented in the data model defined above, for example edges (properties) that are just strings. Such things are ignored when being processed and thus, e.g., dropped in expansion. Simply speaking, it means that your documents are valid even if some things aren't mapped to IRIs. They will just be ignored when being interpreted as JSON-LD.
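A rough sketch of what "ignored when being processed" means for a single node follows. It is a strong simplification of expansion, assuming a flat context dictionary and treating any key that starts with "@" as a keyword and any key containing a colon as an IRI; `drop_unmapped_properties` is a hypothetical helper, not the expansion algorithm from the API spec.

```python
def drop_unmapped_properties(node, context):
    """Keep keywords, mapped terms, and IRI-like keys; drop everything else."""
    kept = {}
    for key, value in node.items():
        if key.startswith("@"):
            kept[key] = value            # simplification: treat as a keyword
        elif key in context:
            kept[context[key]] = value   # term mapped to an IRI by the context
        elif ":" in key:
            kept[key] = value            # simplification: treat as an IRI
        # Any other key is just a string and is silently dropped.
    return kept


context = {"name": "http://xmlns.com/foaf/0.1/name"}
node = {
    "@id": "http://example.com/people/markus",
    "name": "Markus",
    "internalNote": "not mapped to any IRI",
}
print(drop_unmapped_properties(node, context))
```

The input document stays valid; the unmapped key simply does not survive interpretation as JSON-LD.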

Appendix B

"All keys which are not IRIs, compact IRIs, terms valid in the active context, or one of the following keywords must be ignored when processed:" This points to some problem with the concept of a relative IRI again.

Not sure what this has to do with relative IRIs!?

I don't understand B.4. Like Sandro I feel that there's something amiss with data indexing. It looks suspiciously like @rdf:resource.

B.4. is describing Index Maps. The feature is there to allow developers to structure (re-structure) the data in a way that it is easier to work with. This was kind of a compromise because indexing using arbitrary properties (as Sandro suggested) was considered to be too complex (at least for JSON-LD 1.0). You can put whatever you want in the index, it doesn't matter. You could, e.g., put the node's IRI in the index and then create a map so that you can efficiently access the various nodes.
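The last suggestion, using the node's IRI as the index key, amounts to the following in plain Python. This only illustrates the resulting data shape; `index_by_id` and the sample nodes are hypothetical, and the sketch does not use the JSON-LD API.

```python
def index_by_id(nodes):
    """Build a map from each node's IRI to the node itself."""
    return {node["@id"]: node for node in nodes if "@id" in node}


nodes = [
    {"@id": "http://example.com/posts/1", "title": "First post"},
    {"@id": "http://example.com/posts/2", "title": "Second post"},
]

posts = index_by_id(nodes)
# Direct access by IRI instead of scanning the list.
print(posts["http://example.com/posts/2"]["title"])
```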

I really appreciate the effort put into 'flattened view' and think it should be foregrounded in the main body of the document. It's even more important than compaction I think.

Does the section I added do it justice?

B6 - must a list + set contain objects of all the same type? You might want to be explicit about an error if so.

I've improved that part already. The text now says: ... "or an array of zero or more of the above possibilities"

I appreciate all of the examples in Appendix D a lot.

That wraps up what I've to say overall. It was a pleasure to review this document.

Thank you very much for your feedback, Markus

dlongley commented 11 years ago

6.4 "last-defined-wins mechanism." This looks more like a "most recently defined" mechanism, because of nested scopes. I could be misinterpreting "last-defined-wins" though.

I, as a non-native speaker, can't really see a difference. It's not the temporally last (which most recently would suggest to me) but the "closest" one if you look from the current element towards the tree's root.

I suspect that he means that "last-defined" might indicate "last in the document" whereas "most-recently" seems to be a better match to "the 'closest' one if you look from the current element towards the tree's root". I don't think he means temporally (though, really, in this case it's the same), rather, each time you define a term, that definition becomes the "most recent". It also becomes the "last" defined, but what "last" is relative to is perhaps unclear since that word carries a natural meaning both in a nested scope and at the scope of the whole document. A native speaker might be less inclined to mistake "most recent" for "last in the document" than "last defined" because the word "recent" itself is scope-limiting.

Hopefully that explanation makes sense; as a native-speaker myself, I didn't confuse "last-defined" with something other than its intended meaning, but that may just be because I already understood how term definitions work. In attempting to take a step back, I can see his point and think we ought to change it to "most-recently-defined". I don't think that change would cause more confusion than it might remedy.

dlongley commented 11 years ago

Thanks for updating the spec, Markus!

lanthaler commented 11 years ago

On Thursday, March 14, 2013 4:36 PM, Dave Longley wrote:

6.4 "last-defined-wins mechanism." This looks more like a "most recently defined" mechanism, because of nested scopes. I could be misinterpreting "last-defined-wins" though.

I, as a non-native speaker, can't really see a difference. It's not the temporally last (which most recently would suggest to me) but the "closest" one if you look from the current element towards the tree's root.

I suspect that he means that "last-defined" might indicate "last in the document" whereas "most-recently" seems to be a better match to "the 'closest' one if you look from the current element towards the tree's root". I don't think he means temporally (though, really, in this case it's the same), rather, each time you define a term, that definition becomes the "most recent". It also becomes the "last" defined, but what "last" is relative to is perhaps unclear since that word carries a natural meaning both in a nested scope and at the scope of the whole document. A native speaker might be less inclined to mistake "most recent" for "last in the document" than "last defined" because the word "recent" itself is scope-limiting.

Hopefully that explanation makes sense; as a native-speaker myself, I didn't confuse "last-defined" with something other than its intended meaning, but that may just be because I already understood how term definitions work. In attempting to take a step back, I can see his point and think we ought to change it to "most-recently-defined". I don't think that change would cause more confusion than it might remedy.

OK, thanks a lot for the explanation. I've updated the spec accordingly:

https://github.com/json-ld/json-ld.org/commit/3cf0b117a00bead12600f327ceef289d7f0e4395

Markus Lanthaler @markuslanthaler

lanthaler commented 11 years ago

_Charles' reply:_

Hi Markus,

Most of your responses make perfect sense to me. I'll just take a moment to respond where you needed more information.

On 03/14/2013 03:25 AM, Markus Lanthaler wrote:

Yes, that's tricky to explain properly. I've clarified the grammar section (before I got your review) which now clearly states where terms/abs./rel./comp. IRIs are allowed.

I can see your clarifications. I'm trying to understand the rationale for using both base IRI and @vocab. The former appears to enable portability to vocabulary, because terms may have different IRIs depending on their location on the web. Mixing this with the more explicit use of @vocab to resolve terms seems fraught with potential confusion.

I can see that @base is at risk. As is, I can see the risk of trying to do too much with expansion.

Until I came to flattening, I thought that JSON-LD was subject to a lot of the same problems as RDF/XML. My concern had to do with manipulating structures as JSON - if there are a lot of ways to represent something, then one gets into a lot of issues with finding data within the structure. Flattening seems to get rid of most of those concerns - it should probably be foregrounded as a good canonical representation if you can go that far.

I completely agree. I've added [1] a section similar to expanded/compacted document form: [2].

Thank you, that's a good addition.

1.1 I think this characterization of JSON-LD is incorrect: "a serialization of Linked Data in JSON." From what I'm reading, JSON-LD is a method for encoding linked data within JSON documents and generating RDF from them. While it's possible to create JSON-LD documents that are serializations of linked data, the focus of this document presents JSON-LD as a superset of RDF. Many things about JSON-LD rely on document scope, and a JSON-LD document can contain much more than just the RDF within. You've probably gone over this point many times before, but JSON-LD seems to be much more about authoring or incrementally creating Linked-Data-ready JSON than it is about writing out Linked Data as JSON.

Not sure what to do with this. Do you have something concrete in mind I could use instead?

I think that RDF/XML could have benefited from a distinction between "hey, you can author RDF in XML" and "Here's a way to use XML for RDF interchange." If you can promote the flattened view of RDF in best practices and serialization output, then it will be a lot closer to a high-fidelity RDF serialization.

Much of the spec has to do with adding RDF to JSON without causing pain. This process itself causes pain. If you emphasize that there is a high-fidelity serialization of RDF, plus some other things, then the future users will thank you.

On the other hand, I see I'm crossing the RDF/Linked Data line as well. So I've not answered your question at all. It might be helpful to just ponder "serialization" as one use case and "authoring" as another one.

  1. Basic concepts A note on 'serialization' above -- dereferencing contexts make JSON-LD really different from other serializations of RDF. Perhaps that's why you've shied away from the term "RDF." Maybe only documents that are fully expanded/dereferenced actually conform to RDF. It means that without the ability to dereference a context, the JSON-LD document has different data in it than it would were the context fully realized.

Obviously, if the context changes, the data changes as well. I wouldn't go as far as saying that only expanded JSON-LD conforms to RDF. The situation is similar to RDFa which has some predefined prefixes [3].

Sorry for my ignorance here. Does RDFa change prefixes based on external conditions like the host name? It just seems odd that a document might need dereferencing in order to be completely identified. It sounds like JSON-LD is setting itself up for the whole mess XML got in with catalogs. This isn't an objection, just something that catches me up.

5.2 I find the introduction of relative IRIs disorienting here. It's taken up later in the document, but not completely; this paragraph has the only mention of "base IRI" in the document, and the reference to 'directory path' seems to just muddy the issue further. In general the interaction between relative IRIs and other terms seems to be a difficult part of this document to understand. As an example, it would seem that using @vocab would rid a document of relative IRIs -- you might want to state that explicitly as a #5 at the end of this section "unmatched terms are relative IRIs"

I removed the "directory path" fragment [1] and there's also a new example showing how a relative IRI might be used. The grammar section makes it clear where relative IRIs can be used. Furthermore, there's now a section Base IRI [4] which references RFC3986 and explains the @base keyword and a section Default Vocabulary [5] explaining @vocab.

Yes it all holds together better, especially when I'm reading your latest edits :)

6 Advanced Concepts

On Compact IRIs, it surprises me that this is part of the normative section. I can see why it is, but nonetheless it might be useful to point out why a separate syntax is part of this document, as opposed to an updated version of CURIE. (Please disregard this comment if I'm being silly).

Simply speaking, in JSON-LD there are no restrictions at all except that, by definition, the prefix cannot contain a colon (terms can but they will never be selected as prefixes as they won't match anything).

OK

If a prefix:suffix pattern is not matched in the context, is it a relative IRI? (in 6.3 this is prohibited - we have a hole)

No, an absolute IRI -- that's also what the current text says btw. :-)

Yes I see this. I think it was another reading-wrong-version mistake. I doubt anyone would ever intend this to be an absolute IRI, but it's the only solution that makes sense.

6.4 "last-defined-wins mechanism." This looks more like a "most recently defined" mechanism, because of nested scopes. I could be misinterpreting "last-defined-wins" though.

I, as a non-native speaker, can't really see a difference. It's not the temporally last (which most recently would suggest to me) but the "closest" one if you look from the current element towards the tree's root.

Yes, this is an ambiguity in English. I don't think your language is unclear at all when one pays attention to it, but I try to avoid the use of "last" because of its ambiguity with regard to 'most recent' or 'temporally last'.

6.5 application/ld+json is introduced in a slightly jarring way. Moreover, there's a MUST stipulation attached to its usage, but later in the document its usage is MAY identify a node. I'm just confused by this paragraph.

You are referring to this sentence:

"Please note that JSON-LD documents served with the application/ld+json media type MUST have all context information, including references to external contexts, within the body of the document. Contexts linked via a http://www.w3.org/ns/json-ld#context HTTP Link Header MUST be ignored for such documents."

I don't understand what you mean by "later in the document its usage is MAY identify a node". The intention of this paragraph is to say that, if a document is served as application/ld+json, the context must be referenced from within the document and not via an HTTP Link header. In other words, if you want to use the Link header, you must serve the document as application/json.

Got it.

This is how most multi-lingual JSON is currently expressed. The advantage of doing it this way is that you can access the desired language directly (doc.occupation.en for the English string) instead of having to filter the occupation array. It's always a tradeoff, but we believe that the API is powerful enough to deal with this. If you prefer to have just the occupation as key, just expand the document and re-compact it with a context that doesn't use the container.

Good point. I'm not sure if you talk about expand/re-compact in the API doc, but it's a useful way to consider variation in the JSON authoring process.

6.14 Expanded Document Form and 6.15 compact form. in api doc these are non normative. Perhaps you don't mean that the API doc defines them, just refers to them?

The text says:

"The JSON-LD Processing Algorithms and API specification [JSON-LD-API] defines a method for expanding"

The API spec defines (normatively) the algorithms to expand/compact documents. The results of those algorithms are documents in expanded/compacted document form.

Appendix A I don't think a JSON-LD document serializes a collection of graphs. Maybe you can define a subset of JSON-LD that does, however.

Well, it serializes an RDF Dataset which is defined as "a collection of RDF graphs" in RDF Concepts. Why do you think it doesn't?

For the same reason as above. It seems to me that a JSON-LD document can generate an RDF dataset, and that you can serialize that same dataset as JSON-LD, but those two documents could be very different. I'm just pointing out the superset aspect of JSON-LD again.

OK that's enough from me. I like the changes you've incorporated and the document reads really well now.

Charles

lanthaler commented 11 years ago

I think all the issues have been addressed. Unless I hear objections, I will therefore close this issue in 24 hours.