Sandro Hawke's JSON-LD syntax spec review

_@sandhawke's JSON-LD syntax spec review:_

This is my review of json-ld-syntax, as promised in the last meeting.

Summary: The document is in pretty good shape, and I think the underlying design is very good. Below, I suggest a few million editorial changes, a handful of which I think really need to be addressed before publication (and are marked MEDIUM or SERIOUS). I also raise a handful of concerns about the design, but I think they can probably all be dealt with in a few minutes of conversation. I think everything not marked MEDIUM or SERIOUS is fairly trivial.

I reviewed the latest editor's draft: https://dvcs.w3.org/hg/json-ld/raw-file/e582aaa9ee43/spec/latest/json-ld-syntax/index.html

I did not read the json-ld-api. I did play around with the json-ld "playground" site once I was into the appendices. I haven't reviewed Appendix B yet; I'll try to get to that soon, but it's going to take more brain cells than I have left tonight.

Without further ado...

In an attempt to harmonize the representation of Linked Data in JSON

My first comment turns out to be, I think, the most utterly trivial. Sorry.

[x] My sense is that one "harmonizes" the elements in a set (by modifying them to make them more similar or related in some way); I don't know what it means to harmonize a single item like this.

; mixing both Linked Data and non-Linked Data in a single document.
[x] The clause after a semicolon should be a complete sentence. Change to a comma or rephrase.

the name IRIs, when dereferenced, provide more information about the name
[x] I think they provide information about the named thing. I don't really like this paraphrasing of the LD principles, and I don't think it's helpful to the document here. I'd suggest providing some references instead.

Since JSON-LD is 100% compatible with JSON the large number
[x] comma needed after "JSON"

Additionally to all the features JSON provides,
[x] How about: "In addition to ..."

the ability to express the language associated with a string

? maybe add "natural"

[x] add comma at the end of the item

weights, and distances,

MEDIUM

[x] Really? I pretty much never see people doing that with datatypes.

Software developers that
[x] s/that/who/ on each line

This specification does not describe the programming interfaces for the JSON-LD Syntax. The specification that describes the programming interfaces for JSON-LD documents is the JSON-LD Application Programming Interface [JSON-LD-API].
[x] How about: A companion document, The JSON-LD Application Programming Interface [JSON-LD-API], specifies how to work with JSON-LD at a higher level: it provides a standard library interface for common JSON-LD operations. Although that document is not required for understanding and working with JSON-LD, for some readers it will be a better starting point.

A number of design goals were established before the creation of this markup language:
[x] I don't think the history matters.

How about: JSON-LD satisfies the following design goals:

language. We should focus on simplicity when possible.

I don't think that's what you mean. I think you mean simplicity is paramount.

[x] How about: to the language, so sometimes we do not achieve Zero Edits.

A character is represented as a single character string.

Hard to parse.

[x] How about: A character is represented using a string of length one.

and that leading zeros are not allowed.
[x] ^^^^ omit "that"

Used to specify the native language
[x] s/native/natural (human)/

For the avoidance of doubt, all keys, keywords, and values in JSON-LD are case-sensitive.

Awkward phrase.

[x] s/For the avoidance of doubt, all/All/

Conformance

SERIOUS

It's somewhat odd that all one needs for conformance is appendix B. So what are the other normative parts of this document for...?

[ ] I think there may be a notion of a conformant JSON-LD generator or parser here, too -- one that follows the rules of the rest of this spec. That should be stated here.

different concepts instead of terms such as "name", "homepage", etc.
[x] I think, in this case, the word "terms" should NOT be linked to #dfn-term because you DON'T mean "term" in the JSON-LD sense, here. This is supposed to be the pre-JSON-LD counter-example.

a context is used to map terms, i.e., properties with associated values, to IRIs.

Uh, that doesn't match the definition in #dfn-term. Is a term really a property with its associated value? I don't think so.

[x] How about: s/i.e., properties with associated values/such as the keys in an object structure/

Expanded term definitions may be defined using absolute or compact IRIs as keys, which is mainly used to associate type or language information with an absolute or compact IRI.

This is the first sentence in the document where I have no idea what it means, because it uses concepts not introduced yet. Maybe this can be dropped? Or maybe I'll just have to get it on the second pass.

[x] Later -- Yeah, I'd just drop that sentence, I think.

This information gives the data global context and allows developers to re-use each other's data without having to agree to how their data will interoperate on a site-by-site basis.

I find the re-use of the word "context" awkward here.

[x] How about: This information allows developers to re-use each other's data without having to agree to how their data will interoperate on a site-by-site basis.

External JSON-LD context documents may contain extra information located outside of the @context key,

That makes me wonder if it can be HTML, to be more readable. There would have to be some standard way to find the @context json in the HTML....

[x] Later - I see it can't. Okay, con-neg works, too.

EXAMPLE 5
[x] after this example I was expecting the next example to use a Link header (what turns out to be EXAMPLE 29). Maybe mention it here?

EXAMPLE 6 -- In the example above, the key http://schema.org/name is interpreted as an absolute IRI because it contains a colon (:) and the "http" prefix does not exist in the context.
[x] Now would be a perfect place to have a relative IRI example. You've just talked about there being absolute and relative IRIs, and given an example only of absolute ones.

JSON keys that do not expand to an absolute IRI are ignored, or removed in some cases, by the [JSON-LD-API]. However, JSON keys that do not include a mapping in the context are still considered valid expressions in JSON-LD documents—the keys just don't expand to unambiguous identifiers.

This is kind of weird. It doesn't tell me what I'm supposed to do; it just confuses me.

I guess it means they're like comments, and to be ignored?

[x] This is where we need a clear notion of a processor that reads JSON-LD and extracts all the triples and quads from it, it seems to me.

EXAMPLE 8
[x] It's confusing to have @type here. Maybe stick to just showing @vocab, and not also introducing something we haven't seen yet.
[x] Later -- I see @type is never defined at all. Sigh. I guess it's considered an API thing.

An IRI is generated when a JSON object is used in the value position and contains an @id keyword:
[x] This is the first place you use the word "generated" and it's not at all clear what it means. If we were talking about mapping to RDF it would make sense.

To be able to externally reference nodes in a graph, it is important that each node has an unambiguous identifier. IRIs are a fundamental concept of Linked Data, and nodes should have a de-referenceable identifier used to name and locate them. For nodes to be truly linked, de-referencing the identifier should result in a representation of that node. Associating an IRI with a node tells an application that it can fetch the resource associated with the IRI and get back a description of the node.
[x] I'm not a fan of this paragraph. Can we just delete it?

A node is identified using the @id keyword:
[x] Maybe clarify that @id is overloaded, and it means something different used like this than used as either a key or a value in a context?
[x] It'd be a little more clear if EXAMPLE 11 didn't use @id in all three different ways. How about taking the context out of the example, and just having something like:

{ "@id": "http://manu.sporny.org/#me", "http://schema.org/name": "Manu Sporny" }

(or some other example where an @id is more appropriate)

end of Section 5

[x] As I come to Section 6 being marked normative, I see Section 5 was neither informative nor normative.

A document on the Web that defines one or more IRIs for use as properties in Linked Data is called a vocabulary.

[x] Don't conflate documents with vocabularies, please.

See: https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#vocabularies

[x] I would just drop that whole paragraph. It's motivational, not spec text. And they're wonderfully motivated in the next paragraph anyway.

6.1 Compact IRIs

Prefixes are expanded when the form of the value is a compact IRI represented as a prefix:suffix combination, and the prefix matches a term defined within the active context

Terms are interpreted as compact IRIs if they contain at least one colon and the first colon is not followed by two slashes (//, as in http://example.com)
[x] These sentences contradict each other. Do slashes prevent recognizing things as compact IRIs or not? I'd suggest not -- that's just extra code that won't be helpful, IMHO. (TEST CASE?)

EXAMPLE 17

"foaf": "http://xmlns.com/foaf/0.1/", "foaf:homepage": { "@type": "@id" },

Would that work if the order were reversed? I guess so, since JSON doesn't preserve order. Maybe clarify that, and maybe put them in the other order in the example. (TEST CASE?)

[x] Later -- Oh, I see this is covered well in section 6.9. Maybe near Example 17 say this is covered in more detail in section 6.9....?

6.3 Type Coercion

MEDIUM

Okay, this overloading of @ keywords goes too far with @vocab serving a completely different purpose (from normal @vocab) in this situation. That's just silly.

[ ] Maybe we could at least have a table showing how the meanings differ in different places in the structure.

EXAMPLE 22
[x] I've read this about 6 times and I can't make sense of it. That is, I think the example makes perfect sense, but the paragraph after it, explaining it, does not. When you say "not a prefix:suffix construct" maybe you mean "not a string"?

Duplicate context terms are overridden using a last-defined-wins mechanism.

SERIOUS

[x] That means you can't use natural JSON parsing, doesn't it? If I read EXAMPLE 24 with a JSON parser into a nested object, then I don't know the order of the @context blocks.

Note that this is rarely a good authoring practice
[x] That doesn't go far enough. You could allow nesting to make Example 24 work, but I don't think it's okay to use order-of-statements.
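The parsing concern can be reproduced with any standard JSON parser. Below is a minimal Python sketch; the sample documents are made up for illustration and are not taken from the spec. Two texts that differ only in member order parse to equal objects, while array order survives parsing.

```python
import json

# Same members, different textual order (hypothetical sample data).
a = json.loads('{"@context": {"name": "http://example.com/a#name"}, "name": "x"}')
b = json.loads('{"name": "x", "@context": {"name": "http://example.com/a#name"}}')
members_equal = (a == b)  # member order is not part of the parsed value

# Arrays, by contrast, keep their order.
c = json.loads('[{"n": 1}, {"n": 2}]')
d = json.loads('[{"n": 2}, {"n": 1}]')
arrays_equal = (c == d)

print(members_equal, arrays_equal)
```

So a rule that depends on the textual order of members within one object cannot be applied after ordinary parsing, whereas a rule based on nesting or on array position can.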

It is a best practice to put the context definition at the top of the JSON-LD document.

MEDIUM

[x] I don't agree. You're telling me I'm going against best practice to build an object in memory and let my JSON serializer turn it into JSON.

The @context subtree within that object is added to the top-level JSON object of the referencing document.
[x] What if there's more than one @context subtree? Do you mean the merge of all the @context subtrees? [TEST CASE]

end of 6.5
[x] Thinking about this, I'd rather like .well-known/host-context.jsonld as another place I can look. So if I'm trying to get RDF triples, and I just get application/json, and there's no Link Header, I can look for a host-context file. I dunno -- maybe everyone can set a Link header easily enough.

For instance, in the example below the databaseId member would be ignored by a JSON-LD processor.

MEDIUM

[x] This speaks to conformance. "JSON-LD processor" (maybe "consumer") needs to be defined in the Conformance clause, and s/should not/MUST not/ (with maybe some more rewriting).

This method can be accomplished by using the following markup pattern:
[x] "markup"? JSON isn't markup, as I understand the word. Can you just drop the word from the sentence?

(glancing at appendix B for something)

To avoid forward-compatibility issues, a term should not start with an @ character

MEDIUM

[x] Why only SHOULD NOT? Why not MUST NOT? The damage if they do is considerable.
[x] Also, you kind of need to say what processors MUST do if they see a keyword term they don't know -- i.e., one from the future. The options are: ignore (if you can figure out what/how much to ignore); or halt; or issue a warning to the user.

NOTE: The use of @container in the body of a JSON-LD document has no meaning
[x] That doesn't seem worth saying here. I assume it's ruled out in Appendix B.

6.11 Embedding
[x] Odd section. It seems to have forgotten this was introduced as a graph syntax. The main thing to highlight is that this is syntactic sugar; sometimes it's nice to syntactically embed the node in one of the places that had a link to it.

Example 46

SERIOUS

[x] I suspect the first row of the table is wrong. I would think only the triples inside the value associated with the @graph key would go inside the graph. Please clarify which it is, and correct the table if necessary.

Example 47, 48

MEDIUM

[x] It seems very confusing to use @graph for this. Can't you find a more direct way to do this?
[x] It seemed from stuff earlier (around Example 22) that in Example 48 you wouldn't need to repeat the @context, because it occurred earlier. But maybe that example-22 stuff was wrong, and what was really meant there was "closer to the root of the JSON object tree". No, that can't be right, either. I cannot see any sensible rules for which contexts are in effect at any point in the JSON tree.

How about this as a hack that's more elegant:

[
  { "@context": ...
  },
  {
    "@id": "http://manu.sporny.org/i/public",
    "@type": "foaf:Person",
    "name": "Manu Sporny",
    "knows": "http://greggkellogg.net/foaf#me"
  },
  {
    "@id": "http://greggkellogg.net/foaf#me",
    "@type": "foaf:Person",
    "name": "Gregg Kellogg",
    "knows": "http://manu.sporny.org/i/public"
  }
]

[x] ... with a rule that an object that has JUST a @context key, and no other keys, is actually omitted from arrays. That seems like a cleaner hack than using the @graph keyword. Keep @graph for when people really want named graphs.
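A rough Python sketch of how a consumer might apply the rule proposed above. This is only an illustration of the suggestion, not anything the spec defines: the function name `split_context_only_objects` is hypothetical, and the context contents are assumed, since the example above elides them.

```python
def split_context_only_objects(items):
    """Separate objects whose only key is "@context" from the other members.

    Returns (contexts, nodes): the collected context values and the
    remaining array members in their original order.
    """
    contexts, nodes = [], []
    for item in items:
        if isinstance(item, dict) and set(item) == {"@context"}:
            contexts.append(item["@context"])
        else:
            nodes.append(item)
    return contexts, nodes


doc = [
    # Assumed context for illustration only.
    {"@context": {"name": "http://xmlns.com/foaf/0.1/name"}},
    {"@id": "http://manu.sporny.org/i/public", "name": "Manu Sporny"},
    {"@id": "http://greggkellogg.net/foaf#me", "name": "Gregg Kellogg"},
]
contexts, nodes = split_context_only_objects(doc)
```

Under this reading, the collected contexts would apply to the remaining nodes, and the context-only object itself contributes no node.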

6.13 Identifying Blank Nodes
[x] This is okay, but it would be pretty easy and much more in keeping with the style of the document to avoid mentioning RDF, even here.

Something like:

For some topologies of the graph of nodes being expressed in JSON-LD, such as topologies with loops, embedding alone cannot be used, and @id must be used to connect the nodes. In some cases, one may not want to name nodes with IRIs. In these situations, one can use "blank node identifiers", which look like IRIs but with _ (underscore) as the scheme name. For example:

{
    "@id": "_:n1",
    "name": "Secret Agent 1",
    "knows":
      {
        "name": "Secret Agent 2",
        "knows": { "@id": "_:n1" }
      }
}

In this case, we do not want to assign IRIs to the two people, but we want to express that they know each other. We can say SA1 knows SA2 using embedding, but to say SA2 knows SA1 we need to use a blank node identifier.

Every statement in the context having a keyword as the key (as in { "@type": ... }) will be ignored when being processed.

[x] I think you mean this only for keywords that are known to be meaningless when used as keys in a @context. I think it would be better to make this an error. But the bigger question is about forward compatibility -- MUST processors ignore all keyword keys in contexts? (Are any allowed, with meaning? I don't see any.)

6.15 and 6.16

[x] These should probably be marked non-normative. There's nothing here I need to know to work with JSON-LD (although it's very cool and all).

6.17 Data Indexing

Not sure how I feel about this. It's kind of weird, but pretty harmless, I guess.

I'm not sure it would work, but an alternative design would be to have a particular property be @index'd. So instead of: "@container": "@index" in the context we'd say "@index": "lang" and then the stuff in green would be equivalent to:

"post": [
  {
    "lang": "en",
    "@id": "http://example.com/posts/1/en",
    "body": "World commodities were up today with heavy trading of crude oil...",
    "words": 1539
  },
  {
    "lang": "de",
    "@id": "http://example.com/posts/1/de",
    "body": "Die Werte an Warenbörsen stiegen im Sog eines starken Handels von Rohöl...",
    "words": 1204
  }
]

[x] I think that would provide the same functionality, but without these keys that aren't really in the data. It would let you cleverly generate JSON-LD like this from plain triples, if given the right context. (You'd have to have triples with the same S and P, where each O differs in the value of a DataProperty, as in this example.)
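To make the equivalence concrete, here is a small Python sketch of the regrouping this alternative design implies: turning the plain list form into a map keyed by one property. The helper name `index_by` is hypothetical, and the data is abbreviated from the example above.

```python
def index_by(items, key):
    """Group a list of objects into a dict keyed by each object's `key` value.

    The indexing property is removed from the stored objects, mirroring an
    index map whose keys are not repeated inside the values.
    """
    indexed = {}
    for item in items:
        indexed[item[key]] = {k: v for k, v in item.items() if k != key}
    return indexed


posts = [
    {"lang": "en", "@id": "http://example.com/posts/1/en", "words": 1539},
    {"lang": "de", "@id": "http://example.com/posts/1/de", "words": 1204},
]
by_lang = index_by(posts, "lang")
```

The reverse direction (map back to list) only needs to re-insert the key as a property, which is why no information would be lost under this design.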

A. Data Model

What happens if the same @graph @id is used in two places? Are the graphs merged, or what? Shouldn't the spec say? Or is that left to the API document as well? (It's a lot more than an API.) (In TriG they are merged.)

[x] In general, I found Appendix A very confusing, and I'm thoroughly familiar with the RDF data model. This does not bode well for JSON folks. Do they need to understand this section, or can it be marked non-normative?

Whenever possible, an edge should be labeled with an IRI.
[x] As far as I can tell, from reading the spec up to this point, if it doesn't have an IRI, it's ignored -- and thus not part of the data model. Several times you say terms that don't map to IRIs are ignored.

This section is normative; This section is non-normative

SERIOUS

[x] These labels seem to be applied inconsistently.

The JSON-LD Algorithms and API specification [JSON-LD-API] defines the conversion rules between JSON's native data types and RDF's counterparts to allow full round-tripping.

SERIOUS EDITORIAL

[x] I really don't like the mapping-to-RDF being left to another, later spec. I can live with it just being shown in the examples, except for not knowing what happens with numbers. From the playground I see integers end up as xsd:integer and otherwise they are xsd:double, which is simple enough, but should really be said in this document, or at least shown in an example.
[ ] (I see a bug in the playground. If you use too large an integer, it converts the lexical representation to scientific notation.)
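The number behaviour observed in the playground could be stated as simply as the following Python sketch. It assumes the rule is decided by value (no fractional part means xsd:integer, anything else xsd:double); that is an inference from the observation above, not text from either spec, and the function name `xsd_datatype` is hypothetical.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def xsd_datatype(number):
    """Return the assumed XSD datatype IRI for a parsed JSON number."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(number, bool) or not isinstance(number, (int, float)):
        raise TypeError("expected a JSON number")
    if isinstance(number, int) or number.is_integer():
        return XSD + "integer"
    return XSD + "double"


print(xsd_datatype(5))
print(xsd_datatype(5.5))
```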

In JSON-LD lists are part of the data model whereas in RDF they are part of a vocabulary, namely [RDF-SCHEMA].
[x] Doesn't JSON-LD also have sets? As I read the spec, it seemed like @container: @set had some semantics, in addition to being a directive to keep singletons in arrays. A set-valued property is somewhat different from a repeated property.

The JSON-LD context has direct equivalents for the Turtle @prefix declaration:
[ ] True, but that doesn't seem to be what the examples are showing. I'd just drop that line.

Appendix B

Not really reviewed at this time.

E. IANA Considerations This section is non-normative.

SERIOUS

[x] Actually, I think this section is Normative, like the profile stuff.

will be submitted to the Internet Engineering Steering Group if this specification becomes a W3C Recommendation.

MEDIUM

[x] Actually it goes at Last Call, as per http://www.w3.org/2002/06/registering-mediatype

To request or specify Expanded JSON-LD document form, the IRI http://www.w3.org/ns/json-ld#expanded SHOULD be used.

SERIOUS

[ ] I can't figure out who the SHOULD applies to. Do you mean:

if you want the expanded form, you SHOULD ask for it with this profile

(which I think would be silly) or do you mean:

if you receive a request that includes this profile parameter, you SHOULD return expanded form

? I guess the latter, but that's not what it says. I would think you'd use normal media-type rules here -- if you can't provide it in expanded form, then you're not providing it, and you fall back to another media type.

Published specification: The JSON-LD specification.

[x] This should be plain text, and the URL should be updated. I guess it will be http://www.w3.org/TR/json-ld-syntax

Fragment identifiers used with application/ld+json resources may identify a node in a JSON-LD graph expressed in the resource. This idiom, which is also used in RDF [RDF-CONCEPTS], gives a simple way to "mint" new, document-local IRIs to label nodes and therefore contributes considerably to the expressive power of JSON-LD.

MEDIUM

I have no idea what this text is trying to say. For my best guess, please replace it with:

[x] Fragment identifiers used with application/ld+json are treated as in other RDF syntaxes, as per RDF Concepts (link to http://www.w3.org/TR/rdf11-concepts/#section-fragID) [RDF-CONCEPTS]

References
[ ] Some of them are out of date, like TURTLE-TR. Also, the reference style isn't correct -- it only has the dated links.

That's it. I'll try to get to Appendix B before the meeting, but I wanted to send this early enough that it can be read & digested before Wednesday's meeting.

Keep up the great work, guys. I only point out all these places for improvement because I think this is so important and want it to have the best chance it can.

-- Sandro

_Feedback from Sandro Hawke, part 2:_

I think the document could be greatly strengthened (and most of my non-trivial comments in part 1 addressed) by the following changes:

In Conformance add something like:

A conforming JSON-LD Expander takes as input a conforming JSON-LD document D1 and outputs a conforming JSON-LD document D2, using the expansion mapping defined in Appendix XX. D2 will contain no @context declarations and, informally, will convey the same underlying information.

A conforming JSON-LD Compactor takes as input a JSON-LD @context declaration and conforming JSON-LD document D2 and outputs a conforming JSON-LD document D1, such that a conforming JSON-LD Expander would convert D1 to D2 (or an equivalent document which would JSON-parse to the same internal structure).

A conforming JSON-LD To-RDF Converter takes as input a conforming JSON-LD document J and outputs an RDF Dataset R using the conversion mapping defined in Appendix C.

A conforming JSON-LD From-RDF Converter takes as input an RDF Dataset R and outputs a JSON-LD document J such that a conforming JSON-LD To-RDF Converter would convert J to R (or an equivalent dataset).

Note there is no need to define the Compaction and From-RDF mappings in detail; it's enough to say (as above) that they are the inverses of already-defined mappings. I believe that sufficiently constrains them. For implementation advice, they can see another document, which need only be a Note.

Add appendix XX which defines the expansion mapping. I have not actually looked at how that's currently defined.
Move json-ld-api sections 5.18-5.21 and 5.23 to json-ld-syntax appendix C.

Note that we should probably change the shortname from /TR/json-ld-syntax to /TR/json-ld for the next publication. It's a bit of a pain, but worthwhile in the long run, I think.

These changes would make json-ld-syntax stand parallel to Turtle, as a completely defined RDF serialization syntax (not needing the API document), but they wouldn't significantly increase the "RDF tax" on JSON-LD. Just a few sentences in Conformance and a longer RDF Appendix. That seems to me like a good thing (and also what I understood the RDF WG to be asking for).

-- Sandro

PROPOSAL 1: Remove section "Design goals"

PROPOSAL 2: Move section "Interpreting JSON as JSON-LD" into basic concepts or combine it with "The Context"

PROPOSAL 3: Remove the "://" safeguard.

PROPOSAL 4: Terms beginning with @ MUST NOT be used (whether we enforce it in the algorithms is a different question)

PROPOSAL 1: -1. I think it's useful for the reader to know what the rationale for the JSON-LD syntax is. Again, Sandro didn't suggest that the section be removed, only that it be slightly reworded.

PROPOSAL 2: +1. This could be combined elsewhere.

PROPOSAL 3: -1. I think it serves the same purpose for JSON-LD as it does when it was introduced for RDFa; it catches a lot of common problems with the failure to define prefixes. Before RDFa had this, a lot of garbage triples could be generated.

PROPOSAL 4: -0.5. While I think that normal use of '@' should not be tolerated, we should create the ability for the specification to be extended with design notes that use other keywords (e.g., @ordered). If this is a MUST NOT, then it is impossible for someone to extend the specification and not violate a normative constraint.

@darobin: Was there a recent change in ReSpec which removed the "This section is normative" statements? At least they don't show up anymore in our specs, see e.g. http://json-ld.org/spec/latest/json-ld-api/#context-processing-algorithms

PROPOSAL 1: +0.5 don't really care but the section also doesn't add much value

PROPOSAL 2: +1

PROPOSAL 3: +0.5 no strong preference though.. it just adds extra code; I don't think we can compare JSON-LD and RDFa

PROPOSAL 4: -0.5 for the same reasons as Gregg noted. Would be fine with a SHOULD NOT though. Other specifications can always update the existing one. I'm against enforcing it in the algorithms.

On proposal 4:

Extensibility, etc., is great, but I don't see how just allowing '@' would allow that.

I think there are two separate questions here:

Question 1. When is it okay for someone to publish-to-the-world json-ld data using an @-keyword that's not in the current spec?

To answer this question, we need to think about why they would want to. I'd think the only reason they would want to would be because they are trying to extend JSON-LD with new features. That's a nice idea, but if several people do it without coordinating their work, we'll end up with mass confusion.

So, I think they need to talk to each other first, and reach consensus on what a term means before anyone starts to use it in the wild. In other words, no one should use such keywords in public until/unless the folks who own the JSON-LD spec (some group at W3C) say it's okay.

That might be done by having a simple first-come first-served registry, or it might mean no one gets to use a new term in public until JSON-LD 1.1 reaches Candidate Recommendation. We don't need to decide that now, but it would be a mess if people just started using their own extensions in public without any coordination.

So I'd say a document using such a keyword is NOT a JSON-LD 1.0 document. It might be a JSON-LD 1.1 document, someday -- in fact, that's how one will recognize a JSON-LD 1.1 document. So that means conforming JSON-LD 1.0 producers MUST NOT produce such documents.

Question 2. What should JSON-LD 1.0 consuming software do if it sees a document with an unknown keyword? This is going to happen (a) when someone makes a mistake, (b) when someone is trying to extend JSON-LD without consensus, or (c) when a proper extension is being used or JSON-LD 1.1 is out, but this consumer software hasn't been updated to support them. For (a) or (b) it would be fine to halt and give an error, I think, and to say JSON-LD 1.0 consumers MUST NOT consume such documents. For (c), this could be a big problem, though. For this we need to imagine ourselves in the future, wanting to add some new features. (You probably have some in mind, things that were left out of 1.0.) What will you want existing software to do when it sees a JSON-LD 1.1 document, or a document with an extension it doesn't implement?

If such software rejects it -- like an iPhone getting Flash content -- then users suffer, and people are strongly motivated not to use extensions or deploy version 1.1. So that's no good.

If such software ignores it -- like a web browser getting HTML elements or attributes it doesn't implement -- then everything will look fine to the user, unless the extension changes the meaning of the data in some important way. Then the user won't even get a warning; they'll just get bad data, with potentially disastrous results. Ignoring it only works, I think, if the extensions are some kind of pragmas or hints that don't affect the basic data.

Maybe the problem can be sidestepped by using a new media type for JSON-LD 1.1. Then the two can exist side by side. Then JSON-LD 1.0 consumers can be told they MUST NOT consume 1.1 documents, because they'll only see them if there is a mime error. (For consuming JSON data as JSON-LD, I guess this still works, since the mime type of the linked-to context would be JSON-LD 1.1's mime type.)

Another solution I see sometimes (SOAP and RIF do this) is that each extension is somehow flagged with MUST-UNDERSTAND or MAY-IGNORE. JSON-LD could do this with stuff in the @context, or with a syntactic hack, may-ignore keywords being lowercase or starting with '@?' or something.

I think I'd propose the group pick EITHER MUST-UNDERSTAND or MAY-IGNORE for everything, and then if/when someone wants an extension with the other kind of semantics, they have to use a new mime type. As for which to go with for now, I'd say it depends what kinds of extensions you think people are going to want first. Are they things that can be safely ignored?
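The options discussed above (ignore, halt, or warn) can be compared side by side in a short Python sketch. Everything here is hypothetical: the `KNOWN_KEYWORDS` set is an abbreviated, assumed list rather than the spec's, and `filter_keys` with its `policy` parameter is just one way a consumer might expose the choice.

```python
import warnings

# Assumed, abbreviated keyword set for illustration; not the spec's full list.
KNOWN_KEYWORDS = {
    "@context", "@id", "@type", "@value", "@language",
    "@container", "@list", "@set", "@graph", "@vocab", "@index",
}


class UnknownKeywordError(ValueError):
    """Raised under the "halt" (must-understand) policy."""


def filter_keys(obj, policy="ignore"):
    """Return a copy of `obj` without unknown @-keys, handled per `policy`.

    policy: "ignore" (may-ignore), "warn" (ignore but tell the user),
            or "halt" (must-understand).
    """
    if policy not in ("ignore", "warn", "halt"):
        raise ValueError(f"unknown policy {policy!r}")
    result = {}
    for key, value in obj.items():
        if key.startswith("@") and key not in KNOWN_KEYWORDS:
            if policy == "halt":
                raise UnknownKeywordError(key)
            if policy == "warn":
                warnings.warn(f"ignoring unknown keyword {key!r}")
            continue  # drop the unknown keyword
        result[key] = value
    return result


doc = {"@id": "http://example.com/a", "@ordered": True}
print(filter_keys(doc))
```

The sketch also shows the risk raised above: under "ignore", a key such as @ordered silently disappears, which is only safe if the extension does not change the meaning of the remaining data.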

Some more updates based on Sandro's feedback and the discussions in today's JSON-LD telecon.

JSON keys that do not expand to an absolute IRI are ignored, or removed in some cases, by the [JSON-LD-API]. However, JSON keys that do not include a mapping in the context are still considered valid expressions in JSON-LD documents -- the keys just don't expand to unambiguous identifiers.

This is kind of weird. It doesn't tell me what I'm supposed to do; it just confuses me.

I guess it means they're like comments, and to be ignored?

This was changed to [1]

"JSON keys that do not expand to an IRI, such as status in the example above, are not Linked Data and thus ignored when processed."

6.1 Compact IRIs

Prefixes are expanded when the form of the value is a compact IRI represented as a prefix:suffix combination, and the prefix matches a term defined within the active context

Terms are interpreted as compact IRIs if they contain at least one colon and the first colon is not followed by two slashes (//, as in http://example.com)

These sentences contradict each other. Do slashes prevent recognizing things as compact IRIs or not? I'd suggest not -- that's just extra code that won't be helpful, IMHO. (TEST CASE?)

I've clarified the section and added two test cases in [2].

Values of the form prefix://suffix are not treated as compact IRIs; this prevents developers from accidentally overriding all their http URLs, for example. RDFa had to introduce a similar safety mechanism.
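The safeguard can be sketched roughly as follows in Python. This is a minimal illustration rather than the algorithm from the spec: the function name `expand_compact_iri` and the flat `context` dict (prefix mapped to an IRI string) are simplifying assumptions.

```python
def expand_compact_iri(value, context):
    """Expand `prefix:suffix` via `context`; otherwise return `value` unchanged.

    `context` is a hypothetical, simplified mapping of prefix -> IRI string.
    """
    prefix, colon, suffix = value.partition(":")
    if not colon:
        return value  # no colon: not a compact IRI
    if suffix.startswith("//"):
        return value  # the "://" safeguard: never treated as prefix:suffix
    if prefix in context:
        return context[prefix] + suffix
    return value  # unknown prefix: left as-is


context = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "http": "http://example.com/oops#",  # an accidental term named "http"
}
print(expand_compact_iri("foaf:name", context))
print(expand_compact_iri("http://schema.org/name", context))
```

Without the `//` check, the second call would rewrite every http URL through the accidental "http" term, which is the failure the safeguard is meant to catch.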

It is a best practice to put the context definition at the top of the JSON-LD document.

MEDIUM

I don't agree. You're telling me I'm going against best practice to build an object in memory and let my JSON serializer turn it into JSON.

I changed it to the following suggestion [3]:

"When possible, the context definition should be put at the top of a JSON-LD document. This makes the document easier to read and might make streaming parsers more efficient. Documents that do not have the context at the top are still conformant JSON-LD."

To avoid forward-compatibility issues, a term should not start with an @ character

MEDIUM

Why only SHOULD NOT? Why not MUST NOT? The damage if they do is considerable.

Also, you kind of need to say what processors MUST do if they see a keyword term they don't know -- i.e., one from the future. The options are: ignore (if you can figure out what/how much to ignore); or halt; or issue a warning to the user.

I've clarified that note [4]. Just like any other term that isn't mapped to an IRI, terms starting with an "@" that are not keywords are ignored.

EXAMPLE 5

after this example I was expecting the next example to use a Link header (what turns out to be EXAMPLE 29). Maybe mention it here?

The majority of the group felt that this is too early in the document. I've added the following statement instead [5]:

"JSON documents can be transformed to JSON-LD without having to be modified by referencing a context via an HTTP Link Header as described in section 6.8 Interpreting JSON as JSON-LD. It is also possible to apply a custom context using the JSON-LD API [JSON-LD-API]."
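As a rough illustration of what such a Link header could look like, here is a small Python sketch. The link relation is the one used in the published JSON-LD 1.0 Recommendation and is assumed to match the draft discussed here; the helper name and the context URL are hypothetical.

```python
# Link relation from the published JSON-LD 1.0 Recommendation (assumed to
# match the draft under review).
JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def context_link_header(context_url):
    """Build the value of an HTTP Link header pointing at a JSON-LD context."""
    return (
        f'<{context_url}>; rel="{JSONLD_CONTEXT_REL}"; '
        'type="application/ld+json"'
    )


# Hypothetical context location, used only for the example.
header_value = context_link_header("http://example.org/contexts/person.jsonld")
print("Link: " + header_value)
```

A server would send this header alongside a plain `application/json` response, so the JSON body itself stays unmodified.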

EXAMPLE 6 -- In the example above, the key http://schema.org/name is interpreted as an absolute IRI because it contains a colon (:) and the "http" prefix does not exist in the context.

Now would be a perfect place to have a relative IRI example. You've just talked about there being absolute and relative IRIs, and given an example only of absolute ones.

I've added an example in [6].

[1] https://github.com/json-ld/json-ld.org/commit/aa43ac1f788bc2b69460319696edab6c6cb217cf
[2] https://github.com/json-ld/json-ld.org/commit/1d20718c328e932a90b0445eae7f3a61df8a4840
[3] https://github.com/json-ld/json-ld.org/commit/18a5cad721c778b3c222f141e694bd7a33472560
[4] https://github.com/json-ld/json-ld.org/commit/ab62ca52d0e5d2a3fe307495f2a82b72cdb921ee
[5] https://github.com/json-ld/json-ld.org/commit/617b7d97ddb81a1a509fb6abd7b868ad4f03fd9d
[6] https://github.com/json-ld/json-ld.org/commit/b565c54464313f5ec09ed4fb58efd0258a4c0eb2

Markus Lanthaler @markuslanthaler

To be able to externally reference nodes in a graph, it is important that each node has an unambiguous identifier. IRIs are a fundamental concept of Linked Data, and nodes should have a de-referenceable identifier used to name and locate them. For nodes to be truly linked, de-referencing the identifier should result in a representation of that node. Associating an IRI with a node tells an application that it can fetch the resource associated with the IRI and get back a description of the node.

I'm not a fan of this paragraph. Can we just delete it?

I've reworded that paragraph in 5b797083a3af273ddb68126df80ffe7cc8e0f962. Does this address your concerns?

Markus Lanthaler @markuslanthaler

Sandro, we discussed proposal 4 today and concluded that we shouldn’t forbid terms starting with an @. Just as any other term that isn’t mapped to an IRI, such terms will be ignored by conformant processors. If someone does map a @-term to an IRI, that’s fine in JSON-LD 1.0 but might break in a later version. An example would be a @definedby tag which is mapped to rdfs:isDefinedBy. This is similar to how unknown tags and attributes are treated in HTML.

SERIOUS EDITORIAL

I really don't like the mapping-to-RDF being left to another, later spec. I can live with it just being shown in the examples, except for not knowing what happens with numbers. From the playground I see integers end up as xsd:integer and otherwise they are xsd:double, which is simple enough, but should really be said in this document, or at least shown in an example.
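The behaviour observed in the playground can be written down as a short Python sketch. This is an assumption-laden illustration, not the normative conversion: the helper name is invented, and Python's `int`/`float` distinction stands in for "integer" versus "everything else". How a real processor treats a value such as `5.0` is not covered in this thread.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def native_number_datatype(value):
    """Return the XSD datatype IRI for a native JSON number (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so exclude it explicitly
        raise TypeError("booleans are not JSON numbers")
    if isinstance(value, int):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    raise TypeError("not a JSON number: %r" % (value,))


print(native_number_datatype(42))
print(native_number_datatype(4.2))
```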

Sandro, I've added an example to the syntax spec in ac8214ec02be4c4a6a6283807e00ab82c0035643:

http://json-ld.org/spec/latest/json-ld-syntax/#conversion-of-native-data-types

Does this address your concerns?

Thanks, Markus

Markus Lanthaler @markuslanthaler

_Gregg's reply to Sandro:_

Sandro, we discussed moving the algorithms relevant to RDF conversion out of the API doc and into the base JSON-LD doc, but we don't find that practical. Many of the algorithms outlined in JSON-LD-API are essential for transforming JSON-LD into RDF (context processing, value expansion, IRI expansion, Expansion and Flattening). Instead, I've created an informative description of the process of turning JSON-LD into RDF (and vice-versa) in section C.1. This is necessarily brief, and does not detail the treatment of RDF Collections or Named Graphs.

Please let us know if this resolves this particular issue.

I'm not particularly happy with this outcome, but I think I understand it.

I don't know if the rest of the RDF WG will.

I guess I need to study the second doc better and see why it needs to be separate.

@sandhawke, I think all the issues you raised in your review of the JSON-LD syntax specification have been addressed. Unless you (or someone else) disagrees, I will close this issue in 24 hours.

Sorry, I haven't completed my second review yet.

This issue is just about your feedback regarding the syntax spec. Do you mean the review of my changes or the review of the API spec?

It's all the same to me: when I swap in JSON-LD, I'll do it all at once.

OK, I’ll leave the issue open for the time being.

_Follow-up review by @sandhawke:_

This is a partial follow-up review of json-ld. Here I'm reviewing:

JSON-LD 1.0
A JSON-based Serialization for Linked Data
[prepared as] W3C Working Draft 04 April 2013
https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html

Summary: more of the same - mostly editorial - a few issues that will hopefully be simple to review. I'm not quite done, but may have to stop for a day or two, so I'm sending this along now.

Details:

Simply speaking, a context is used to map terms, to IRIs.

[x] s/terms,/terms/

and types that do not match a term or are neither a compact IRI nor
[x] s/or are neither/and are neither/

If multiple embedded JSON-LD documents are extracted as RDF, the result is the RDF merge of the extracted datasets.

Alas, there is no defined way to merge RDF datasets.

The problem is that sometimes it's obvious that the merge of

<g> { <a> <b> 1 }

and

<g> { <a> <b> 2 }

is

<g> { <a> <b> 1,2 }

and sometimes it's obvious the two can't be merged because they contradict each other.

See: http://www.w3.org/2011/rdf-wg/track/issues/17

RESOLVED: close issue-17 -- there is no general purpose way to merge datasets; it can only be done with external knowledge.

[x] Proposed solution is to define it here, something like: If multiple embedded JSON-LD documents are extracted as RDF, the result is a dataset formed by merging all the graphs that have the same name (and thus making a single named graph per graph name) and all the default graphs (to make one resulting default graph).
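A minimal Python sketch of the proposed merge, assuming a dataset is modelled as a dict from graph name to a set of triples, with `None` standing in for the default graph. Both the representation and the function name are assumptions for illustration, and blank node renaming is deliberately left out.

```python
from collections import defaultdict

DEFAULT_GRAPH = None  # stand-in key for the default graph (assumption)


def merge_datasets(datasets):
    """Union graphs that share a name; all default graphs become one."""
    merged = defaultdict(set)
    for dataset in datasets:
        for graph_name, triples in dataset.items():
            merged[graph_name] |= set(triples)
    return dict(merged)


d1 = {"g": {("a", "b", 1)}}
d2 = {"g": {("a", "b", 2)}, DEFAULT_GRAPH: {("x", "y", "z")}}
result = merge_datasets([d1, d2])
print(result["g"])
```

This always produces the "1,2" outcome for the two `<g>` graphs above; it makes no attempt to detect the case where the graphs contradict each other.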

Figure 1: An illustration of JSON-LD's data model.

[x] Broken image link.
[x] More importantly, the diagram is both misleading and wrong. It's misleading in that each of the nodes is shown as being in exactly one graph; nodes are actually allowed to be in multiple graphs, and nearly always are. It's wrong in that it shows two arcs that aren't in any graph, when actually every arc has to be in one or more graphs.

I haven't managed to produce a good drawing of this. Sometimes I think of it as color-coding arcs, like this:

http://www.w3.org/Consortium/Offices/Presentations/RDFTutorial/figures/AnimMerge8.png

and sometimes I think of it as layers:

http://www.flickr.com/photos/danbri/3472944745/ http://farm4.static.flickr.com/3613/3384528143_8304792836_b.jpg

although I imagine the layers closer together, like transparent sheets of plastic, each with writing on them.

Whenever possible, the graph name /SHOULD/ be an IRI

[x] s/possible/practical/ (I think)

At Risk
[x] I'm a little lost in the AT RISK features. Can we do it like this: http://www.w3.org/TR/2009/CR-owl2-syntax-20090611/#atRisk1 ? So each at-risk feature is identified separately from where it occurs in the specs, on a wiki page (rdf-wg/wiki/JSON-LD_Features_at_Risk or something). And each time it comes up in the specs, that is referenced, along with a clear explanation for people who've never heard of this little feature of the W3C process.

Within the JSON-LD syntax these edge labels are called properties.
[x] Actually, you use the term somewhat inconsistently -- sometimes you call those labels "property names" and sometimes you call them "property labels". I'm not sure this is worth fixing -- I'm probably being overly pedantic to mention it -- but in RDF they'd be considered property names. The property itself is the thing denoted by the IRI. I think in general it's fine to call these things "properties" (and skip over the detail that they are property names), but maybe in the formal model it's better to be precise.

Issue 217

In contrast to the RDF data model as defined in [RDF11-CONCEPTS], JSON-LD allows blank nodes as property labels and graph names. Thus, some data that is valid JSON-LD cannot be converted to RDF. This feature may be removed in the future.
[x] This notion appears a few other times. As I mention in my review of json-ld-api, I think we should say it can be converted, it just requires Skolemizing.
[x] Also, the At Risk phrasing should be more clear about what the change might be. Something like: "Based on implementor feedback, the Working Group may decide to prohibit the use of blank nodes as property labels and graph names."

A JSON-LD document /MUST/ be a single node object or a JSON array containing a set of one or more node objects at the top level.

[x] How about: ... or a JSON array whose elements are each node objects.

B.1 Terms

A term is a short-hand string that expands to an IRI
or a blank node identifier.

A term /MUST NOT/ equal any of the JSON-LD keywords.

To avoid forward-compatibility issues, a term
/SHOULD NOT/ start with an @ character as future versions of
JSON-LD may introduce additional keywords.
Furthermore, the term /MUST NOT/ be an empty string ("")
as not all programming languages are able to handle empty
property names.

[x] This whole section concerns me. Can a term contain a colon? Can it be a plain colon? Can it be an apostrophe? Can it be a string of 2^32 ASCII NUL characters? I rather doubt every implementation will allow all of these, but some might, so there could be interoperability problems. And there should be tests in the test suite of all the weird ones (but maybe there already are).
```
A JSON object is a node object
if it exists outside of a JSON-LD context and:
  * it does not contain the @value, @list, or @set keywords, and
  * it is not the top-most JSON object
    in the JSON-LD document consisting of no other members than
    @graph and @context.
```
[x] Ah, I've seen this text before. :-) Maybe you've replied on that already. Short version: it'd help to give a name to those things mentioned in that last bullet point, at least. Maybe call them "binder objects" or "envelope objects" or something like that. Actually, I think they should have their own section in the Advanced Topics. (And I've already said I don't think they should use the @graph keyword, but I gather you decided against me on that. I'll go check old emails later, I hope.)
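One possible reading of the quoted definition, sketched in Python. The function name and the `is_top_level` flag are hypothetical, the caller is assumed to pass only objects that sit outside a context definition, and the last bullet is interpreted as "a top-most object that has `@graph` and nothing besides `@graph` and `@context`".

```python
def is_node_object(obj, is_top_level=False):
    """Classify a parsed JSON value according to the quoted definition."""
    if not isinstance(obj, dict):
        return False
    # Value objects, list objects and set objects are not node objects.
    if any(key in obj for key in ("@value", "@list", "@set")):
        return False
    # The top-most "envelope" made up only of @graph and @context is excluded.
    if is_top_level and "@graph" in obj and set(obj) <= {"@graph", "@context"}:
        return False
    return True


print(is_node_object({"@id": "http://example.org/a"}))
print(is_node_object({"@context": {}, "@graph": []}, is_top_level=True))
```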
```
the keys of the different node objects
are merged to create the properties of the resulting node.
```

[x] maybe s/are merged/need to be merged/ ?

Keys in a node object that are not keywords
/MAY/ expand to an absolute IRI using the active context.

[x] That use of "MAY" technically means that implementations have the option of expanding them or not, right? Maybe something more like: "Each key can be classified as one of: (1) a keyword, (2) a keyword alias, (3) an absolute IRI, (4) a relative IRI, convertible to an absolute IRI using the active base, (5) a term which expands to an absolute IRI according to the active context, (6) a term which does not expand to an absolute IRI, or (7) a string which does not conform to the term syntax.
Keys of type (6) and (7) are ignored."
[x] Actually, writing that makes clear my concern about terms above. How can you tell a term from a relative IRI? Isn't "foo" both? I'd suggest that in json-ld relative IRIs be required to contain a "/" character and terms be limited to c-identifier syntax.
[x] Also, class (6) keys might well be due to a typo -- is it okay to issue warnings on class (6) and class (7) keys, instead of just ignoring them?
```
 The value associated with the `@type` key /MUST/ be a term, a compact IRI,
 an absolute IRI, a relative IRI, or null.
```
[x] What does it mean for a @type to be null? I don't see anything in the spec about this case.
```
This section is non-normative.
```

It seems like there are too many of these.... I think. How can most of the document be non-normative? For example, how am I supposed to know what to do with @index? If I'm writing a generic JSON-LD display tool, do I have to convert it to RDF first? If not, I'm going to have to know what I'm supposed to do with @index.

  Summarized these differences mean that JSON-LD is capable of
  serializing any RDF graph or dataset and most, but not all, JSON-LD
  documents can be transformed to RDF.

Yeah, I guess every RDF graph can be converted to JSON-LD with explicit use of the rdf:first and rdf:rest properties. Ugly, but technically correct.

And (again), I'd suggest that every JSON-LD document can be transformed to RDF, but with a few losses in the process -- you may need to Skolemize, you lose @index information, and any other "ignored" bits.
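To make the Skolemization suggestion concrete, here is a small Python sketch. It assumes quads are plain 4-tuples of strings and uses the `/.well-known/genid/` path that RDF 1.1 Concepts suggests for Skolem IRIs; the authority `http://example.com` and both helper names are placeholders.

```python
GENID_PATH = "/.well-known/genid/"  # path suggested by RDF 1.1 Concepts


def skolemize(identifier, authority="http://example.com"):
    """Map a blank node identifier such as '_:b0' to a Skolem IRI.

    The authority is a placeholder; a real system would also have to make
    sure the generated IRIs are globally unique.
    """
    if identifier.startswith("_:"):
        return authority + GENID_PATH + identifier[2:]
    return identifier


def skolemize_quad(quad):
    """Skolemize only the two positions discussed in this thread:
    the property and the graph name."""
    subject, prop, obj, graph = quad
    return (subject, skolemize(prop), obj, skolemize(graph))


quad = ("_:s", "_:p", "http://example.org/o", "_:g")
print(skolemize_quad(quad))
```

The subject keeps its blank node identifier, since blank nodes are already allowed in that position.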

-- Sandro

Alas, there is no defined way to merge RDF datasets.

The problem is that sometimes it's obvious that the merge of

<g> { <a> <b> 1 }

and

<g> { <a> <b> 2 }

is

<g> { <a> <b> 1,2 }

and sometimes it's obvious the two can't be merged because they contradict each other.

See: http://www.w3.org/2011/rdf-wg/track/issues/17

We had a discussion on IRC about the problems of merging default graphs. For example, if a developer re-states the facts in both RDFa and JSON-LD in the same document (worse, microdata, which almost encourages the use of BNodes), the result will be a merger with duplicate BNodes, that typically are intended to be exactly the same node. One way would be to provide an algorithm for creating a named graph to contain the default graphs of all included script, microdata or RDF/XML which is also extracted (another case where BNode graph IDs would have been useful). Alternatively, a Note on the subject could just warn against this pattern.

@gkellogg duplicate blank nodes don't seem like much of a problem; just make the graph lean.... Or maybe that's too hard?

@lanthaler what kind of different applications are you imagining?

@sandhawke, is there a definition of a "lean graph" and an algorithm to make a graph lean? It seems similar to graph isomorphism.

Anyway, we do need a separate Note to describe this stuff, as it doesn't bear repeating in both Turtle and JSON-LD, and they shouldn't have to concern themselves with generic issues.

On Friday, March 29, 2013 4:25 PM, Sandro Hawke wrote:

This is a partial follow-up review of json-ld.

Summary: more of the same - mostly editorial - a few issues that will hopefully be simple to review. I'm not quite done, but may have to stop for a day or two, so I'm sending this along now.

Thanks Sandro. I've tried to address most of them in cbcd28960b2014dc45e4e98fb192278c99cd47ff.

Details:

Simply speaking, a context is used to map terms, to IRIs.

s/terms,/terms/

Fixed

and types that do not match a term or are neither a compact IRI nor

s/or are neither/and are neither/

Fixed

If multiple embedded JSON-LD documents are extracted as RDF, the result is the RDF merge of the extracted datasets.

Alas, there is no defined way to merge RDF datasets.

The problem is that sometimes it's obvious that the merge of
<g> { <a> <b> 1 }
and
 <g> { <a> <b> 2 }
is
<g> { <a> <b> 1,2 }
and sometimes it's obvious the two can't be merged because they contradict each other.

See: http://www.w3.org/2011/rdf-wg/track/issues/17

RESOLVED: close issue-17 -- there is no general purpose way to merge datasets; it can only be done with external knowledge.

Proposed solution is to define it here, something like: If multiple embedded JSON-LD documents are extracted as RDF, the result is a dataset formed by merging all the graphs that have the same name (and thus making a single named graph per graph name) and all the default graphs (to make one resulting default graph).

I decided to just remove that sentence. I think it confuses more than it helps.

Figure 1: An illustration of JSON-LD's data model.

Broken image link.

Fixed

More importantly, the diagram is both misleading and wrong. It's misleading in that each of the nodes is shown as being in exactly one graph; nodes are actually allowed to be in multiple graphs, and nearly always are. It's wrong in that it shows two arcs that aren't in any graph, when actually every arc has to be in one or more graphs.

Good spot. I removed the cross-graph arcs.

I haven't managed to produce a good drawing of this. Sometimes I think of it as color-coding arcs, like this:

http://www.w3.org/Consortium/Offices/Presentations/RDFTutorial/figures/AnimMerge8.png

and sometimes I think of it as layers:

http://www.flickr.com/photos/danbri/3472944745/ http://farm4.static.flickr.com/3613/3384528143_8304792836_b.jpg

although I imagine the layers closer together, like transparent sheets of plastic, each with writing on them.

I didn't introduce layers to show that nodes might be in multiple graphs. I think that would go beyond the scope of this simple, informative illustration.

Whenever possible, the graph name /SHOULD/ be an IRI

s/possible/practical/ (I think)

Fixed

At Risk

I'm a little lost in the AT RISK features. Can we do it like this: http://www.w3.org/TR/2009/CR-owl2-syntax-20090611/#atRisk1 ? So each at-risk feature is identified separately from where it occurs in the specs, on a wiki page (rdf-wg/wiki/JSON-LD_Features_at_Risk or something). And each time it comes up in the specs, that is referenced, along with a clear explanation for people who've never heard of this little feature of the W3C process.

Good idea. I will update the spec to this style tomorrow.

Within the JSON-LD syntax these edge labels are called properties.

Actually, you use the term somewhat inconsistently -- sometimes you call those labels "property names" and sometimes you call them "property labels". I'm not sure this is worth fixing -- I'm probably being overly pedantic to mention it -- but in RDF they'd be considered property names. The property itself is the thing denoted by the IRI. I think in general it's fine to call these things "properties" (and skip over the detail that they are property names), but maybe in the formal model it's better to be precise.

The only two occurrences where we used property names was when we talked about "empty JSON keys". I fixed this as well.

Issue 217

In contrast to the RDF data model as defined in [RDF11-CONCEPTS], JSON-LD allows blank nodes as property labels and graph names. Thus, some data that is valid JSON-LD cannot be converted to RDF. This feature may be removed in the future.

This notion appears a few other times. As I mention in my review of json-ld-api, I think we should say it can be converted, it just requires Skolemizing.

Added that info already when I updated json-ld-api.

Also, the At Risk phrasing should be more clear about what the change might be. Something like: "Based on implementor feedback, the Working Group may decide to prohibit the use of blank nodes as property labels and graph names."

Will do tomorrow.

A JSON-LD document /MUST/ be a single node object or a JSON array containing a set of one or more node objects at the top level.

How about: ... or a JSON array whose elements are each node objects.

Fixed

B.1 Terms

A term is a short-hand string that expands to an IRI or a blank node identifier. A term /MUST NOT/ equal any of the JSON-LD keywords. To avoid forward-compatibility issues, a term /SHOULD NOT/ start with an @ character as future versions of JSON-LD may introduce additional keywords. Furthermore, the term /MUST NOT/ be an empty string ("") as not all programming languages are able to handle empty property names.

This whole section concerns me. Can a term contain a colon? Can it be a plain colon? Can it be an apostrophe? Can it be a string of 2^32 ASCII NUL characters? I rather doubt every implementation will allow all of these, but some might, so there could be interoperability problems. And there should be tests in the test suite of all the weird ones (but maybe there already are).

A term can be any valid JSON string except the empty string. So yes, it can contain a colon, it can also be a plain colon. Any control character needs to be escaped.
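The constraints on terms quoted above can be collected into a short Python check. This is a sketch, not spec text: `check_term` is a hypothetical helper, and the keyword list is the one from the published JSON-LD 1.0 Recommendation, assumed here to match the draft under review.

```python
# Keyword list as published in JSON-LD 1.0 (assumed to match the draft).
KEYWORDS = {
    "@context", "@id", "@value", "@language", "@type", "@container",
    "@list", "@set", "@reverse", "@index", "@base", "@vocab", "@graph",
}


def check_term(term):
    """Return (is_valid, warnings) for a candidate term."""
    warnings = []
    if not isinstance(term, str) or term == "":
        return False, ["a term must be a non-empty string"]      # MUST NOT
    if term in KEYWORDS:
        return False, ["a term must not equal a JSON-LD keyword"]  # MUST NOT
    if term.startswith("@"):
        # SHOULD NOT: allowed today, but may clash with a future keyword
        warnings.append("terms starting with '@' may clash with future keywords")
    return True, warnings


print(check_term(":"))
print(check_term("@definedby"))
```

Under this reading a plain colon is a valid term, while `@definedby` is valid but flagged.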

A JSON object is a node object if it exists outside of a JSON-LD context and:

it does not contain the @value, @list, or @set keywords, and

it is not the top-most JSON object in the JSON-LD document consisting of no other members than @graph and @context.

Ah, I've seen this text before. :-) Maybe you've replied on that already. Short version: it'd help to give a name to those things mentioned in that last bullet point, at least. Maybe call them "binder objects" or "envelope objects" or something like that. Actually, I think they should have their own section in the Advanced Topics. (And I've already said I don't think they should use the @graph keyword, but I gather you decided against me on that. I'll go check old emails later, I hope.)

Yes, replied to this already. Let's discuss it in the thread.

the keys of the different node objects are merged to create the properties of the resulting node.

maybe s/are merged/need to be merged/ ?

Fixed

Keys in a node object that are not keywords /MAY/ expand to an absolute IRI using the active context.

That use of "MAY" technically means that implementations have the option of expanding them or not, right? Maybe something more like: "Each key can be classified as one of: (1) a keyword, (2) a keyword alias, (3) an absolute IRI, (4) a relative IRI, convertible to an absolute IRI using the active base, (5) a term which expands to an absolute IRI according to the active context, (6) a term which does not expand to an absolute IRI, or (7) a string which does not conform to the term syntax.
Keys of type (6) and (7) are ignored."

Does it? This spec isn't talking about implementations, it's talking about JSON-LD the format. I think in that context it is OK to say that keys MAY expand to an absolute IRI. Please note that a key cannot be a relative IRI.

Actually, writing that makes clear my concern about terms above. How can you tell a term from a relative IRI? Isn't "foo" both? I'd suggest that in json-ld relative IRIs be required to contain a "/" character and terms be limited to c-identifier syntax.

Keys are never relative IRIs. They are either terms, absolute IRIs, or compact IRIs (@vocab may be used to set an "implicit" prefix for all keys that are neither terms nor absolute or compact IRIs).

Also, class (6) keys might well be due to a typo -- is it okay to issue warnings on class (6) and class (7) keys, instead of just ignoring them?

Of course, every implementation is free to issue warnings. However, a JSON-LD processor won't raise an error and stop processing. It will ignore them and continue processing.

The value associated with the @type key /MUST/ be a term, a compact IRI, an absolute IRI, a relative IRI, or null.

What does it mean for a @type to be null? I don't see anything in the spec about this case.

Just like every other key that is set to null, it is ignored. It's the same as if it weren't there.
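As a tiny illustration of what "ignored" means for a key in a node object, the following Python sketch drops null-valued members before any further processing. The helper name is hypothetical, and it only looks at the top level of a single object; it does not model other places where null may appear in JSON-LD.

```python
import json


def drop_null_members(node):
    """Return a copy of a node object without members whose value is null."""
    return {key: value for key, value in node.items() if value is not None}


doc = json.loads('{"@id": "http://example.org/a", "@type": null}')
print(drop_null_members(doc))
```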

This section is non-normative.

It seems like there are too many of these.... I think. How can most of the document be non-normative? For example, how am I supposed to know what to do with @index? If I'm writing a generic JSON-LD display tool, do I have to convert it to RDF first? If not, I'm going to have to know what I'm supposed to do with @index.

Depends on what your tool is supposed to do. I personally wouldn't mind making both Basic Concepts and Advanced Concepts normative.

Summarized these differences mean that JSON-LD is capable of serializing any RDF graph or dataset and most, but not all, JSON-LD documents can be transformed to RDF.

Yeah, I guess every RDF graph can be converted to JSON-LD with explicit use of the rdf:first and rdf:rest properties. Ugly, but technically correct.

Right.

And (again), I'd suggest that every JSON-LD document can be transformed to RDF, but with a few losses in the process -- you may need to Skolemize, you lose @index information, and any other "ignored" bits.

Could you please provide some concrete text (given that you weren't completely satisfied with my change in json-ld-api)? Thanks.

Cheers, Markus

All issues have been addressed. Unless I hear objections, I will close this issue in 24 hours.


Sandro Hawke's JSON-LD syntax spec review #224