json-ld / json-ld.org

JSON for Linked Data's documentation and playground site
https://json-ld.org/

Sandro Hawke's JSON-LD syntax spec review #224

Closed lanthaler closed 11 years ago

lanthaler commented 11 years ago

_@sandhawke's JSON-LD syntax spec review:_

This is my review of json-ld-syntax, as promised in the last meeting.

Summary: The document is in pretty good shape, and I think the underlying design is very good. Below, I suggest a few million editorial changes, a handful of which I think really need to be addressed before publication (and are marked MEDIUM or SERIOUS). I also raise a handful of concerns about the design, but I think they can probably all be dealt with in a few minutes of conversation. I think everything not marked MEDIUM or SERIOUS is fairly trivial.

I reviewed the latest editor's draft: https://dvcs.w3.org/hg/json-ld/raw-file/e582aaa9ee43/spec/latest/json-ld-syntax/index.html

I did not read the json-ld-api. I did play around with the json-ld "playground" site once I was into the appendices. I haven't reviewed Appendix B yet; I'll try to get to that soon, but it's going to take more brain cells than I have left tonight.

Without further ado...

In an attempt to harmonize the representation of Linked Data in JSON

My first comment turns out to be, I think, the most utterly trivial. Sorry.

? maybe add "natural"

MEDIUM

How about: JSON-LD satisfies the following design goals:

language. We should focus on simplicity when possible.

I don't think that's what you mean. I think you mean simplicity is paramount.

Hard to parse.

Awkward phrase.

SERIOUS

It's somewhat odd that all one needs for conformance is appendix B. So what are the other normative parts of this document for...?

Uh, that doesn't match the definition in #dfn-term. Is a term really a property with its associated value? I don't think so.

This is the first sentence in the document where I have no idea what it means, because it uses concepts not introduced yet. Maybe this can be dropped? Or maybe I'll just have to get it on the second pass.

I find the re-use of the word "context" awkward here.

That makes me wonder if it can be HTML, to be more readable. There would have to be some standard way to find the @context json in the HTML....

This is kind of weird. It doesn't tell me what I'm supposed to do; it just confuses me.

I guess it means they're like comments, and to be ignored?

(or some other example where an @id is more appropriate)

end of Section 5

  • [x] As I come to Section 6 being marked normative, I see Section 5 was neither informative nor normative.

A document on the Web that defines one or more IRIs for use as properties in Linked Data is called a vocabulary.

  • [x] Don't conflate documents with vocabularies, please.

See: https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#vocabularies

Would that work if the order were reversed? I guess so, since JSON doesn't preserve order. Maybe clarify that, and maybe put them in the other order in the example. (TEST CASE?)

MEDIUM

Okay, this overloading of @ keywords goes too far with @vocab serving a completely different purpose (from normal @vocab) in this situation. That's just silly.

SERIOUS

MEDIUM

MEDIUM

(glancing at appendix B for something)

To avoid forward-compatibility issues, a term should not start with an @ character

MEDIUM

SERIOUS

MEDIUM

How about this as a hack that's more elegant:

[
  { "@context": ...
  },
  {
    "@id": "http://manu.sporny.org/i/public",
    "@type": "foaf:Person",
    "name": "Manu Sporny",
    "knows": "http://greggkellogg.net/foaf#me"
  },
  {
    "@id": "http://greggkellogg.net/foaf#me",
    "@type": "foaf:Person",
    "name": "Gregg Kellogg",
    "knows": "http://manu.sporny.org/i/public"
  }
]

Something like:

For some topologies of the graph of nodes being expressed in JSON-LD, such as topologies with loops, embedding alone cannot be used, and @id must be used to connect the nodes. In some cases, one may not want to name nodes with IRIs. In these situations, one can use "blank node identifiers", which look like IRIs but with _ (underscore) as the scheme name. For example:

{
    "@id": "_:n1",
    "name": "Secret Agent 1",
    "knows":
      {
        "name": "Secret Agent 2",
        "knows": { "@id": "_:n1" }
      }
}

In this case, we do not want to assign IRIs to the two people, but we want to express that they know each other. We can say SA1 knows SA2 using embedding, but to say SA2 knows SA1 we need to use a blank node identifier.
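The way a blank node identifier closes the loop can be sketched in a few lines of Python. This is only an illustration, not part of any JSON-LD processor: the helper name `index_nodes` is hypothetical, and it assumes that an object containing nothing but `@id` is a reference rather than a description.

```python
import json


def index_nodes(node, found=None):
    """Map each @id to the object that describes it (hypothetical helper)."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        # Assumption: an object holding only "@id" is a reference, not a description.
        if "@id" in node and len(node) > 1:
            found[node["@id"]] = node
        for value in node.values():
            index_nodes(value, found)
    elif isinstance(node, list):
        for item in node:
            index_nodes(item, found)
    return found


doc = json.loads("""
{
  "@id": "_:n1",
  "name": "Secret Agent 1",
  "knows": {
    "name": "Secret Agent 2",
    "knows": {"@id": "_:n1"}
  }
}
""")

nodes = index_nodes(doc)
reference = doc["knows"]["knows"]["@id"]
print(nodes[reference]["name"])  # prints: Secret Agent 1
```

Secret Agent 2 never gets an identifier here; only the node that has to be referenced from elsewhere needs one.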

Every statement in the context having a keyword as the key (as in { "@type": ... }) will be ignored when being processed.

  • [x] I think you mean this only for keywords that are known to be meaningless when used as keys in a @context. I think it would be better to make this an error. But the bigger question is about forward compatibility -- MUST processors ignore all keyword keys in contexts? (Are any allowed, with meaning? I don't see any.)

6.15 and 6.16

  • [x] These should probably be marked non-normative. There's nothing here I need to know to work with JSON-LD (although it's very cool and all).

6.17 Data Indexing

Not sure how I feel about this. It's kind of weird, but pretty harmless, I guess.

I'm not sure it would work, but an alternative design would be to have a particular property be @index'd. So instead of: "@container": "@index" in the context we'd say "@index": "lang" and then the stuff in green would be equivalent to:

"post": [
  {
    "lang": "en",
    "@id": "http://example.com/posts/1/en",
    "body": "World commodities were up today with heavy trading of crude oil...",
    "words": 1539
  },
  {
    "lang": "de",
    "@id": "http://example.com/posts/1/de",
    "body": "Die Werte an Warenbörsen stiegen im Sog eines starken Handels von Rohöl...",
    "words": 1204
  }
]
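To make the proposed equivalence concrete, here is a rough Python sketch that turns an index map into the list form shown above. The function name `index_map_to_list` and the idea of passing the property name (`"lang"`) as an argument are assumptions made for illustration; nothing like this is defined in the spec.

```python
def index_map_to_list(index_map, index_property):
    """Turn {"en": {...}, "de": {...}} into a list of nodes that carry
    the index key as an ordinary property (hypothetical helper)."""
    nodes = []
    for key, node in index_map.items():
        entry = {index_property: key}
        entry.update(node)  # copy the node's own properties after the index property
        nodes.append(entry)
    return nodes


post = {
    "en": {"@id": "http://example.com/posts/1/en", "words": 1539},
    "de": {"@id": "http://example.com/posts/1/de", "words": 1204},
}

for node in index_map_to_list(post, "lang"):
    print(node["lang"], node["@id"], node["words"])
```

The sketch keeps the keys in the order they appear in the map; a real design would have to say whether that order matters.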

What happens if the same @graph @id is used in two places? Are the graphs merged, or what? Shouldn't the spec say? Or is that left to the API document as well? (It's a lot more than an API.) (In TriG they are merged.)

SERIOUS

SERIOUS EDITORIAL

Not really reviewed at this time.

E. IANA Considerations This section is non-normative.

SERIOUS

MEDIUM

SERIOUS

if you want the expanded form, you SHOULD ask for it with this profile

(which I think would be silly) or do you mean:

if you receive a request that includes this profile parameter, you SHOULD return expanded form

? I guess the latter, but that's not what it says. I would think you'd use normal media-type rules here -- if you can't provide it in expanded form, then you're not providing it, and you fall back to another media type.

Published specification: The JSON-LD specification.

Fragment identifiers used with application/ld+json resources may identify a node in a JSON-LD graph expressed in the resource. This idiom, which is also used in RDF [RDF-CONCEPTS], gives a simple way to "mint" new, document-local IRIs to label nodes and therefore contributes considerably to the expressive power of JSON-LD.

MEDIUM

I have no idea what this text is trying to say. For my best guess, please replace it with:


That's it. I'll try to get to Appendix B. before the meeting, but I wanted to send this early enough that it can be read & digested before Wednesday's meeting.

Keep up the great work, guys. I only point out all these places for improvement because I think this is so important and want it to have the best chance it can.

-- Sandro

lanthaler commented 11 years ago

_Feedback from Sandro Hawke, part 2:_

I think the document could be greatly strengthened (and most of my non-trivial comments in part 1 addressed) by the following changes:

  1. In Conformance add something like:

    A conforming JSON-LD Expander takes as input a conforming JSON-LD document D1 and outputs a conforming JSON-LD document D2, using the expansion mapping defined in Appendix XX. D2 will contain no @context declarations and, informally, will convey the same underlying information.

    A conforming JSON-LD Compactor takes as input a JSON-LD @context declaration and conforming JSON-LD document D2 and outputs a conforming JSON-LD document D1, such that a conforming JSON-LD Expander would convert D1 to D2 (or an equivalent document which would JSON-parse to the same internal structure).

    A conforming JSON-LD To-RDF Converter takes as input a conforming JSON-LD document J and outputs an RDF Dataset R using the conversion mapping defined in Appendix C.

    A conforming JSON-LD From-RDF Converter takes as input an RDF Dataset R and outputs a JSON-LD document J such that a conforming JSON-LD To-RDF Converter would convert J to R (or an equivalent document which would JSON-parse to the same internal structure).

Note there is no need to define the Compaction and From-RDF mappings in detail; it's enough to say (as above) that they are the inverses of already-defined mappings. I believe that sufficiently constrains them. For implementation advice, they can see another document, which need only be a Note.

  2. Add appendix XX which defines the expansion mapping. I have not actually looked at how that's currently defined.
  3. Move json-ld-api sections 5.18-5.21 and 5.23 to json-ld-syntax appendix C.
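The idea in the first item, that compaction only needs to be constrained as the inverse of expansion, can be illustrated with a deliberately tiny Python sketch. The two functions below are toys that only rename top-level keys using a flat term-to-IRI context; they are assumptions for illustration and bear no relation to the actual algorithms in the API document, which also handle nesting, value objects, arrays, and type coercion.

```python
def expand(document, context):
    """Toy expansion: replace each term with the IRI the context maps it to."""
    return {context.get(key, key): value for key, value in document.items()}


def compact(document, context):
    """Toy compaction: replace each IRI with the term that maps to it."""
    iri_to_term = {iri: term for term, iri in context.items()}
    return {iri_to_term.get(key, key): value for key, value in document.items()}


context = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": "http://xmlns.com/foaf/0.1/knows",
}
d2 = {
    "@id": "http://manu.sporny.org/i/public",
    "http://xmlns.com/foaf/0.1/name": "Manu Sporny",
}

d1 = compact(d2, context)
print(d1)
# The conformance condition: expanding the compacted document gives back D2.
print(expand(d1, context) == d2)  # prints: True
```

Any compactor whose output passes that round-trip check would satisfy the conformance wording, which is why the detailed compaction steps could live in a Note.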

Note that we should probably change the shortname from /TR/json-ld-syntax to /TR/json-ld for the next publication. It's a bit of a pain, but worthwhile in the long run, I think.

These changes would make json-ld-syntax stand parallel to Turtle, as a completely defined RDF serialization syntax (not needing the API document), but they wouldn't significantly reduce the "RDF tax" on JSON-LD. Just a few sentences in Conformance and a longer RDF Appendix. That seems to me like a good thing (and also what I understood the RDF WG to be asking for).

-- Sandro

lanthaler commented 11 years ago

PROPOSAL 1: Remove section "Design goals"

PROPOSAL 2: Move section "Interpreting JSON as JSON-LD" into basic concepts or combine it with "The Context"

PROPOSAL 3: Remove the "://" safeguard.

PROPOSAL 4: Terms beginning with @ MUST NOT be used (whether we enforce it in the algorithms is a different question)

gkellogg commented 11 years ago

PROPOSAL 1: -1. I think it's useful for the reader to know what the rationale for the JSON-LD syntax is. Again, Sandro didn't suggest that the section be removed, but slightly reworded.

PROPOSAL 2: +1. This could be combined elsewhere.

PROPOSAL 3: -1. I think it serves the same purpose for JSON-LD as it does when it was introduced for RDFa; it catches a lot of common problems with the failure to define prefixes. Before RDFa had this, a lot of garbage triples could be generated.

PROPOSAL 4: -0.5. While I think that normal use of '@' should not be tolerated, we should create the ability for the specification to be extended with design notes that use other keywords (e.g., @ordered). If this is a MUST NOT, then it is impossible for someone to extend the specification and not violate a normative constraint.

lanthaler commented 11 years ago

@darobin: Was there a recent change in ReSpec which removed the "This section is normative" statements? At least they don't show up anymore in our specs, see e.g. http://json-ld.org/spec/latest/json-ld-api/#context-processing-algorithms

lanthaler commented 11 years ago

PROPOSAL 1: +0.5 don't really care but the section also doesn't add much value

PROPOSAL 2: +1

PROPOSAL 3: +0.5 no strong preference though.. it just adds extra code; I don't think we can compare JSON-LD and RDFa

PROPOSAL 4: -0.5 for the same reasons as Gregg noted. Would be fine with a SHOULD NOT though. Other specifications can always update the existing one. I'm against enforcing it in the algorithms.

sandhawke commented 11 years ago

On proposal 4:

Extensibility, etc, is great, but I don't see how just allowing '@' would allow that.

I think there are two separate questions here:

Question 1. When is it okay for someone to publish-to-the-world json-ld data using an @-keyword that's not in the current spec?

To answer this question, we need to think about why they would want to. I'd think the only reason they would want to would be because they are trying to extend JSON-LD with new features. That's a nice idea, but if several people do it without coordinating their work, we'll end up with mass confusion.

So, I think they need to talk to each other first, and reach consensus on what a term means before anyone starts to use it in the wild. In other words, no one should use such keywords in public until/unless the folks who own the JSON-LD spec (some group at W3C) say it's okay.

That might be done by having a simple first-come first-served registry, or it might mean no one gets to use a new term in public until JSON-LD 1.1 reaches Candidate Recommendation. We don't need to decide that now, but it would be a mess if people just started using their own extensions in public without any coordination.

So I'd say a document using such a keyword is NOT a JSON-LD 1.0 document. It might be a JSON-LD 1.1 document, someday -- in fact, that's how one will recognize a JSON-LD 1.1 document. So that means conforming JSON-LD 1.0 producers MUST NOT produce such documents.

Question 2. What should JSON-LD 1.0 consuming software do if it sees a document with an unknown keyword? This is going to happen (a) when someone makes a mistake, (b) when someone is trying to extend JSON-LD without consensus, or (c) when a proper extension is being used or JSON-LD 1.1 is out, but this consumer software hasn't been updated to support them. For (a) or (b) it would be fine to halt and give an error, I think, to say JSON-LD 1.0 consumers MUST NOT consume such documents. For (c), this could be a big problem, though. For this we need to imagine ourselves in the future, wanting to add some new features. (You probably have some in mind, things that were left out of 1.0.) What will you want existing software to do when it sees a JSON-LD 1.1 document, or a document with an extension it doesn't implement?

If such software rejects it -- like an iPhone getting Flash content -- then users suffer, and people are strongly motivated not to use extensions or deploy version 1.1. So that's no good.

If such software ignores it -- like a web browser getting HTML elements or attributes it doesn't implement -- then everything will look fine to the user, unless the extension changes the meaning of the data in some important way. Then the user won't even get a warning, they'll just get bad data, with potentially disastrous results. Ignoring it only works, I think, if the extensions are some kind of pragmas or hints that don't affect the basic data.

Maybe the problem can be sidestepped by using a new media type for JSON-LD 1.1. Then the two can exist side by side. Then JSON-LD 1.0 consumers can be told they MUST NOT consume 1.1 documents, because they'll only see them if there is a mime error. (For consuming JSON data as JSON-LD, I guess this still works, since the mime type of the linked-to context would be JSON-LD 1.1's mime type.)

Another solution I see sometimes (SOAP and RIF do this) is that each extension is somehow flagged with MUST-UNDERSTAND or MAY-IGNORE. JSON-LD could do this with stuff in the @context, or with a syntactic hack, may-ignore keywords being lowercase or starting with '@?' or something.

I think I'd propose the group pick EITHER MUST-UNDERSTAND or MAY-IGNORE for everything, and then if/when someone wants an extension with the other kind of semantics, they have to use a new mime type. As for which to go with for now, I'd say it depends what kinds of extensions you think people are going to want first. Are they things that can be safely ignored?
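The two consumer behaviours weighed above can be sketched as a single policy switch. Everything in this Python snippet is an assumption made for illustration: the keyword set is the list as it ended up in JSON-LD 1.0 (the draft under review may have differed), and the names `check`, `unknown_keywords`, and the policy strings are hypothetical rather than anything a spec or library defines.

```python
# Assumption: the keyword list of the final JSON-LD 1.0 Recommendation.
KNOWN_KEYWORDS = frozenset({
    "@context", "@id", "@value", "@language", "@type", "@container",
    "@list", "@set", "@reverse", "@index", "@base", "@vocab", "@graph",
})


class UnknownKeywordError(ValueError):
    """Raised under the must-understand policy."""


def unknown_keywords(node):
    """Collect every key that starts with '@' but is not a known keyword."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@") and key not in KNOWN_KEYWORDS:
                found.add(key)
            found |= unknown_keywords(value)
    elif isinstance(node, list):
        for item in node:
            found |= unknown_keywords(item)
    return found


def check(document, policy):
    """Apply 'must-understand' (halt) or 'may-ignore' (report and carry on)."""
    unknown = sorted(unknown_keywords(document))
    if unknown and policy == "must-understand":
        raise UnknownKeywordError("unknown keywords: " + ", ".join(unknown))
    return unknown


doc = {"@id": "http://example.com/1", "@ordered": True}
print(check(doc, "may-ignore"))  # prints: ['@ordered']
try:
    check(doc, "must-understand")
except UnknownKeywordError as error:
    print("rejected:", error)
```

Picking one policy for everything, as proposed, amounts to hard-coding the `policy` argument; the media-type approach moves the decision out of the document altogether.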

lanthaler commented 11 years ago

Some more updates based on Sandro's feedback and the discussions in today's JSON-LD telecon.

JSON keys that do not expand to an absolute IRI are ignored, or removed in some cases, by the [JSON-LD-API]. However, JSON keys that do not include a mapping in the context are still considered valid expressions in JSON-LD documents -- the keys just don't expand to unambiguous identifiers.

This is kind of weird. It doesn't tell me what I'm supposed to do; it just confuses me.

I guess it means they're like comments, and to be ignored?

This was changed to [1]

"JSON keys that do not expand to an IRI, such as status in the example above, are not Linked Data and thus ignored when processed."

6.1 Compact IRIs

Prefixes are expanded when the form of the value is a compact IRI represented as a prefix:suffix combination, and the prefix matches a term defined within the active context

Terms are interpreted as compact IRIs if they contain at least one colon and the first colon is not followed by two slashes (//, as in http://example.com)

These sentences contradict each other. Do slashes prevent recognizing things as compact IRIs or not? I'd suggest not -- that's just extra code that won't be helpful, IMHO. (TEST CASE?)

I've clarified the section and added two test cases in [2].

Values of the form prefix://suffix are not considered as compact IRIs to prevent developers from accidentally overwriting all their http URLs for example. RDFa had to introduce a similar safety-mechanism.
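The rule as clarified can be written down as a small predicate. This is a minimal Python sketch of the three conditions quoted in this comment (a colon is present, the first colon is not followed by `//`, and the prefix is defined in the active context); the function name `is_compact_iri` is hypothetical, and the check ignores every other case the real IRI expansion algorithm handles.

```python
def is_compact_iri(value, context):
    """Return True if value should be treated as prefix:suffix (sketch only)."""
    prefix, colon, suffix = value.partition(":")
    if not colon:
        return False  # no colon at all: a plain term or relative IRI
    if suffix.startswith("//"):
        return False  # the "://" safeguard: looks like an absolute IRI
    return prefix in context  # the prefix must be defined in the context


context = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "http": "http://example.com/oops/",  # an accidental prefix definition
}

print(is_compact_iri("foaf:name", context))            # prints: True
print(is_compact_iri("http://example.com/", context))  # prints: False
```

The second call shows the safeguard at work: even with an `http` prefix mistakenly defined, ordinary http URLs are left alone.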

It is a best practice to put the context definition at the top of the JSON-LD document.

MEDIUM

I don't agree. You're telling me I'm going against best practice to build an object in memory and let my JSON serializer turn it into JSON.

I changed it to the following suggestion [3]:

"When possible, the context definition should be put at the top of a JSON-LD document. This makes the document easier to read and might make streaming parsers more efficient. Documents that do not have the context at the top are still conformant JSON-LD."

To avoid forward-compatibility issues, a term should not start with an @ character

MEDIUM

Why only SHOULD NOT? Why not MUST NOT? The damage if they do is considerable.

Also, you kind of need to say what processors MUST do if they see a keyword term they don't know -- ie one from the future. The options are: ignore (if you can figure out what/how much to ignore); or halt; or issue a warning to the user.

I've clarified that note [4]. Just as any other term that isn't mapped to an IRI, terms starting with an "@" that are not keywords are being ignored.
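As a rough picture of what "being ignored" means here, the Python sketch below drops every key that is neither a keyword, nor mapped in the context, nor IRI-like. It is a simplification under stated assumptions: the keyword set is a small illustrative subset, "contains a colon" stands in for the real IRI expansion rules, and `drop_unmapped_keys` is a hypothetical name.

```python
# Assumption: a small subset of keywords is enough for the illustration.
KEYWORDS = {"@context", "@id", "@type", "@value", "@language", "@graph"}


def drop_unmapped_keys(node, context):
    """Keep only the keys a processor would treat as Linked Data (sketch)."""
    kept = {}
    for key, value in node.items():
        # Crude stand-in for IRI expansion: keyword, mapped term, or contains a colon.
        if key in KEYWORDS or key in context or ":" in key:
            kept[key] = value
    return kept


context = {"name": "http://schema.org/name"}
doc = {
    "@id": "http://example.com/people/1",
    "name": "Markus",
    "status": "draft",       # not mapped: ignored
    "@definedby": "a note",  # starts with "@", not a keyword, not mapped: ignored
}

print(sorted(drop_unmapped_keys(doc, context)))  # prints: ['@id', 'name']
```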

EXAMPLE 5

after this example I was expecting the next example to use a Link header (what turns out to be EXAMPLE 29). Maybe mention it here?

The majority of the group felt that this is too early in the document. I've added the following statement instead [5]:

"JSON documents can be transformed to JSON-LD without having to be modified by referencing a context via an HTTP Link Header as described in section 6.8 Interpreting JSON as JSON-LD. It is also possible to apply a custom context using the JSON-LD API [JSON-LD-API]."

EXAMPLE 6 -- In the example above, the key http://schema.org/name is interpreted as an absolute IRI because it contains a colon (:) and the "http" prefix does not exist in the context.

Now would be a perfect place to have a relative IRI example. You've just talked about there being absolute and relative IRIs, and given an example only of absolute ones.

I've added an example in [6].

[1] https://github.com/json-ld/json-ld.org/commit/aa43ac1f788bc2b69460319696edab6c6cb217cf
[2] https://github.com/json-ld/json-ld.org/commit/1d20718c328e932a90b0445eae7f3a61df8a4840
[3] https://github.com/json-ld/json-ld.org/commit/18a5cad721c778b3c222f141e694bd7a33472560
[4] https://github.com/json-ld/json-ld.org/commit/ab62ca52d0e5d2a3fe307495f2a82b72cdb921ee
[5] https://github.com/json-ld/json-ld.org/commit/617b7d97ddb81a1a509fb6abd7b868ad4f03fd9d
[6] https://github.com/json-ld/json-ld.org/commit/b565c54464313f5ec09ed4fb58efd0258a4c0eb2

Markus Lanthaler @markuslanthaler

lanthaler commented 11 years ago

To be able to externally reference nodes in a graph, it is important that each node has an unambiguous identifier. IRIs are a fundamental concept of Linked Data, and nodes should have a de-referenceable identifier used to name and locate them. For nodes to be truly linked, de-referencing the identifier should result in a representation of that node. Associating an IRI with a node tells an application that it can fetch the resource associated with the IRI and get back a description of the node.

I'm not a fan of this paragraph. Can we just delete it?

I've reworded that paragraph in 5b797083a3af273ddb68126df80ffe7cc8e0f962. Does this address your concerns?

Markus Lanthaler @markuslanthaler

lanthaler commented 11 years ago

Sandro, we discussed proposal 4 today and concluded that we shouldn’t forbid terms starting with an @. Just as any other term that isn’t mapped to an IRI, such terms will be ignored by conformant processors. If someone does map a @-term to an IRI, that’s fine in JSON-LD 1.0 but might break in a later version. An example would be a @definedby tag which is mapped to rdfs:isDefinedBy. This is similar to how unknown tags and attributes are treated in HTML.

lanthaler commented 11 years ago

SERIOUS EDITORIAL

I really don't like the mapping-to-RDF being left to another, later spec. I can live with it just being shown in the examples, except for not knowing what happens with numbers. From the playground I see integers end up as xsd:integer and otherwise they are xsd:double, which is simple enough, but should really be said in this document, or at least shown in an example.
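The behaviour observed in the playground can be summarised in a few lines of Python. This sketch encodes only what is reported above (integers become xsd:integer, other numbers xsd:double); the boolean branch is an addition not mentioned in this comment, and the function name `native_datatype` is hypothetical. Edge cases such as very large numbers or floats with no fractional part are deliberately left out.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def native_datatype(value):
    """Pick an XSD datatype IRI for a native JSON value (sketch only)."""
    # bool is tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return XSD + "boolean"  # assumption: not stated in the comment above
    if isinstance(value, int):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    raise TypeError("not a native JSON number or boolean: %r" % (value,))


values = json.loads('{"words": 1539, "rating": 4.5}')
for key, value in values.items():
    print(key, native_datatype(value))
```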

Sandro, I've added an example to the syntax spec in ac8214ec02be4c4a6a6283807e00ab82c0035643:

http://json-ld.org/spec/latest/json-ld-syntax/#conversion-of-native-data-types

Does this address your concerns?

Thanks, Markus

Markus Lanthaler @markuslanthaler

lanthaler commented 11 years ago

_Gregg's reply to Sandro:_

Sandro, we discussed moving the algorithms relevant to RDF conversion out of the API doc and into the base JSON-LD doc, but we don't find that practical. Many of the algorithms outlined in JSON-LD-API are essential for transforming JSON-LD into RDF (context processing, value expansion, IRI expansion, Expansion and Flattening). Instead, I've created an informative description of the process of turning JSON-LD into RDF (and vice-versa) in section C.1. This is necessarily brief, and does not detail the treatment of RDF Collections or Named Graphs.

Please let us know if this resolves this particular issue.

sandhawke commented 11 years ago

I'm not particularly happy with this outcome, but I think I understand it.

I don't know if the rest of the RDF WG will.

I guess I need to study the second doc better and see why it needs to be separate.

lanthaler commented 11 years ago

@sandhawke, I think all the issues you raised in your review of the JSON-LD syntax specification have been addressed. Unless you (or someone else) disagrees, I will close this issue in 24 hours.

sandhawke commented 11 years ago

Sorry, I haven't completed my second review yet.

lanthaler commented 11 years ago

This issue is just about your feedback regarding the syntax spec. Do you mean the review of my changes or the review of the API spec?

sandhawke commented 11 years ago

It's all the same to me: when I swap in JSON-LD, I'll do it all at once.

lanthaler commented 11 years ago

OK, I’ll leave the issue open for the time being.

lanthaler commented 11 years ago

_Follow-up review by @sandhawke:_

This is a partial follow-up review of json-ld. Here I'm reviewing:

JSON-LD 1.0
A JSON-based Serialization for Linked Data
[prepared as] W3C Working Draft 04 April 2013
https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html

Summary: more of the same - mostly editorial - a few issues that will hopefully be simple to review. I'm not quite done, but may have to stop for a day or two, so I'm sending this along now.

Details:

Simply speaking, a context is used to map terms, to IRIs.

Alas, there is no defined way to merge RDF datasets.

The problem is that sometimes it's obvious that the merge of

<g> { <a> <b> 1 }

and

 <g> { <a> <b> 2 }

is

<g> { <a> <b> 1,2 }

and sometimes it's obvious the two can't be merged because they contradict each other.

See: http://www.w3.org/2011/rdf-wg/track/issues/17

RESOLVED: close issue-17 -- there is no general purpose way to merge datasets; it can only be done with external knowledge.

  • [x] Proposed solution is to define it here, something like: If multiple embedded JSON-LD documents are extracted as RDF, the result is a dataset formed by merging all the graphs that have the same name (and thus making a single named graph per graph name) and all the default graphs (to make one resulting default graph).
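A minimal sketch of that proposed merge, in Python, might look as follows. It models a dataset as a dictionary from graph name to a set of triples, with `None` standing for the default graph; that representation and the name `merge_datasets` are assumptions for illustration. The sketch does nothing about blank nodes, which would need renaming before a real merge, and it cannot tell whether two graphs contradict each other.

```python
DEFAULT_GRAPH = None  # assumption: None is used as the key for the default graph


def merge_datasets(*datasets):
    """Union the triples of all graphs that share a name (sketch only)."""
    merged = {}
    for dataset in datasets:
        for graph_name, triples in dataset.items():
            merged.setdefault(graph_name, set()).update(triples)
    return merged


first = {"g": {("a", "b", 1)}}
second = {"g": {("a", "b", 2)}, DEFAULT_GRAPH: {("a", "c", 3)}}

merged = merge_datasets(first, second)
print(sorted(merged["g"]))    # prints: [('a', 'b', 1), ('a', 'b', 2)]
print(merged[DEFAULT_GRAPH])  # prints: {('a', 'c', 3)}
```

This is the "obvious" case from the example above; the contradictory case is exactly what such a mechanical union cannot detect.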

Figure 1: An illustration of JSON-LD's data model.

I haven't managed to produce a good drawing of this. Sometimes I think of it as color-coding arcs, like this:

http://www.w3.org/Consortium/Offices/Presentations/RDFTutorial/figures/AnimMerge8.png

and sometimes I think of it as layers:

http://www.flickr.com/photos/danbri/3472944745/ http://farm4.static.flickr.com/3613/3384528143_8304792836_b.jpg

although I imagine the layers closer together, like transparent sheets of plastic, each with writing on them.

Whenever possible, the graph name /SHOULD/ be an IRI

It seems like there are too many of these.... I think. How can most of the document be non-normative? For example, how am I supposed to know what to do with @index? If I'm writing a generic JSON-LD display tool, do I have to convert it to RDF first? If not, I'm going to have to know what I'm supposed to do with @index.

  Summarized, these differences mean that JSON-LD is capable of
  serializing any RDF graph or dataset and most, but not all, JSON-LD
  documents can be transformed to RDF.

Yeah, I guess every RDF graph can be converted to JSON-LD with explicit use of the rdf:first and rdf:rest properties. Ugly, but technically correct.

And (again), I'd suggest that every JSON-LD document can be transformed to RDF, but with a few losses in the process -- you may need to Skolemize, you lose @index information, and any other "ignored" bits.
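Skolemization, as mentioned above, means replacing blank node identifiers with freshly minted IRIs. The Python sketch below does this for `@id` values and for property keys only; it assumes expanded-style input where node references always appear under `@id`, and the authority `https://example.com` and the helper names are placeholders. The `/.well-known/genid/` path follows the convention suggested in RDF 1.1 Concepts.

```python
def _skolem(identifier, authority):
    """Turn '_:n1' into a Skolem IRI; leave everything else untouched."""
    if isinstance(identifier, str) and identifier.startswith("_:"):
        return f"{authority}/.well-known/genid/{identifier[2:]}"
    return identifier


def skolemize(node, authority="https://example.com"):
    """Replace blank node identifiers in @id values and property keys (sketch)."""
    if isinstance(node, list):
        return [skolemize(item, authority) for item in node]
    if not isinstance(node, dict):
        return node  # plain values are never rewritten, even if they start with "_:"
    result = {}
    for key, value in node.items():
        if key == "@id":
            result[key] = _skolem(value, authority)
        else:
            result[_skolem(key, authority)] = skolemize(value, authority)
    return result


doc = {"@id": "_:n1", "_:p": {"@id": "_:n2"}, "name": "Secret Agent 1"}
print(skolemize(doc))
```

After this step the document no longer uses blank nodes as property labels, at the cost of introducing IRIs the author never chose.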

-- Sandro

gkellogg commented 11 years ago

Alas, there is no defined way to merge RDF datasets.

The problem is that sometimes it's obvious that the merge of

<g> { <a> <b> 1 }

and

<g> { <a> <b> 2 }

is

<g> { <a> <b> 1,2 }

and sometimes it's obvious the two can't be merged because they contradict each other.

See: http://www.w3.org/2011/rdf-wg/track/issues/17

We had a discussion on IRC about the problems of merging default graphs. For example, if a developer re-states the facts in both RDFa and JSON-LD in the same document (worse, microdata, which almost encourages the use of BNodes), the result will be a merger with duplicate BNodes, that typically are intended to be exactly the same node. One way would be to provide an algorithm for creating a named graph to contain the default graphs of all included script, microdata or RDF/XML which is also extracted (another case where BNode graph IDs would have been useful). Alternatively, a Note on the subject could just warn against this pattern.

lanthaler commented 11 years ago

My preferred "fix" is for the JSON-LD specification to be silent about this, as I've already said in the past. What happens when there are multiple embedded JSON-LD documents is, IMO, completely application specific. Of course, the various options could be discussed in more detail in a future Note, but the JSON-LD spec seems to be the wrong place for this.

Markus Lanthaler

@markuslanthaler

From: Gregg Kellogg [mailto:gregg@greggkellogg.com] On Behalf Of Gregg Kellogg Sent: Friday, March 29, 2013 6:22 PM To: Sandro Hawke Cc: W3C RDF WG Subject: Re: second review of json-ld

Gregg Kellogg

gregg@greggkellogg.net

On Mar 29, 2013, at 8:24 AM, Sandro Hawke sandro@w3.org wrote:

...

Alas, there is no defined way to merge RDF datasets.

The problem is that sometimes it's obvious that the merge of

<g> { <a> <b> 1 }

and

<g> { <a> <b> 2 }

is

<g> { <a> <b> 1,2 }

and sometimes it's obvious the two can't be merged because they contradict each other.

See: http://www.w3.org/2011/rdf-wg/track/issues/17

RESOLVED: close issue-17 -- there is no general purpose way to merge datasets; it can only be done with external knowledge.

Proposed solution is to define it here, something like: If multiple embedded JSON-LD documents are extracted as RDF, the result is a dataset formed by merging all the graphs that have the same name (and thus making a single named graph per graph name) and all the default graphs (to make one resulting default graph).

I discussed this on the GitHub issue tracker too. We had a discussion on IRC about the problems of merging default graphs. For example, if a developer re-states the facts in both RDFa and JSON-LD in the same document (worse, microdata, which almost encourages the use of BNodes), the result will be a merger with duplicate BNodes, that typically are intended to be exactly the same node. One way would be to provide an algorithm for creating a named graph to contain the default graphs of all included script, microdata or RDF/XML which is also extracted (another case where BNode graph IDs would have been useful). Alternatively, a Note on the subject could just warn against this pattern.

Alternatively, the result could simply be a set of graphs and datasets where there is no defined merger, leaving it up to the application; however, I don't find this very satisfying.

...

-- Sandro

Gregg

sandhawke commented 11 years ago

@gkellogg duplicate blank nodes don't seem like much of a problem; just make the graph lean.... Or maybe that's too hard?

@lanthaler what kind of different applications are you imagining?

gkellogg commented 11 years ago

@sandhawke, is there a definition of a "lean graph" and an algorithm to make a graph lean? It seems similar to graph isomorphism.

Anyway, we do need a separate Note to describe this stuff, as it doesn't bear repeating in both Turtle and JSON-LD, and they shouldn't have to concern themselves with generic issues.

lanthaler commented 11 years ago

On Friday, March 29, 2013 4:25 PM, Sandro Hawke wrote:

This is a partial follow-up review of json-ld.

Summary: more of the same - mostly editorial - a few issues that will hopefully be simple to review. I'm not quite done, but may have to stop for a day or two, so I'm sending this along now.

Thanks Sandro. I've tried to address most of them in cbcd28960b2014dc45e4e98fb192278c99cd47ff.

Details:

Simply speaking, a context is used to map terms, to IRIs.

s/terms,/terms/

Fixed

and types that do not match a term or are neither a compact IRI nor

s/or are neither/and are neither/

Fixed

If multiple embedded JSON-LD documents are extracted as RDF, the result is the RDF merge of the extracted datasets.

Alas, there is no defined way to merge RDF datasets.

The problem is that sometimes it's obvious that the merge of

<g> { <a> <b> 1 }

and

 <g> { <a> <b> 2 }

is

<g> { <a> <b> 1,2 }

and sometimes it's obvious the two can't be merged because they contradict each other.

See: http://www.w3.org/2011/rdf-wg/track/issues/17

RESOLVED: close issue-17 -- there is no general purpose way to merge datasets; it can only be done with external knowledge.

Proposed solution is to define it here, something like: If multiple embedded JSON-LD documents are extracted as RDF, the result is a dataset formed by merging all the graphs that have the same name (and thus making a single named graph per graph name) and all the default graphs (to make one resulting default graph).

I decided to just remove that sentence. I think it confuses more than it helps.

Figure 1: An illustration of JSON-LD's data model.

Broken image link.

Fixed

More importantly, the diagram is both misleading and wrong. It's misleading in that each of the nodes is shown as being in exactly one graph; nodes are actually allowed to be in multiple graphs, and nearly always are. It's wrong in that it shows two arcs that aren't in any graph, when actually every arc has to be in one or more graphs.

Good spot. I removed the cross-graph arcs.

I haven't managed to produce a good drawing of this. Sometimes I think of it as color-coding arcs, like this:

http://www.w3.org/Consortium/Offices/Presentations/RDFTutorial/figures/AnimMerge8.png

and sometimes I think of it as layers:

http://www.flickr.com/photos/danbri/3472944745/ http://farm4.static.flickr.com/3613/3384528143_8304792836_b.jpg

although I imagine the layers closer together, like transparent sheets of plastic, each with writing on them.

I didn't introduce layers to show that nodes might be in multiple graphs. I think that would go beyond the scope of this simple, informative illustration.

Whenever possible, the graph name /SHOULD/ be an IRI

s/possible/practical/ (I think)

Fixed

At Risk

I'm a little lost in the AT RISK features. Can we do it like this: http://www.w3.org/TR/2009/CR-owl2-syntax-20090611/#atRisk1 ? So each at-risk feature is identified separately from where it occurs in the specs, on a wiki page (rdf-wg/wiki/JSON-LD_Features_at_Risk or something). And each time it comes up in the specs, that is referenced, along with a clear explanation for people who've never heard of this little feature of the W3C process.

Good idea. I will update the spec to this style tomorrow.

Within the JSON-LD syntax these edge labels are called properties.

Actually, you use the term somewhat inconsistently -- sometimes you call those labels "property names" and sometimes you call them "property labels". I'm not sure this is worth fixing -- I'm probably being overly pedantic to mention it -- but in RDF they'd be considered property names. The property itself is the thing denoted by the IRI. I think in general it's fine to call these things "properties" (and skip over the detail that they are property names), but maybe in the formal model it's better to be precise.

The only two occurrences where we used property names was when we talked about "empty JSON keys". I fixed this as well.

Issue 217

In contrast to the RDF data model as defined in [RDF11-CONCEPTS], JSON-LD allows blank nodes as property labels and graph names. Thus, some data that is valid JSON-LD cannot be converted to RDF. This feature may be removed in the future.

This notion appears a few other times. As I mention in my review of json-ld-api, I think we should say it can be converted, it just requires Skolemizing.

Added that info already when I updated json-ld-api.
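
A minimal sketch of what Skolemizing a blank node identifier could look like, assuming the `.well-known/genid/` path suggested in RDF 1.1 Concepts. The base IRI and the function name are hypothetical and come from neither spec.

```python
SKOLEM_BASE = "https://example.org/.well-known/genid/"  # hypothetical authority


def skolemize(identifier, base=SKOLEM_BASE):
    """Replace a blank node identifier such as '_:b0' with a Skolem IRI."""
    if identifier.startswith("_:"):
        return base + identifier[2:]
    return identifier  # IRIs pass through unchanged
```

Applied to blank nodes used as property labels or graph names, a mapping like this would yield data that fits the RDF data model.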

Also, the At Risk phrasing should be more clear about what the change might be. Something like: "Based on implementor feedback, the Working Group may decide to prohibit the use of blank nodes as property labels and graph names."

Will do tomorrow.

A JSON-LD document /MUST/ be a single node object or a JSON array containing a set of one or more node objects at the top level.

How about: ... or a JSON array whose elements are each node objects.

Fixed

B.1 Terms A term is a short-hand string that expands to an IRI or a blank node identifier. A term /MUST NOT/ equal any of the JSON-LD keywords. To avoid forward-compatibility issues, a term /SHOULD NOT/ start with an @ character as future versions of JSON-LD may introduce additional keywords. Furthermore, the term /MUST NOT/ be an empty string ("") as not all programming languages are able to handle empty property names.

This whole section concerns me. Can a term contain a colon? Can it be a plain colon? Can it be an apostrophe? Can it be a string of 2^32 ASCII NUL characters? I rather doubt every implementation will allow all of these, but some might, so there could be interoperability problems. And there should be tests in the test suite of all the weird ones (but maybe there already are).

A term can be any valid JSON string except the empty string. So yes, it can contain a colon, it can also be a plain colon. Any control character needs to be escaped.
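
The constraints from B.1 can be sketched as a small check. This is an illustration only: the keyword set below is assumed from the JSON-LD 1.0 keyword list and may not match the reviewed draft exactly, and `term_problems` is a hypothetical helper name.

```python
# Assumed keyword list; may differ from the draft under review.
KEYWORDS = {
    "@context", "@id", "@value", "@language", "@type", "@container",
    "@list", "@set", "@reverse", "@index", "@base", "@vocab", "@graph",
}


def term_problems(term):
    """Return the B.1 constraints a candidate term violates (empty if none)."""
    problems = []
    if term == "":
        problems.append("MUST NOT be the empty string")
    if term in KEYWORDS:
        problems.append("MUST NOT equal a keyword")
    elif term.startswith("@"):
        problems.append("SHOULD NOT start with '@'")
    return problems
```

Under these rules a plain colon (`":"`) and a term containing a colon both pass, which matches the answer above.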

A JSON object is a node object if it exists outside of a JSON-LD context and:

  • it does not contain the @value, @list, or @set keywords, and
  • it is not the top-most JSON object in the JSON-LD document consisting of no other members than @graph and @context.

Ah, I've seen this text before. :-) Maybe you've replied on that already. Short version: it'd help to give a name to those things mentioned in that last bullet point, at least. Maybe call them "binder objects" or "envelope objects" or something like that. Actually, I think they should have their own section in the Advanced Topics. (And I've already said I don't think they should use the @graph keyword, but I gather you decided against me on that. I'll go check old emails later, I hope.)

Yes, replied to this already. Let's discuss it in the thread.
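
The quoted node object definition can be sketched as a predicate. This is a rough reading, not normative: whether the object sits inside a context or at the top of the document cannot be derived from the object itself, so both are passed in as hypothetical flags.

```python
def is_node_object(obj, *, inside_context=False, is_top_level=False):
    """Rough check of the quoted node object definition."""
    if not isinstance(obj, dict) or inside_context:
        return False
    # Value objects, list objects and set objects are not node objects.
    if any(key in obj for key in ("@value", "@list", "@set")):
        return False
    # Assumption: the top-most "envelope" object has @graph and, at most,
    # @context as its only other member.
    if is_top_level and "@graph" in obj and set(obj) <= {"@graph", "@context"}:
        return False
    return True
```

The last branch is exactly the unnamed case discussed above: a top-level object that only wraps `@graph` and `@context`.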

the keys of the different node objects are merged to create the properties of the resulting node.

maybe s/are merged/need to be merged/ ?

Fixed

Keys in a node object that are not keywords /MAY/ expand to an absolute IRI using the active context.

That use of "MAY" technically means that implementations have the option of expanding them or not, right? Maybe something more like: "Each key can be classified as one of: (1) a keyword, (2) a keyword alias, (3) an absolute IRI, (4) a relative IRI, convertible to an absolute IRI using the active base, (5) a term which expands to an absolute IRI according to the active context, (6) a term which does not expand to an absolute IRI, or (7) a string which does not conform to the term syntax. Keys of type (6) and (7) are ignored."

Does it? This spec isn't talking about implementations, it's talking about JSON-LD the format. I think in that context it is OK to say that keys MAY expand to an absolute IRI. Please note that a key cannot be a relative IRI.

Actually, writing that makes clear my concern about terms above. How can you tell a term from a relative IRI? Isn't "foo" both? I'd suggest that in json-ld relative IRIs be required to contain a "/" character and terms be limited to c-identifier syntax.

Keys are never relative IRIs. They are terms, absolute IRIs, or compact IRIs (@vocab may be used to set an "implicit" prefix for all keys that are none of these).

Also, class (6) keys might well be due to a typo -- is it okay to issue warnings on class (6) and class (7) keys, instead of just ignoring them?

Of course, every implementation is free to issue warnings. However, a JSON-LD processor won't raise an error and stop processing. It will ignore them and continue processing.
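
A simplified sketch of how a key might be classified under that description. It is an assumption-laden illustration rather than the expansion algorithm from json-ld-api: the context is modelled as a flat dict of term to IRI, keyword aliases are not handled, and the keyword set is assumed from JSON-LD 1.0.

```python
# Assumed keyword list; may differ from the draft under review.
KEYWORDS = {
    "@context", "@id", "@value", "@language", "@type", "@container",
    "@list", "@set", "@reverse", "@index", "@base", "@vocab", "@graph",
}


def classify_key(key, context):
    """Classify a node object key against a flat term -> IRI context."""
    if key in KEYWORDS:
        return "keyword"
    if key in context:
        return "term"
    if ":" in key:
        prefix = key.split(":", 1)[0]
        if prefix == "_":
            return "blank node identifier"
        # A prefix defined in the context makes this a compact IRI;
        # otherwise the key is taken as an absolute IRI.
        return "compact IRI" if prefix in context else "absolute IRI"
    if "@vocab" in context:
        return "vocabulary-relative"
    return "ignored"  # an implementation may still warn here
```

With no `@vocab` in scope, a bare key such as `"foo"` that is not a defined term falls through to `"ignored"` instead of being read as a relative IRI.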

The value associated with the @type key /MUST/ be a term, a compact IRI, an absolute IRI, a relative IRI, or null.

What does it mean for a @type to be null? I don't see anything in the spec about this case.

Just like every other key that is set to null: it is ignored. It's the same as if it weren't there.

This section is non-normative.

It seems like there are too many of these.... I think. How can most of the document be non-normative? For example, how am I supposed to know what to do with @index? If I'm writing a generic JSON-LD display tool, do I have to convert it to RDF first? If not, I'm going to have to know what I'm supposed to do with @index.

Depends on what your tool is supposed to do. I personally wouldn't mind making both Basic Concepts and Advanced Concepts normative.

Summarized these differences mean that JSON-LD is capable of serializing any RDF graph or dataset and most, but not all, JSON-LD documents can be transformed to RDF.

Yeah, I guess every RDF graph can be converted to JSON-LD with explicit use of the rdf:first and rdf:rest properties. Ugly, but technically correct.

Right.

And (again), I'd suggest that every JSON-LD document can be transformed to RDF, but with a few losses in the process -- you may need to Skolemize, you lose @index information, and any other "ignored" bits.

Could you please provide some concrete text (given that you weren't completely satisfied with my change in json-ld-api)? Thanks.

Cheers, Markus

lanthaler commented 11 years ago

All issues have been addressed. Unless I hear objections, I will close this issue in 24 hours.