json-ld / json-ld.org

JSON for Linked Data's documentation and playground site
https://json-ld.org/

RDF WG Review: Andy Seaborne #135

msporny closed this issue 12 years ago

msporny commented 12 years ago

These are official RDF Syntax review comments by Andy Seaborne via the RDF WG:

Major:

1/ Definitions

I agree with the intention of making it accessible to the typical JSON application developer, but a narrative without clearly identified definitions means that it is difficult to look into the document to check specific details. It also easily becomes inconsistent, because it is not clear whether a given piece of text is descriptive or definitional. Example below.

I suggest keeping the syntax doc as-is and adding a separate formal-only document (or a separate top-level section) for the times when arguing over details matters. Maybe this is a proper appendix A, but I think this is more like EBNF; it would not be an appendix.

Example: the text in 4.5 and A.2 about @id are different.

4.5 The value of the @id key must be either a term, a compact IRI, or an absolute IRI.

A.2: "The value of @id must be null, a term, a compact IRI, or an IRI."

which differ in several ways.

Example: Is this a legal JSON-LD doc:

{ "@id" : "http://example/thing" }

where do I look?

As the document stands, sorting this out is, for me, a block on LC: there is too much risk of having to make a substantive change and restart the LC cycle.

2/ The split between basic concepts and advanced concepts did not work for me.

2a/ Integers as an advanced concept but sets and lists as basic.

2b/ Using the HTTP Link header seems very important.

Other comments

Apologies that the comments are in neither document order nor priority order. In checking them I found myself having to jump about the doc to try to find definitions (see major comment). As seemingly identical pieces of text differed in the details, it got messy.

I'm sure I've got some of these comments wrong, because it was difficult to find reference material and so I ran out of time.

3/ Is the test suite also transferring? It covers both material that is to be migrated and material that is not.

compact: 20
expand: 29
frame: 23
from-RDF: 8
to-RDF: 31
normalization: 57

168 tests; 50% (~80) of which are framing and normalization.
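Checking the arithmetic behind these figures: the six suites sum to 168, and framing plus normalization account for 80 of them, a little under half.

$$
20 + 29 + 23 + 8 + 31 + 57 = 168, \qquad 23 + 57 = 80, \qquad \tfrac{80}{168} \approx 0.48
$$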

4/ Status of bNodes.

Where are BNode labels allowed? BNode labels don't get discussed much (fine), but some of the text that lists the possible syntax forms at a given point doesn't include them.

4.5 The value of the @id key must be either a term, a compact IRI, or an absolute IRI.

does not include a bNode label (unless "_:a" is an absolute IRI, which it isn't).

and

A subject definition that does not contain an @id property is called an unlabeled node.

is confusing as there is another way to be an unlabeled node.

5/ Sec 3.1 Linking Data

We as a group need to review this section.

e.g. "A property should be labeled with an IRI."

Are there any examples of a Linked Data document that is not RDF or that can't be viewed as RDF?

6/ sec 3.1.1

The Web uses IRIs for unambiguous identification. The idea is that these terms mean something that may be of use to other developers and that it is useful to give them an unambiguous identifier. That is, it is useful for terms to expand to IRIs so that developers don't accidentally step on each other's vocabulary terms.

"vocabulary term" is confusing - I read that as properties and classes, not all things. Unambiguity of things matters.

7/ "Linked Data document" isn't a defined term

An IRI that is a label in a linked data graph should be dereferencable to a Linked Data document describing the labeled subject, object or property.

and datatype?

8/ It's big.

The Syntax doc is as big as RDF/XML by page count currently. I know this has been said before but the concept of "JSON API" leads me to expect something shorter.

The change in ReSpec means it prints badly. It grew 6 pages for me just on the ReSpec change. I know that isn't in the CG control but it does not help the impression that it's a big spec. It is very bad in the JSON-API doc - the method descriptions are forced into cols of about 10 chars.

9/ Compact IRIs

Terms are interpreted as compact IRIs if they contain at least one colon and the first colon is not followed by two slashes (//, as in http://example.com).

Why are http URIs handled differently from URNs?

"urn:isbn:978-0-521-87625-4"
"urn:uuid:7962241c-2a01-11b2-8057-b443860cde7a"
"og:video:type"
"_:a"

But BNode labels are IRIs: In "Compact IRIs" it says:

If the prefix is an underscore (_), the IRI remains unchanged.

10/ Sec 3.3: example:

This looks exactly like the situation in the previous section around "homepage". A complete example would be better.

12/

The value of a @graph property must be null, an IRI, or a JSON object.

Was a compact IRI also intended? I assume so but it does not say that. Another way to put it, when is the spec language about syntax and when is it about concepts?

Ditto @context - can a @context take a compact IRI (layering of @contexts)? Maybe it's an odd case, but why make it asymmetric? An implementation wants a "convert this" function, not "convert1", "convert2", etc.

13/ Sec 4.9: Named Graphs

The definition is for "graph" not "named graph". The first example isn't a named graph.

14/ The longer example in 4.9:

What is the subject of the asOf? (If it's the graph URI, we have the problem with naming of g-snaps and g-boxes).

15/ I found the use of "@type" for datatypes confusing. I prefer @dtype.

16/ Appendix B: To and From JSON-LD

From and To what?

s/proof/evidence/

17/ Correction:

s/@subject/@id/

Syntax error:

second example of 4.3 has several missing or misplaced commas:

{
  "@context":
    [
      "http://json-ld.org/contexts/person.jsonld",
      {
        "foaf": "http://xmlns.com/foaf/0.1/"
      },
      "http://json-ld.org/contexts/event.jsonld" ,
                                          Remove:^^^
    ] ,
   Add^^^
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "depiction": "http://twitter.com/account/profile_image/manusporny" ,
    Add^^^
  "celebrates":
  {
    "@type": "Event",
    "description": "International Talk Like a Pirate Day",
    "date": "R/2011-09-19"
  }
}

and it does not use "foaf:" which is a bit confusing.

gkellogg commented 12 years ago

I've addressed these issues in the noted commits, see below.

These are official RDF Syntax review comments by Andy Seaborne (@afs) via the RDF WG:

Major:

1/ Definitions

I agree with the intention of making it accessible to the typical JSON application developer, but a narrative without clearly identified definitions means that it is difficult to look into the document to check specific details. It also easily becomes inconsistent, because it is not clear whether a given piece of text is descriptive or definitional. Example below.

I suggest keeping the syntax doc as-is and adding a separate formal-only document (or a separate top-level section) for the times when arguing over details matters. Maybe this is a proper appendix A, but I think this is more like EBNF; it would not be an appendix.

Problems will inevitably come when the definitions differ. We do have an issue (#114) regarding expressing JSON-LD in EBNF; that EBNF should probably go in appendix A, which already contains an informal description of JSON-LD.

Example: the text in 4.5 and A.2 about @id are different.

4.5 The value of the @id key must be either a term, a compact IRI, or an absolute IRI.

A.2: "The value of @id must be null, a term, a compact IRI, or an IRI."

A.2 is wrong, I've updated to remove null as an acceptable value.

I actually see that one of the tests (compact-17) uses this form, which IMO is incorrect. To remove a property definition within a context, the property value should be null, not an object having an @id key which is null:

{
  "@context": [
    {
      "comment": { "@id": "http://www.w3.org/2000/01/rdf-schema#comment", "@language": "en" }
    },
    {
      "comment": { "@id": null },
      "comment_en": { "@id": "http://www.w3.org/2000/01/rdf-schema#comment", "@language": "en" }
    }
  ]
}

should be

{
  "@context": [
    {
      "comment": { "@id": "http://www.w3.org/2000/01/rdf-schema#comment", "@language": "en" }
    },
    {
      "comment": null,
      "comment_en": { "@id": "http://www.w3.org/2000/01/rdf-schema#comment", "@language": "en" }
    }
  ]
}

I've fixed this test as well. Fixed in commit 7c2b3e61355d300eab50045f6e3dc7c6dc131e7c.
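To illustrate why `"comment": null` is the cleaner form, here is a minimal sketch of folding an array of local contexts into one active context. `merge_local_contexts` is a hypothetical helper, and it models only the two behaviours discussed here (later definitions override earlier ones, and null removes a term); it is not the spec's full context-processing algorithm.

```python
import json
from typing import Any, Dict, List

# The corrected context from the compact-17 discussion; json.loads turns JSON
# null into Python None.
CONTEXT_JSON = """
[
  {"comment": {"@id": "http://www.w3.org/2000/01/rdf-schema#comment", "@language": "en"}},
  {"comment": null,
   "comment_en": {"@id": "http://www.w3.org/2000/01/rdf-schema#comment", "@language": "en"}}
]
"""


def merge_local_contexts(contexts: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold an array of local contexts into one active context (simplified sketch)."""
    active: Dict[str, Any] = {}
    for local in contexts:
        for term, definition in local.items():
            if definition is None:
                # "term": null removes any definition inherited from an earlier context
                active.pop(term, None)
            else:
                # a later definition replaces an earlier one
                active[term] = definition
    return active


active = merge_local_contexts(json.loads(CONTEXT_JSON))
print(sorted(active))  # ['comment_en']
```

Fed the old `{ "@id": null }` form instead, this sketch would keep `comment` in the active context with a null `@id` rather than removing it.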

Example: Is this a legal JSON-LD doc:

{ "@id" : "http://example/thing" }

It is valid according to the processing rules, but does not express a triple or quad.
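A rough sketch of why that document yields nothing: if each non-keyword key of a subject definition becomes one triple, a definition holding only `@id` produces an empty list. `node_to_triples` is hypothetical; it assumes property keys are already expanded IRIs, ignores contexts, and mints a blank node label when `@id` is absent (the unlabeled-node case discussed under 4/).

```python
import itertools
from typing import Any, Dict, Iterator, List, Tuple

Triple = Tuple[str, str, Any]


def node_to_triples(node: Dict[str, Any], bnode_ids: Iterator[str]) -> List[Triple]:
    """Simplified sketch: property keys are assumed to be expanded IRIs already."""
    # A subject definition without @id describes an unlabeled (blank) node,
    # so mint a fresh blank node identifier for it.
    subject = node["@id"] if "@id" in node else next(bnode_ids)
    triples: List[Triple] = []
    for key, value in node.items():
        if key.startswith("@"):
            # Keywords are skipped here; a real processor would, for example,
            # turn @type into rdf:type triples.
            continue
        for obj in value if isinstance(value, list) else [value]:
            triples.append((subject, key, obj))
    return triples


bnodes = (f"_:b{i}" for i in itertools.count())
print(node_to_triples({"@id": "http://example/thing"}, bnodes))  # []
print(node_to_triples({"http://xmlns.com/foaf/0.1/name": "Andy"}, bnodes))
# [('_:b0', 'http://xmlns.com/foaf/0.1/name', 'Andy')]
```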

where do I look?

The EBNF should make this clear, something like the following:

SubjectDefinition ::= '{' OptContext DefPropertyObjectList '}'
DefPropertyObjectList ::= (PropertyObjectList ',')* '"@id"' ':' string (',' PropertyObjectList)
PropertyObjectList ::= property ':' object (',' PropertyObjectList)

Although, that's not LL(1).

I've updated the informal Authoring Guidelines to use the more accurate term "subject definition" rather than "JSON object" to clarify this.

Fixed in commit d86ea1f08c442e3c432eee51e59bb6d57e957e6c.

As the document stands, sorting this out is, for me, a block on LC: there is too much risk of having to make a substantive change and restart the LC cycle.

2/ The split between basic concepts and advanced concepts did not work for me.

2a/ Integers as an advanced concept but sets and lists as basic.

Agreed, moved this to advanced concepts.

Fixed in commit dbd00ccdd4fb9e9af2a56096c90854ee4b6ca55b.

2b/ Using the HTTP Link header seems very important.

This is actually a secondary usage, for taking a normal JSON document and having it interpreted as JSON-LD. The primary use does not involve the use of a describedby link header.

Other comments

Apologies that the comments are in neither document order nor priority order. In checking them I found myself having to jump about the doc to try to find definitions (see major comment). As seemingly identical pieces of text differed in the details, it got messy.

I'm sure I've got some of these comments wrong, because it was difficult to find reference material and so I ran out of time.

3/ Is the test suite also transferring? It covers both material that is to be migrated and material that is not.

compact: 20
expand: 29
frame: 23
from-RDF: 8
to-RDF: 31
normalization: 57

168 tests; 50% (~80) of which are framing and normalization.

Given the state of the spec, the fact that we have any tests in a test suite is a pretty good thing. AFAIK, the Turtle test suite hasn't changed substantively for this version.

We need to surface the individual tests better on json-ld.org so that they can serve as examples.

At this point, only Compact, Expand, fromRDF, and toRDF suites will come over.

4/ Status of bNodes.

Where are BNode labels allowed? BNode labels don't get discussed much (fine), but some of the text that lists the possible syntax forms at a given point doesn't include them.

For the purposes of this spec, we will consider an unlabeled node (or blank node) identifier to be an absolute IRI, making a BNode legal anywhere an absolute IRI is expected.

Note that the document says this in section 4.1 on Compact IRIs:

If the prefix is an underscore (_), the IRI remains unchanged. This effectively means that every term containing a colon will be interpreted by a JSON-LD processor as an IRI

4.5 The value of the @id key must be either a term, a compact IRI, or an absolute IRI.

does not include a bNode label (unless "_:a" is an absolute IRI, which it isn't).

As noted, it's treated as an IRI within the spec.

and

A subject definition that does not contain an @id property is called an unlabeled node.

is confusing as there is another way to be an unlabeled node.

Yes, a subject definition without an @id is called an unlabeled node, and the _:a form is called an unlabeled node identifier.

It's not correct to say that a subject definition without an @id is an unlabeled node, as the subject definition merely defines node properties, where the node may be identified using @id.

I changed this to say "A subject definition that does not contain an @id property defines properties of an unlabeled node."

This is fixed in commit 9fe62823216faf74da061e50837a07447f6241a6.

5/ Sec 3.1 Linking Data

We as a group need to review this section.

e.g. "A property should be labeled with an IRI."

Are there any examples of a Linked Data document that is not RDF or that can't be viewed as RDF?

Anything which uses an unlabeled node as a type, property or datatype, but I don't think it's worth calling that out.

6/ sec 3.1.1

The Web uses IRIs for unambiguous identification. The idea is that these terms mean something that may be of use to other developers and that it is useful to give them an unambiguous identifier. That is, it is useful for terms to expand to IRIs so that developers don't accidentally step on each other's vocabulary terms.

"vocabulary term" is confusing - I read that as properties and classes, not all things. Unambiguity of things matters.

I changed this to "vocabulary term or other resource".

7/ "Linked Data document" isn't a defined term

An IRI that is a label in a linked data graph should be dereferencable to a Linked Data document describing the labeled subject, object or property.

Section 3.1 on linking data starts off by saying:

Linked Data is a set of documents, each containing a representation of a linked data graph.

From this, I think a reasonable interpretation is that a "Linked Data document" is one of the documents in this set. As the term is only used a couple of times, I'm not sure it warrants its own definition.

and datatype?

Changed to type in 5daa9c79e81f713520fb8326cbed6615b158b89e.

8/ It's big.

The Syntax doc is as big as RDF/XML by page count currently. I know this has been said before but the concept of "JSON API" leads me to expect something shorter.

The change in ReSpec means it prints badly. It grew 6 pages for me just on the ReSpec change. I know that isn't in the CG control but it does not help the impression that it's a big spec. It is very bad in the JSON-API doc - the method descriptions are forced into cols of about 10 chars.

This has been discussed elsewhere, and is something that should be further discussed before LC. No specific action in the document at this point.

9/ Compact IRIs

Terms are interpreted as compact IRIs if they contain at least one colon and the first colon is not followed by two slashes (//, as in http://example.com).

Why are http URIs handled differently from URNs?

"urn:isbn:978-0-521-87625-4"
"urn:uuid:7962241c-2a01-11b2-8057-b443860cde7a"
"og:video:type"
"_:a"

This comes from work in RDFa, where it was found that unintentionally defining a prefix that is the same as an IRI scheme could cause unexpected behavior. This mechanism was adopted to resolve that issue, and we're being consistent with RDFa.

But BNode labels are IRIs: In "Compact IRIs" it says:

If the prefix is an underscore (_), the IRI remains unchanged.

Again, pretty much the same as how RDFa handles CURIEs.
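The rule being discussed can be sketched as follows. `expand_value` is a hypothetical helper covering only the colon handling quoted above (plain term lookup is not modelled), and the prefix mappings, including the deliberately misleading one for `http`, are made up for illustration.

```python
from typing import Dict


def expand_value(value: str, prefixes: Dict[str, str]) -> str:
    """Simplified sketch of the compact-IRI rule quoted in this thread."""
    prefix, colon, suffix = value.partition(":")  # split on the first colon only
    if not colon:
        return value  # no colon: plain term lookup, not modelled here
    if suffix.startswith("//"):
        return value  # "http://..." form: never treated as a compact IRI
    if prefix == "_":
        return value  # blank node identifier such as "_:a": left unchanged
    if prefix in prefixes:
        return prefixes[prefix] + suffix  # compact IRI: prefix IRI + suffix
    return value  # undefined prefix: kept as-is, i.e. used as an absolute IRI


# Hypothetical prefix mappings, including a deliberately "bad" one for "http".
prefixes = {"og": "http://ogp.me/ns#", "http": "http://example.org/oops#"}

for v in ["http://example.com", "urn:isbn:978-0-521-87625-4", "og:video:type", "_:a"]:
    print(v, "->", expand_value(v, prefixes))
```

Under this rule a URN survives only because nobody defined `urn` as a prefix; if a context did, the URN would be rewritten, whereas `http://` values never are. That is the asymmetry the question is probing.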

10/ Sec 3.3: example:

This looks exactly like the situation in the previous section around "homepage". A complete example would be better.

Expanded based on previous example in c4a65a782c6d12e3a045d44f62e55181cf2e8dc0.

12/

The value of a @graph property must be null, an IRI, or a JSON object.

Was a compact IRI also intended? I assume so but it does not say that. Another way to put it, when is the spec language about syntax and when is it about concepts?

This should be subject definition or array of zero or more subject definitions. We removed the option to use an IRI earlier.

Fixed in 2066a2c9ec44e2ca956f87ff5ed74543805c7172.
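A minimal check of the revised rule, assuming "subject definition" can be approximated as "any JSON object" (a real validator would look inside the object). `is_valid_graph_value` is a hypothetical name.

```python
from typing import Any


def is_valid_graph_value(value: Any) -> bool:
    """Sketch: @graph holds a subject definition (approximated here as a JSON
    object) or an array of zero or more subject definitions."""
    if isinstance(value, dict):
        return True
    if isinstance(value, list):
        return all(isinstance(item, dict) for item in value)
    # A bare IRI string (the removed option) or null is rejected.
    return False


print(is_valid_graph_value([{"@id": "http://example.org/a"}]))  # True
print(is_valid_graph_value("http://example.org/graph"))  # False
```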

Ditto @context - can a @context take a compact IRI (layering of @contexts)? Maybe it's an odd case, but why make it asymmetric? An implementation wants a "convert this" function, not "convert1", "convert2", etc.

The value is now described as a "string expanding to an IRI". Note that it can be relative.

Fixed in b1b253ba896cadf75f601d74eb069ebfda3c060f.
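A sketch of what "it can be relative" means in practice, assuming the base for resolution is the IRI of the document containing the `@context`. The document location below is hypothetical; Python's `urllib.parse.urljoin` does the standard relative-reference resolution.

```python
from urllib.parse import urljoin


def resolve_context_reference(document_iri: str, context_ref: str) -> str:
    """Resolve a string-valued @context against the referencing document's IRI."""
    # Absolute references come back unchanged; relative ones are resolved
    # against the base, including "../" segments.
    return urljoin(document_iri, context_ref)


doc = "http://example.com/data/manu.jsonld"  # hypothetical document location
print(resolve_context_reference(doc, "../contexts/person.jsonld"))
# http://example.com/contexts/person.jsonld
print(resolve_context_reference(doc, "http://json-ld.org/contexts/person.jsonld"))
# http://json-ld.org/contexts/person.jsonld
```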

13/ Sec 4.9: Named Graphs

The definition is for "graph" not "named graph". The first example isn't a named graph.

This is a use of @graph to describe resources in the default graph.

I made this more explicit in 7d323da0c09b96002c996d4ca0fc5b026a451273.

14/ The longer example in 4.9:

What is the subject of the asOf? (If it's the graph URI, we have the problem with naming of g-snaps and g-boxes).

asOf is a property having the value of @id as a subject in the default graph.

The value of @id is also the name of the named graph.

These examples could be expanded using TriG, but we've avoided doing that so far. I added an issue marker to 54844da2db2f7da6720c3e0b71cfbe4d52fabe3d.
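In lieu of TriG, the reading given in the reply can be sketched as quads. Everything here is hypothetical: the IRIs and date are invented rather than taken from section 4.9, keys are assumed to be expanded IRIs, and every nested subject definition is assumed to carry its own `@id`. The sketch takes no position on the g-snap/g-box question.

```python
from typing import Any, Dict, List, Optional, Tuple

# (subject, property, object, graph); graph None means the default graph.
Quad = Tuple[str, str, Any, Optional[str]]


def named_graph_to_quads(node: Dict[str, Any]) -> List[Quad]:
    """Sketch for a subject definition that has both @id and @graph."""
    graph_name = node["@id"]
    quads: List[Quad] = []
    for key, value in node.items():
        if key == "@graph":
            # Statements inside @graph belong to the graph named by @id.
            for inner in value:
                for prop, obj in inner.items():
                    if not prop.startswith("@"):
                        quads.append((inner["@id"], prop, obj, graph_name))
        elif not key.startswith("@"):
            # Other properties (e.g. asOf) describe @id itself, in the default graph.
            quads.append((graph_name, key, value, None))
    return quads


example = {
    "@id": "http://example.org/graphs/g1",
    "http://example.org/vocab#asOf": "2012-06-01",
    "@graph": [
        {"@id": "http://example.org/people#manu", "http://xmlns.com/foaf/0.1/name": "Manu Sporny"}
    ],
}
for quad in named_graph_to_quads(example):
    print(quad)
```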

15/ I found the use of "@type" for datatypes confusing. I prefer @dtype.

The spec previously used @datatype, but the group decided to unify these to a single @type. The discussion and resolution are described here: http://json-ld.org/minutes/2012-04-24/#resolution-3.
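A short sketch of why a single `@type` keyword need not be ambiguous: inside a value object (one that has `@value`) it names the literal's datatype, and on a subject definition it names the node's type. `classify_type` is a hypothetical helper and the Event IRIs are made up.

```python
from typing import Any, Dict, Tuple


def classify_type(obj: Dict[str, Any]) -> Tuple[str, str]:
    """Sketch: how one @type keyword carries two meanings."""
    if "@value" in obj:
        # Inside a value object, @type names the literal's datatype.
        return ("datatype", obj["@type"])
    # On a subject definition, @type names the node's type (rdf:type).
    return ("node type", obj["@type"])


XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
print(classify_type({"@value": "2011-09-19", "@type": XSD_DATE}))
print(classify_type({"@id": "http://example.org/pirate-day", "@type": "http://example.org/Event"}))
```

In this sketch the presence of `@value` is the only thing that separates the two readings.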

16/ Appendix B: To and From JSON-LD

From and To what?

s/proof/evidence/

Changed to "Relationship to other RDF Formats" in 95c1f104bf991093790f7f189a9d1bc4af4f2483.

17/ Correction:

s/@subject/@id/

Thanks! That's been around a while!

Syntax error:

second example of 4.3 has several missing or misplaced commas:

{
  "@context":
    [
      "http://json-ld.org/contexts/person.jsonld",
      {
        "foaf": "http://xmlns.com/foaf/0.1/"
      },
      "http://json-ld.org/contexts/event.jsonld" ,
                                          Remove:^^^
    ] ,
   Add^^^
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "depiction": "http://twitter.com/account/profile_image/manusporny" ,
    Add^^^
  "celebrates":
  {
    "@type": "Event",
    "description": "International Talk Like a Pirate Day",
    "date": "R/2011-09-19"
  }
}

and it does not use "foaf:" which is a bit confusing.

Fixed in 72223bd8803f9c39aa75cc54b3e71cabf00d06ec.

We need to add a pass that evaluates the examples to validate that they're legal. That would be simpler if we could use the script tag, but we add formatting to the examples, which makes it more difficult.
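As a rough idea of such a pass: strip the presentation markup, then hand each example to a JSON parser. The markup-stripping step is an assumption (it presumes the formatting is simple HTML tags plus entities), and `strip_markup` and `check_example` are hypothetical names; the `broken` sample reproduces the stray-comma and missing-comma problems from the 4.3 example.

```python
import html
import json
import re
from typing import Optional


def strip_markup(example_html: str) -> str:
    """Hypothetical pre-pass: drop highlighting tags and unescape entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", example_html))


def check_example(text: str) -> Optional[str]:
    """Return None if the example parses as JSON, else a short error location."""
    try:
        json.loads(strip_markup(text))
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


broken = """{
  "@context": [
    "http://json-ld.org/contexts/person.jsonld",
    "http://json-ld.org/contexts/event.jsonld",
  ]
  "name": "Manu Sporny"
}"""
fixed = """{
  "@context": [
    "http://json-ld.org/contexts/person.jsonld",
    "http://json-ld.org/contexts/event.jsonld"
  ],
  "name": "Manu Sporny"
}"""
print(check_example(broken))  # a line/column error message
print(check_example(fixed))  # None
```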

Thanks for your feedback!

Gregg

gkellogg commented 12 years ago

Andy further replies regarding a formal definition:

That is not clear to me - I'll flag this as an important WG item (and what the WG itself can do to help).

Is the intention to have a formal definition of JSON-LD (not EBNF - that's just the syntax part)? It does help the descriptive part as well: it can be looser, and concentrate on the overall concept, which is the intended style?

msporny commented 12 years ago

We had discussed this before, right?

http://json-ld.org/minutes/2012-05-22/#topic-2

Remember, JSON-LD prescribes best practices and tries really hard to not say that something is "illegal". We tend to recover from things that we know we can recover from... and EBNF is usually used to throw "Syntax Errors" + stop processing, and do other "full-halt" responses to invalid input.

I don't agree that EBNF is the right approach for a formal definition of JSON-LD without being very careful with the language. We don't want to give the impression that JSON-LD will throw an error if the EBNF is not matched perfectly... as in many cases, the value is ignored if the EBNF is not matched. There are other rules that we may not be able to express in EBNF... things like conditional branching based on what's in the @context. We should discuss this off-line first to make sure we're on the same page and then raise it again on the call if necessary.
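To make the contrast concrete, here is a hypothetical sketch of the two behaviours (the function `read_id` and its `strict` flag are invented for illustration): a grammar-style reader halts on an `@id` that isn't a string, while a recovering reader ignores the unusable value and keeps going.

```python
from typing import Any, Dict, Optional


def read_id(node: Dict[str, Any], strict: bool = False) -> Optional[str]:
    """Sketch: strict mode raises on a bad @id; lenient mode drops it."""
    value = node.get("@id")
    if value is None or isinstance(value, str):
        return value
    if strict:
        # EBNF-style "full halt" on input that doesn't match the grammar.
        raise ValueError(f"@id must be a string, got {type(value).__name__}")
    # Recovering style: ignore the value and continue processing.
    return None


print(read_id({"@id": "http://example/thing"}))  # http://example/thing
print(read_id({"@id": 42}))  # None
```

Calling `read_id({"@id": 42}, strict=True)` raises `ValueError` instead.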

There is a very good reason why we didn't use EBNF to describe what is "allowed" in JSON-LD and went with prose instead.

msporny commented 12 years ago

Andy agreed that EBNF isn't the correct approach, but believes that it can be done more formally:

http://lists.w3.org/Archives/Public/public-rdf-wg/2012Jun/0112.html

Since we are already discussing that in ISSUE #114, and since the JSON-LD Syntax and API spec have been published as FPWDs, I'm marking this bug as closed.