json-ld / json-ld.org

JSON for Linked Data's documentation and playground site
https://json-ld.org/

@context / @graph could be required in strict validation #543

Closed christopher-johnson closed 6 years ago

christopher-johnson commented 7 years ago

Relating to a recent discussion on the mailing list, I would like to define a minimum set of constraints for establishing what constitutes a valid JSON-LD document. This can be done in a "strict" validation by adding requirements to the schema.

"required": [
    "@context",
    "@graph"
  ]

Check this gist

While requiring @graph is perhaps contentious, excluding it is not as clean, because it creates an unbalanced document structure, i.e.

Map<String, Object> context;
Object graph;

instead of

Map<String, Object> context;
Map<String, Object> graph;

Furthermore, validating the necessary correspondence between these structures is facilitated by Comparable types. It is not clear to me how the existence or absence of the graph as an anonymous object can even be validated. What is clear from the spec is that the existence of both a context and a graph defines the identity of a JSON-LD document.

Perhaps providing an optional (but recommended) strict schema with requirements would allow users to conform to a uniform representation.
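
As a rough illustration of the strict check proposed above, the following Python sketch tests only the "required" constraint from the schema fragment. The function name `passes_strict_check` is hypothetical, and this is not how the gist or any JSON Schema validator implements it.

```python
import json

# Keys the proposed strict schema lists under "required".
REQUIRED_KEYS = ("@context", "@graph")


def passes_strict_check(text):
    """Return True if the parsed document is an object holding every required key."""
    doc = json.loads(text)
    # The proposal applies to the root object only; arrays and scalars fail here.
    if not isinstance(doc, dict):
        return False
    return all(key in doc for key in REQUIRED_KEYS)
```

Under this check, a document that omits @graph is rejected even if it is otherwise well formed.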

gkellogg commented 7 years ago

There are some open issues which are planned to be added for 1.1 that would allow eliminating gratuitous use of @graph, so I would be wary of a schema use which mandated its being present. Plus, presumably the schema needs to be applied recursively to embedded node definitions, which likely don't have @context either (something we're specifically enabling with the Scoped Context feature).

christopher-johnson commented 7 years ago

Is an empty document valid JSON-LD? The current schema (without requirements) asserts that it is.

I realize that the 1.1 spec covers an enormous scope of use cases, which is great, but for LDP, JSON-LD is not easy to implement because it is so polymorphic, and having no structural validation method makes it even harder.

gkellogg commented 7 years ago

IIRC, a JSON file needs to have at least one object, or an empty array, to be valid. Later versions of JSON allow any JSON value (RFC 7159 says JSON-text = ws value ws, where value is an array, object, or native value), but JSON-LD requires the following:

A JSON-LD document MUST be a single node object or an array whose elements are each node objects at the top level.

I believe a JSON-LD file with just an empty array would be valid. If there is an object, it must be valid as a node object. According to the Grammar, a node object is a collection of zero or more properties, so an empty object ({}) is also valid, and a file containing only an empty object (or an array containing zero or more empty objects) would be valid (but senseless) JSON-LD.
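
The top-level rule described above can be sketched structurally in Python. This checks shape only: it treats any JSON object as a candidate node object and does not apply the rest of the grammar. The name `has_valid_top_level` is made up for illustration.

```python
def has_valid_top_level(doc):
    """Shape check only: a single object, or an array whose elements are all objects."""
    if isinstance(doc, dict):
        return True
    if isinstance(doc, list):
        # all() over an empty list is True, so [] passes, matching the comment above.
        return all(isinstance(item, dict) for item in doc)
    return False
```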

christopher-johnson commented 7 years ago

Thank you for the clarification. Per 6.2

6.2 A JSON object is a node object if it exists outside of a JSON-LD context ...

Meaning that in a JSON-LD document, the root JSON object cannot be a JSON node object since it contains @context, correct? And, furthermore, that only if @context exists can a JSON node object exist.

And 6.2 goes on to reinforce this assertion with:

if a JSON object contains no keys other than @graph and @context, and the JSON object is the root of the JSON-LD document, the JSON object is not treated as a node object

christopher-johnson commented 7 years ago

I believe that the wording of:

A JSON-LD document MUST be a single node object or an array whose elements are each node objects at the top level.

contradicts 6.2. What it should say is:

A JSON-LD document MUST be a single JSON object or an array that contains at least one JSON node object.

JSON node cardinality of at least 1, or document invalid, makes sense to me. And JSON node depends on @context AFAIK. Expanded could be considered application/json and not application/ld+json, since it does not depend on @context and has no JSON node objects per se.

gkellogg commented 7 years ago

6.2 A JSON object is a node object if it exists outside of a JSON-LD context ...

Meaning that in a JSON-LD document, the root JSON object cannot be a JSON node object since it contains @context, correct? And, furthermore, that only if @context exists can a JSON node object exist.

No, this should be interpreted as objects within an @context cannot be node objects. Any node object may contain @context, including the top-level node object. Objects within a context are not node objects.

gkellogg commented 7 years ago

A JSON-LD document MUST be a single JSON object or an array that contains at least one JSON node object.

JSON node cardinality of at least 1, or document invalid, makes sense to me. And JSON node depends on @context AFAIK. Expanded could be considered application/json and not application/ld+json, since it does not depend on @context and has no JSON node objects per se.

The 1.1 document says the following:

A JSON-LD document must be a single node object or an array whose elements are each node objects at the top level.

This may have inadvertently removed the restriction that an array must contain at least one node object and is worth revisiting. But, other RDF formats allow documents that make no statements, so this is in keeping with that. You could say that [{}] makes no statements, as it defines a blank-node subject with no properties, but that might place odd requirements on serializers.

christopher-johnson commented 7 years ago

The @context JSON object is never a JSON node object, correct? (even though a node object can contain @context)

This contradicts

A JSON-LD document MUST be a single node object or an array whose elements are each node objects at the top level.

I agree that objects can be null, but a valid JSON-LD serialization should include nullables as empty map references.

{
  "@context": [ ],
  "@default": [ ]
}

This reinforces the core assertion that @context served without empty map references (i.e. not the output of a serialization) is not JSON-LD.

gkellogg commented 7 years ago

A document may be an object with only @context, which is often the case for contexts. These are valid JSON-LD documents.

The referenced assertion is basically that a JSON-LD document is one or more objects, interpreted as JSON-LD node objects (using an array form optionally, or if necessary). @context is only used for expansion, and a JSON-LD processor will happily process nodes as node objects, possibly just ignoring content.

@graph is also used for grouping multiple objects, or for defining named graphs, but is not necessary.

There is no @default keyword.

a valid JSON-LD serialization should include nullables as empty map references.

Is this a suggestion? It's not implied from the current spec, IIRC.

christopher-johnson commented 7 years ago

Yes, it is a suggestion, in the absence of a solution to the question in the spec. @default is a keyword used by jsonld-java for null output objects from toRDF, and is referenced in the framing spec:

@default Used in Framing to set the default value for an output property when the framed node object does not include such a property.

If @context without content is served as application/ld+json and a processor attempts to deserialize it to RDF under that premise, it yields a null, which could be interpreted as empty content, and then an empty resource could be created for an entity that does not actually represent content. There is no unique identity for an @context document other than an embedded processing keyword that would typically be present anyway, nor for any null serialized node objects in the document. To me, this seems odd and quite problematic to interpret.

Normally, what is valid should be defined with concrete methods. What validation method then do you suggest for implementations to identify "RDF" documents from "non-RDF" documents? It seems logical that ld+json Content-Type would indicate the presence of linked data, not a JSON object that is integral to the process of creating ld+json but that is not LD itself.
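
One possible way for an implementation to tell a context-only document apart from one that carries node data is a simple key check, sketched below in Python. This is an assumption about what such a check could look like, not something the spec or jsonld-java defines; `is_context_only` is a hypothetical helper.

```python
def is_context_only(doc):
    """Return True when the root is an object whose only key is @context."""
    return isinstance(doc, dict) and set(doc) == {"@context"}
```

A server could apply a check like this before creating a resource from the document, so that a bare context is not stored as an empty entity.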

azaroth42 commented 6 years ago

A document may be an object with only @context, which is often the case for contexts. These are valid JSON-LD documents.

:+1:

azaroth42 commented 6 years ago

I propose an editorial change to make the clarification here more obvious. In rereading, I also interpreted 8.2 plus 8.0 as:

A node object is one that is outside @context and is NOT the top-most object in a json-ld document with no keys other than @graph and @context keys. A json-ld document is a single node object or array of node objects.

Meaning that the top most node object must have more than @graph and @context for it to be a valid json-ld document... which a context document does not conform to.

What would be the effect of simply removing the "consisting of no other members than @graph and @context." part of the sentence? That would allow the JSON-LD document to be only @context.
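
To make the effect of that change concrete, here is a Python sketch of both readings of the root-object rule. It is a simplification under stated assumptions: it looks only at the root object, ignores the @value, @list, and @set exclusions, and treats the empty object as a node object, as described earlier in this thread. `root_is_node_object` and its `relaxed` flag are hypothetical.

```python
WRAPPER_KEYS = {"@context", "@graph"}


def root_is_node_object(root, relaxed=False):
    """Classify the root object under the current wording or the proposed one."""
    if not isinstance(root, dict):
        return False
    if relaxed:
        # Proposed wording: the @graph/@context clause is dropped entirely.
        return True
    # Current wording as read above: a non-empty root holding nothing but
    # @context and/or @graph is not a node object.
    only_wrapper_keys = bool(root) and set(root) <= WRAPPER_KEYS
    return not only_wrapper_keys
```

With `relaxed=False` a context-only document is not a node object; with `relaxed=True` it is, which is the outcome the proposed wording aims for.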