w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

Parsing of literals via @type: @id #405

Closed matthieubosquet closed 11 months ago

matthieubosquet commented 1 year ago

My understanding of JSON-LD 1.1 type coercion is that if a context term is defined as "@type": "@id", it must be interpreted as an IRI:

...a string value of a term coerced to @id or @vocab is to be interpreted as an IRI... Values coerced to @id in contrast are expanded as an IRI or a compact IRI if a colon is present; otherwise, they are interpreted as relative IRI references.

The definition of the @id keyword does not seem to go against that.

Let's take a simple json-ld document:

{
  "@context": {
    "a": {
      "@id": "urn:a",
      "@type": "@id"
    }
  },
  "a": [
    "urn:x",
    true,
    "x",
    1,
    1.1,
    {
      "a": "y"
    },
    null,
    []
  ]
}

I would imagine that values that cannot be parsed as an IRI should either:

  1. be discarded, or
  2. result in an error.

Note: I couldn't find a spec description of what to do with values inconsistent with their declared type.

When I process that json-ld document above (for example, via the json-ld playground), it results in the following RDF:

_:b0 <urn:a> <urn:x> .
_:b0 <urn:a> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
_:b0 <urn:a> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
_:b0 <urn:a> "1.1E0"^^<http://www.w3.org/2001/XMLSchema#double> .
_:b0 <urn:a> _:b1 .

So we have:

  1. the string "urn:x" correctly coerced as an IRI;
  2. the string "x" discarded, maybe because it cannot be cast as an IRI, even a relative one (I understand that having a base would allow casting, but I think it is a perfectly valid use case to have documents without a base);
  3. the boolean and numbers cast to their closest xsd: datatype, without regard to the "@type": "@id" coercion;
  4. the object cast to a blank node;
  5. the null and array values discarded (for lack of anything better to do, which may be correct, even though I am not sure why arrays are discarded while empty objects are cast to a blank node).

In other words: in every case where the type coercion fails, it is ignored, except when the value is a JSON string that cannot be cast as an IRI, which results in the value being discarded. This behaviour seems inconsistent.

Type coercion to a datatype IRI seems to:

  1. coerce strings, booleans and numbers, ignoring their JSON types, which is the opposite of what "@type": "@id" does (arguably, coercion could similarly be ignored for booleans and numbers);
  2. ignore coercion for objects.

The most inconsistent behaviour in all this is discarding literals that can't be cast as IRIs.

Arguably, a consistent behaviour of ignoring coercion (as is the case for objects) when encountering values of the native JSON types boolean and integer would be preferable.


Would it be possible to get clarification in the spec's text (or a pointer to it, if it already exists)? And, if the parsing algorithm is still ambiguous, could aligning all type coercion to a consistent behaviour be considered as part of that clarification?

gkellogg commented 1 year ago

My understanding of JSON-LD 1.1 type coercion is that if a context term is defined as "@type": "@id", it must be interpreted as an IRI:

Type coercion is used for interpreting string values, not arbitrary values.

...a string value of a term coerced to @id or @vocab is to be interpreted as an IRI... Values coerced to @id in contrast are expanded as an IRI or a compact IRI if a colon is present; otherwise, they are interpreted as relative IRI references.

The definition of the @id keyword does not seem to go against that.

Let's take a simple json-ld document:

{
  "@context": {
    "a": {
      "@id": "urn:a",
      "@type": "@id"
    }
  },
  "a": [
    "urn:x",
    true,
    "x",
    1,
    1.1,
    {
      "a": "y"
    },
    null,
    []
  ]
}

I would imagine that values that cannot be parsed as an IRI should either:

  1. be discarded, or
  2. result in an error.

Note: I couldn't find a spec description of what to do with values inconsistent with their declared type.

This is defined in the API under the Value Expansion algorithm. Note that values that are not strings (i.e., numbers, booleans, or objects) are expanded without regard to any "@type": "@id" coercion.
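As a rough illustration of the point above, here is a minimal Python sketch of the scalar part of Value Expansion. It is not the normative algorithm: the function name `expand_value` and its `type_mapping` parameter are made up for this example, IRI expansion is left out, and language/direction handling is omitted. Objects are handled by the wider expansion algorithm, not by this step.

```python
def expand_value(value, type_mapping=None):
    """Simplified sketch of Value Expansion for one scalar (hypothetical helper)."""
    # "@id" / "@vocab" coercion is applied to string values only.
    if isinstance(value, str) and type_mapping in ("@id", "@vocab"):
        # Real processors IRI-expand the string here; this sketch keeps it as-is.
        return {"@id": value}

    # Every other case produces a value object that keeps the native JSON value.
    result = {"@value": value}

    # A datatype coercion (anything other than "@id", "@vocab" or "@none")
    # is attached whatever the JSON type of the value is.
    if type_mapping not in (None, "@id", "@vocab", "@none"):
        result["@type"] = type_mapping
    return result


XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

coerced_string = expand_value("urn:x", "@id")   # becomes a node reference
untouched_bool = expand_value(True, "@id")      # coercion does not apply
typed_number = expand_value(1, XSD_DATE)        # datatype is attached
```

Under these assumptions, `true` and `1` stay native JSON values when the term is coerced to `@id`, which is why they later serialize as xsd:boolean and xsd:integer literals.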

When I process that json-ld document above (for example, via the json-ld playground), it results in the following RDF:

_:b0 <urn:a> <urn:x> .
_:b0 <urn:a> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
_:b0 <urn:a> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
_:b0 <urn:a> "1.1E0"^^<http://www.w3.org/2001/XMLSchema#double> .
_:b0 <urn:a> _:b1 .

So we have:

  1. the string "urn:x" correctly coerced as an IRI;
  2. the string "x" discarded, maybe because it cannot be cast as an IRI, even a relative one (I understand that having a base would allow casting, but I think it is a perfectly valid use case to have documents without a base);

It is a valid use case. Relative IRIs are resolved against the document base, but the playground provides none, so the value is discarded.
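To make the effect of a missing base concrete, the following Python sketch resolves a string against an optional base and drops it when it stays relative. The helper `resolve_iri` and the base URL are assumptions for illustration only; a real processor also handles compact IRIs and terms, which this sketch ignores.

```python
from urllib.parse import urljoin, urlparse


def resolve_iri(value, base=None):
    """Return an absolute IRI, or None if the value stays relative (hypothetical helper)."""
    # A value that already carries a scheme (e.g. "urn:x") is treated as absolute.
    if urlparse(value).scheme:
        return value
    # Without a base there is nothing to resolve against, so the value is dropped.
    if base is None:
        return None
    # With a base, the relative reference is resolved against it.
    return urljoin(base, value)


absolute = resolve_iri("urn:x")
dropped = resolve_iri("x")
resolved = resolve_iri("x", base="http://example.org/doc.jsonld")
```

With the assumed base `http://example.org/doc.jsonld`, the string "x" resolves to `http://example.org/x`; without a base it yields `None`, mirroring the triple that is missing from the playground output.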

  3. the boolean and numbers cast to their closest xsd: datatype, without regard to the "@type": "@id" coercion;
  4. the object cast to a blank node;
  5. the null and array values discarded (for lack of anything better to do, which may be correct, even though I am not sure why arrays are discarded while empty objects are cast to a blank node).

null will always be discarded. Array values are not discarded unless they contain nothing that can be expanded. In this case, the empty array contains no values to be expanded.
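A small Python sketch of that behaviour, assuming a plain (non-@list) array: nulls are skipped and nested arrays are flattened, so an empty array contributes nothing. The function `expand_items` and the `expand_scalar` callback are hypothetical names for this example, not part of any JSON-LD library.

```python
def expand_items(items, expand_scalar):
    """Sketch of expanding array members of a non-@list term (hypothetical helper)."""
    expanded = []
    for item in items:
        if item is None:
            # null is always discarded.
            continue
        if isinstance(item, list):
            # Nested arrays are flattened; an empty one adds no values.
            expanded.extend(expand_items(item, expand_scalar))
        else:
            expanded.append(expand_scalar(item))
    return expanded


values = expand_items(["urn:x", None, [], [1]], lambda v: {"@value": v})
```

Here the array itself is not dropped: `[1]` still contributes its member, while `null` and `[]` leave no trace in the result.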

In other words: in every case where the type coercion fails, it is ignored, except when the value is a JSON string that cannot be cast as an IRI, which results in the value being discarded. This behaviour seems inconsistent.

The important thing that often gets missed is that type coercion only applies to string values. There is an open issue json-ld-api/#509 which may eventually re-consider the use of JSON integers as IRI values.

Type coercion to a datatype IRI seems to:

  1. coerce strings, booleans and numbers, ignoring their JSON types, which is the opposite of what "@type": "@id" does (arguably, coercion could similarly be ignored for booleans and numbers);
  2. ignore coercion for objects.

The most inconsistent behaviour in all this is discarding literals that can't be cast as IRIs.

Arguably, a consistent behaviour of ignoring coercion (as is the case for objects) when encountering values of the native JSON types boolean and integer would be preferable.

IMO, it is consistent, as "@type": "@id" coercion applies to strings and not to other types. Perhaps, at some future point, the spec could emphasize this in the places where it may be confusing.

Would it be possible to get clarification in the spec's text (or a pointer to it, if it already exists)? And, if the parsing algorithm is still ambiguous, could aligning all type coercion to a consistent behaviour be considered as part of that clarification?

If the links I provided don't clarify, reply back.