w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

Is the value space of rdf:JSON and xsd:boolean disjoint? #323

Closed LEW21 closed 4 years ago

LEW21 commented 4 years ago

I'd ask about xsd:string and xsd:double too, but I'm not even sure whether their definitions in ECMAScript are equivalent to the XSD ones.

xsd:boolean:

3.3.2.1 Value Space: boolean has the value space of two-valued logic: {true, false}.

ECMAScript boolean:

6.1.3 The Boolean Type: The Boolean type represents a logical entity having two values, called true and false.

RDF:JSON:

The value space is the union of the four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays) from [ECMASCRIPT]. Two JSON values A and B are considered equal if and only if the following is true:

  1. If A and B are both objects, (...)
  2. Otherwise, if A and B are both arrays, (...)
  3. Otherwise, if A and B satisfy the Strict Equality Comparison defined in Section 7.2.15 in [ECMASCRIPT].
  4. Otherwise, A and B are not equal.

If I were to guess, I'd guess they are disjoint, because that would be similar to the handling of xsd:hexBinary and xsd:base64Binary. On the other hand, the definitions look equal, and nothing says explicitly that they should be disjoint.

pchampin commented 4 years ago

From a very pedantic point of view, nothing in XSD's and ECMAScript's respective definitions formally asserts that they are referring to the same thing, even if they both call them "true" and "false". That being said, I would tend to agree with you that the intention is to refer to the same thing, so that "true"^^xsd:boolean and "true"^^rdf:JSON denote the same value.

I'd guess they are disjoint, because that would be similar to the handling of xsd:hexBinary and xsd:base64Binary

What do you mean by "handling"? JSON-LD processors are not concerned with the semantics of literals, and they are never expected to treat two literals with different datatypes as if they were the same, even if they happen to denote the same value (as "true"^^xsd:boolean and "true"^^rdf:JSON, or "ff"^^xsd:hexBinary and "FF"^^xsd:hexBinary for that matter).
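
The difference between comparing literals as terms and comparing the values they denote can be sketched as follows. This is only an illustration: the `Literal` class and the `hex_binary_value` helper are hypothetical names, not part of any JSON-LD or RDF API, and term equality is simplified to lexical form plus datatype IRI.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF literal, compared as a term (lexical form + datatype IRI)."""
    lexical: str
    datatype: str


def hex_binary_value(lit: Literal) -> bytes:
    """Map an xsd:hexBinary literal to the octet sequence it denotes."""
    if lit.datatype != XSD + "hexBinary":
        raise ValueError("not an xsd:hexBinary literal")
    return bytes.fromhex(lit.lexical)


lower = Literal("ff", XSD + "hexBinary")
upper = Literal("FF", XSD + "hexBinary")

print(lower == upper)                                      # False: different terms
print(hex_binary_value(lower) == hex_binary_value(upper))  # True: same octet sequence
```

A processor that only ever compares terms, as described above, never needs the second comparison.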

LEW21 commented 4 years ago

I'd guess they are disjoint, because that would be similar to the handling of xsd:hexBinary and xsd:base64Binary

What do you mean by "handling"? JSON-LD processors are not concerned with the semantics of literals, and they are never expected to treat two literals with different datatypes as if they were the same, even if they happen to denote the same value (as "true"^^xsd:boolean and "true"^^rdf:JSON, or "ff"^^xsd:hexBinary and "FF"^^xsd:hexBinary for that matter).

I mean that xsd:hexBinary and xsd:base64Binary are defined in the XSD spec as disjoint:

For purposes of this specification, the value spaces of primitive datatypes are disjoint, even in cases where the abstractions they represent might be thought of as having values in common.

This has further effects on OWL processing:

According to XML Schema, the value spaces of xsd:hexBinary and xsd:base64Binary are isomorphic copies of the set of all finite sequences of octets — integers between 0 and 255, inclusive. To understand the effect that the disjointness requirement has on the semantics of OWL 2, consider the following example ontology:

  • DataPropertyRange( a:personID xsd:base64Binary ) # The range of the a:personID property is xsd:base64Binary.
  • DataPropertyAssertion( a:personID a:Meg "0203"^^xsd:hexBinary ) # The ID of Meg is the octet sequence consisting of the octets 2 and 3.

The first axiom states that all values of the a:personID property must be in the value space of xsd:base64Binary, but the second axiom provides a value for a:personID that is in the value space of xsd:hexBinary. Since the value spaces of xsd:hexBinary and xsd:base64Binary are disjoint, the above ontology is inconsistent.
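
One way to picture "isomorphic but disjoint" value spaces is to tag each octet sequence with its datatype. The sketch below is an illustrative model only, not how XSD or OWL 2 formally define values; `hex_value` and `base64_value` are made-up helper names.

```python
import base64


def hex_value(lexical: str):
    """Model an xsd:hexBinary value as (datatype tag, octets)."""
    return ("xsd:hexBinary", bytes.fromhex(lexical))


def base64_value(lexical: str):
    """Model an xsd:base64Binary value as (datatype tag, octets)."""
    return ("xsd:base64Binary", base64.b64decode(lexical, validate=True))


meg_id = hex_value("0203")        # the octets 2 and 3
same_octets = base64_value("AgM=")  # the same octets, written in base64

print(meg_id[1] == same_octets[1])  # True: identical octet sequences
print(meg_id == same_octets)        # False: the tagged values never coincide
```

Under this model, no xsd:hexBinary value can satisfy a range of xsd:base64Binary, which mirrors the inconsistency in the quoted ontology.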

So, in practical terms, if "true"^^xsd:boolean and "true"^^rdf:JSON are the same true value, then:

gkellogg commented 4 years ago

I think it's a bit different from asking about the value space of "1"^^xsd:integer and "1"^^xsd:decimal, as those are clearly numbers. In the case of "1"^^rdf:JSON, it is a JSON value, which may be interpreted as a number, certainly when parsed by a JSON parser. The fact that it could be interpreted so directly is something of a special case.

The point of the rdf:JSON datatype is to represent JSON values, much as rdf:HTML can represent HTML values. Because rdf:HTML represents DOM fragments, such a fragment could be "1". Would this imply that the literal "1"^^rdf:HTML should be considered to be the same value? I think not.

pchampin commented 4 years ago

This has further effects on OWL processing

Depending on which datatype map you are using, yes. But not all OWL processors are expected to support all possible datatypes.

I agree that this may open a Pandora's box, though. And in fact, JSON is not defined as a serialization of ECMAScript, so the denotation of "true"^^rdf:JSON is not bound to be interpreted as ECMAScript's true. I think this is the path suggested by @gkellogg, and I tend to agree.

LEW21 commented 4 years ago

The point of the rdf:JSON datatype is to represent JSON values, much as rdf:HTML can represent HTML values. Because rdf:HTML represents DOM fragments, such a fragment could be "1". Would this imply that the literal "1"^^rdf:HTML should be considered to be the same value? I think not.

1 in HTML is just a string, so it's a different value than a number.

OTOH, RDF spec says:

RDF applications may use additional equivalence relations, such as that which relates an xsd:string with an rdf:HTML literal corresponding to a single text node of the same string.

Still, equivalence is not identity (for example "-0"^^xsd:float and "+0"^^xsd:float are equal, but not identical, and are therefore treated as different values in OWL), and the rdf:HTML value space is defined in terms of DOM DocumentFragments, so I'm pretty sure 'a'^^rdf:HTML is a different value than 'a'^^xsd:string.
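
The zero example can be checked with ordinary IEEE 754 floats. The snippet below uses Python floats and a single-precision byte encoding as a stand-in for xsd:float; it illustrates "equal but not identical" and is not a model of OWL semantics.

```python
import struct

neg_zero = float("-0")
pos_zero = float("+0")

# Numeric comparison treats the two zeros as equal.
print(neg_zero == pos_zero)  # True

# Their single-precision encodings differ in the sign bit, so they are distinct values.
print(struct.pack(">f", neg_zero) == struct.pack(">f", pos_zero))  # False
```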

LEW21 commented 4 years ago

I think it's a bit different from asking about the value space of "1"^^xsd:integer and "1"^^xsd:decimal, as those are clearly numbers. In the case of "1"^^rdf:JSON, it is a JSON value, which may be interpreted as a number, certainly when parsed by a JSON parser. The fact that it could be interpreted so directly is something of a special case.

The spec says that the value of "1"^^rdf:JSON is a number (the ECMAScript number), so it's not just something that can be interpreted this way.

LEW21 commented 4 years ago

I agree that this may open a Pandora's box, though. And in fact, JSON is not defined as a serialization of ECMAScript, so the denotation of "true"^^rdf:JSON is not bound to be interpreted as ECMAScript's true. I think this is the path suggested by @gkellogg, and I tend to agree.

Unfortunately, JSON-LD's spec references the JS spec here:

The value space is the union of the four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays) from [ECMASCRIPT].

I'm not a fan of this definition (as the standard JSON does not have any formal ties to ECMAScript), but it's there, and rdf:JSON is bound to both standard JSON (for the syntax) and the ECMAScript interpretation (for the value).

pchampin commented 4 years ago

Unfortunately, JSON-LD's spec references the JS spec here

Oh my, you are right.

I'm not a fan of this definition (as the standard JSON does not have any formal ties to ECMAScript)

Well, standard JSON is only concerned with syntax, hence with the lexical space. We need to go beyond it to define the value space...

iherman commented 4 years ago

My feeling is that what is in the current spec is indeed erroneous, mixing the datatypes...

rdf:HTML defines the value space in terms of the DOM, because the DOM, well, exists. As there is nothing comparable for JSON, the only clean way of doing it is to say that the value space for rdf:JSON is the set of strings that abide by the requirements of the JSON syntax. It looks like a circular definition, but it is not really; we clearly define what the equivalence is for those strings and that is all that, in my view, RDF can do...

LEW21 commented 4 years ago

My feeling is that what is in the current spec is indeed erroneous, mixing the datatypes...

rdf:HTML defines the value space in terms of the DOM, because the DOM, well, exists. As there is nothing comparable for JSON, the only clean way of doing it is to say that the value space for rdf:JSON is the set of strings that abide by the requirements of the JSON syntax. It looks like a circular definition, but it is not really; we clearly define what the equivalence is for those strings and that is all that, in my view, RDF can do...

This sounds cool, but this way differently formatted equivalent JSON documents would be treated as different values, which is the problem that JSON canonicalization is trying to solve. Still, right now JSON canonicalization is just a draft, and other specs, like JWT, don't really care about it; they just treat the whole JSON object as a string, with formatting and all included.

iherman commented 4 years ago

This issue was discussed in a meeting.

iherman commented 4 years ago

@LEW21 see the comment above (or the meeting minutes). The current spec, in effect, does include a definition of canonicalization, so the value space being the set of canonicalized JSON texts seems to be fine.
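
To make the idea of a value space of canonicalized texts concrete, here is a minimal sketch. It assumes a simplified canonical form (sorted keys, no insignificant whitespace); `canonical_text` is a hypothetical helper and is neither the canonical mapping defined in the JSON-LD spec nor the JSON Canonicalization Scheme draft, both of which also constrain number and string serialization.

```python
import json


def canonical_text(json_text: str) -> str:
    """Hypothetical canonical form: sorted keys, no insignificant whitespace."""
    return json.dumps(
        json.loads(json_text),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )


a = '{"b": [1, 2], "a": true}'
b = '{ "a":true,\n  "b":[1,2] }'

print(a == b)                                  # False: different lexical forms
print(canonical_text(a) == canonical_text(b))  # True: same canonical text
print(canonical_text(a))                       # {"a":true,"b":[1,2]}
```

With the canonical text as the value, both lexical forms map to one value while remaining distinct as strings.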

LEW21 commented 4 years ago

I agree that it's better to use canonicalized JSON instead of the JS object model as the value space here.

However, I think treating the strings as-written might be even better:

1. JCS is a draft

JSON Canonicalization Scheme (JCS) is still a draft, so it can't be referenced normatively. It's still getting a new version every month. While I'm not proficient in the IETF process, it doesn't look like it's going to become a standard soon.

2. JCS - Serialization of Numbers

JCS relies on the JavaScript implementation of Numbers. This is not based on the JSON standard; it is the JCS authors' own decision, as the JSON standard does not enforce, or even recommend, any particular implementation of Numbers. And there are people who use JSON with non-JS-compatible Numbers. It's not good practice, but as long as it's not disallowed by the JSON spec, it should be supported.
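
A small example of a "non-JS-compatible" number: an integer just above 2**53 is valid JSON, but cannot be represented exactly as an IEEE 754 double, which is what JavaScript Numbers are. The sketch uses Python's `json` module, which happens to parse integers with arbitrary precision; the specific number is chosen only for illustration.

```python
import json

text = "9007199254740993"  # 2**53 + 1, a syntactically valid JSON number

as_int = json.loads(text)   # Python keeps the exact integer
as_double = float(as_int)   # what a double-based number model would hold

print(as_int == 2**53 + 1)         # True: exact value preserved
print(as_double == float(2**53))   # True: the trailing +1 is lost in a double
print(int(as_double) == as_int)    # False: the round trip changes the value
```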

3. JCS - Sorting of Object Properties

JCS specifies that object properties have to be sorted, because JSON says they are unordered. In practice, a growing number of people are using them in an ordered way.

I think that at some point the JSON spec will have to be amended to acknowledge this practice.
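
As an illustration of order-sensitive usage, Python's `json` module (on Python 3.7+, where dicts preserve insertion order) keeps members in document order when parsing, and sorting on output discards that order. This only shows the behavior of one parser; it says nothing about what the JSON spec requires.

```python
import json

text = '{"z": 1, "a": 2}'
parsed = json.loads(text)

# Document order survives parsing.
print(list(parsed))                        # ['z', 'a']

# Sorting keys on output loses the original order.
print(json.dumps(parsed, sort_keys=True))  # {"a": 2, "z": 1}
```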

4. Compatibility with JWT

JWT was standardized before the JCS work started. It says that:

This JSON object MAY contain whitespace and/or line breaks before or after any JSON values or structural characters, in accordance with Section 2 of RFC 7159 [RFC7159].

So it simply preserves whatever was thrown at it by the user. This "whatever" is then signed. So if somebody wanted to store both the JSON payload and the signature in an RDF database, in separate properties, they would need the payload to remain intact. They would probably like to tag the payload as JSON, but if rdf:JSON depends on canonicalization, that would break in some cases (unfortunately not all, so it's possible they wouldn't even notice until it reaches production).
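
The breakage can be sketched with a signature computed over the exact payload text. This is a simplification under stated assumptions: real JWS signs the base64url-encoded header and payload, whereas here a plain HMAC-SHA256 is taken over the raw text; `KEY`, `sign`, and the payload are all made up for illustration.

```python
import hashlib
import hmac
import json

KEY = b"demo-key"  # hypothetical shared secret, for illustration only


def sign(text: str) -> str:
    """HMAC-SHA256 over the exact UTF-8 bytes of the text."""
    return hmac.new(KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()


# Payload as the user wrote it, whitespace and member order included.
payload = '{"sub": "meg",\n "admin": true}'
signature = sign(payload)

# The same JSON value after a parse/serialize round trip.
reserialized = json.dumps(json.loads(payload), sort_keys=True, separators=(",", ":"))

print(hmac.compare_digest(signature, sign(payload)))       # True
print(hmac.compare_digest(signature, sign(reserialized)))  # False
```

A payload that already happens to be in the re-serialized form would still verify, which is why the failure shows up only in some cases.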


Still, canonicalization of course has its uses. It's nice to be able to parse the object, then serialize it, and have the result be the same value. So it might be a good idea to recommend using the canonical form (or at least a whitespace-less form, because whitespace is discarded by all JSON parsers and serializers) in the lexical space. While such a recommendation wouldn't work, for example, for numbers, which are commonly written by hand, JSON values are usually produced by software, so the generators could be programmed to generate as-canonical-as-possible output. And if somebody needs something that's not supported by JCS, or some other way to canonicalize JSON is standardized in the future, you're safe: everything still works.

iherman commented 4 years ago

@LEW21,

JSON Canonicalization Scheme (JCS) is still a draft, so it can't be referenced normatively. It's still getting a new version every month. While I'm not proficient in the IETF process, it doesn't look like it's going to become a standard soon.

You are absolutely right. It is a draft, who knows where it goes, and we cannot rely on it in the spec. As I said, we were forced to use our own definition, which is in the spec; see the entry on "The canonical mapping". See also the note after this: if (and when...) JCS is indeed a standard, future versions of this spec may be adapted. But, at this moment, this WG has no other choice than to have its own canonicalization rules.

iherman commented 4 years ago

This issue was discussed in a meeting.

gkellogg commented 4 years ago

@LEW21 can you please indicate if this satisfies your concern?
