w3c / rdf-concepts

https://w3c.github.io/rdf-concepts/

rdf:JSON value space incorrect #116

Open pfps opened 5 days ago

pfps commented 5 days ago

The rdf:JSON value space is currently defined as

maps (mapping strings to values in the value space where the order of map entries is not significant),

This is not suitable. The referenced type is ordered maps, so the value space incorrectly contains ordered maps when it should contain unordered maps.

The best solution is to not use ordered maps at all and use unordered maps instead.

A much worse solution would be something like

equivalence sets of maps (mapping strings to values in the value space) where two maps are equivalent if for each tuple in either map there is a tuple in the other map with the same key and value
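
Spelled out in notation that is not meant as spec text: with the equivalence

```latex
m_1 \sim m_2 \;\iff\; \forall (k, v) : \big( (k, v) \in m_1 \Leftrightarrow (k, v) \in m_2 \big)
```

the value space would then contain the equivalence classes of ordered maps under this relation rather than the ordered maps themselves.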

Wording later in the section would probably have to be adjusted to mention equivalence sets of maps wherever it mentions maps.

hartig commented 5 days ago

Peter, I looked at the current definition to see how we can fix it such that it addresses your concern.

First of all, just to be precise: It is not true that "the rdf:JSON value space is currently defined as maps [...]." Instead, it is defined to be "the smallest set containing [...] maps" (and also containing other things).

Now, to address your concern I propose we introduce our own notion of "maps" for this purpose, instead of adopting the Infra notion of maps. We may define this notion as follows: A map is a partial function from the set of all strings to the set of all values in the value space.
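
In symbols (not intended as spec text), writing S for the set of all strings and V for the value space:

```latex
m : S \rightharpoonup V, \qquad
m_1 = m_2 \;\iff\; \mathrm{dom}(m_1) = \mathrm{dom}(m_2) \;\wedge\;
\forall k \in \mathrm{dom}(m_1) :\; m_1(k) = m_2(k)
```

so two maps are the same value exactly when they have the same keys and assign the same value to each key; there is no order to compare.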

gkellogg commented 4 days ago

I've expressed my concerns about defining our own notion of "maps" elsewhere, mainly that it is out of sync with other specs that use INFRA as the target value space for JSON, which seems to be well-established practice at W3C.

If we are to do anything, I think it's important that any local definition of maps we define be normatively linked to INFRA maps. Many specs use INFRA to describe the result of parsing JSON, and if JSON Objects are represented using a different type of map, that is inconsistent with these other uses, including JSON-LD and other specs that depend on JSON-LD.

We might say that, for the purposes of defining the value space, a "map is a partial function ..." as long as we define this as an interpretation of, or a restriction of, INFRA maps, where the restriction is essentially that order is not maintained. The key is that, if implementations use the JSON value space, they are not confronted with a mismatch between RDF maps and INFRA maps.

pfps commented 4 days ago

The problem is that INFRA maps are inherently ordered and the maps that appear to be needed for rdf:JSON are unordered. The leap from ordered maps to unordered maps takes about the same amount of conceptual power as defining unordered (normal) maps directly, particularly given the way INFRA maps are defined. The wording that I proposed above is just about the smallest amount of wording that one could use; I would have preferred to be more precise, but that wording should be good enough.

By the way, it is not completely correct to say that -INF and +INF cannot be serialized in rdf:JSON. One can use 1E400 and -1E400 to serialize them. That's weird, but it works.
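
For illustration, a parser that maps JSON numbers to IEEE doubles will typically overflow such lexical forms to the infinities; in Java, for example:

```java
class OverflowToInfinity {
    public static void main(String[] args) {
        // Lexical forms beyond the IEEE-754 double range overflow to the infinities.
        System.out.println(Double.parseDouble("1E400"));  // Infinity
        System.out.println(Double.parseDouble("-1E400")); // -Infinity
    }
}
```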

hartig commented 3 days ago

Peter, what is the wording that you "proposed above"?

Gregg, I am not convinced. Instead, I agree with Peter. The way the JSON-LD specs use the INFRA maps is quite different. There, the maps are used as a data structure in the algorithms that the specs define; as these algorithms don't consider the order of the map entries, it makes no difference whether ordered or unordered maps are used for them and, thus, it is perfectly fine to use ordered maps there. In contrast, in the context of using maps as part of the value space of a datatype for literals in RDF, where values may be compared for equivalence, using an ordered map versus an unordered map makes a difference. So, if we were to introduce the maps in the value space of rdf:JSON based on INFRA (ordered) maps, plus the restriction that you are proposing, then we would also need to redefine what equivalence means for these maps in the context of RDF. As a result, an implementation that captures INFRA (ordered) maps cannot be used directly anyway. So, the mismatch that you mention would exist no matter what. In other words, why would I use a java.util.SortedMap in my implementation and then manually override its equals method to work like the one of java.util.Map if I can use a java.util.Map directly?

pfps commented 3 days ago

@hartig

equivalence sets of maps (mapping strings to values in the value space) where two maps are equivalent if for each tuple in either map there is a tuple in the other map with the same key and value

It's not that values in the value space are compared for equivalence. It's that values in the value space are values and the only thing that matters as far as the value space is concerned is identity. So the value space has both positive zero and negative zero and these are two different values even though there is a notion of equality in which they are considered to be the same.
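
For illustration in Java terms, the two zeros are distinct IEEE values even though the usual numeric equality identifies them:

```java
class SignedZeros {
    public static void main(String[] args) {
        // Distinct values: +0.0 and -0.0 have different bit patterns.
        System.out.println(Double.doubleToLongBits(0.0) == Double.doubleToLongBits(-0.0)); // false
        // Yet numeric equality treats them as the same.
        System.out.println(0.0 == -0.0); // true
    }
}
```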

gkellogg commented 3 days ago

... in the context of using maps as part of the value space of a datatype for literals in RDF, where values may be compared for equivalence, using an ordered map versus an unordered map makes a difference. So, if we were to introduce the maps in the value space of rdf:JSON based on INFRA (ordered) maps, plus the restriction that you are proposing, then we also need to redefine what equivalence means for these maps in the context of RDF.

We do define map equivalence, which does not consider order. INFRA does not define a comparison operator, and different programming languages vary in how they consider order in equivalence.

As a result, an implementation that captures INFRA (ordered) maps cannot be used directly anyways.

Not clear, as INFRA doesn't define map equivalence.

So, the mismatch that you mention would exist, no matter what. In other words, why would I use a java.util.SortedMap in my implementation and then manually override its equals method to work like the one of java.util.Map if I can use a java.util.Map directly?

While defining our own map is not difficult, it does put us at odds with how JSON Objects are treated in other W3C specs, where INFRA has emerged as the standard. Not defining our map in terms of INFRA means that implementations looking to use RDF JSON values will not conform to the values derived using something similar to the INFRA JSON parsing results. To some degree, that makes the values we generate inconsistent with other specifications.

By the way, it is not completely correct to say that -INF and +INF cannot be serialized in rdf:JSON. One can use 1E400 and -1E400 to serialize them. That's weird, but it works.

This would be going off on our own again, as it would amount to a new standard for representing +INF and -INF in JSON, and it doesn't handle NaN. If we want to suggest how Infinity is expressed in JSON, that would be the purview of ECMA/IETF, not this group.

The way we've defined the use of maps for rdf:JSON has the advantage of leveraging INFRA and keeps us in line with how JSON is used within W3C. If the group consensus is to go our own way here, I won't stand in the way of it, but I'm more concerned that it is setting up an inconsistency. It would be best if INFRA also defined an unordered map, which is certainly more consistent with the JSON definition than ordered maps, but other groups considered this and didn't do that. It may be that the TAG should look at this issue.

pfps commented 3 days ago

It's not a matter of going our own way. It's a matter of setting up an RDF datatype correctly.

hartig commented 2 days ago

Peter, you are right. I just looked up what we need for D-entailment (with rdf:JSON in D). We don't need equivalence. Instead, we need a notion of sameness; i.e., what does it mean for two given maps to be the same map.

Gregg, what do these other W3C specs do in terms of checking whether an INFRA map given at some point in a process is the same as an INFRA map given at some other point in the process? Or is this question not of concern in these other specs?

Regarding your worry about incompatibilities in implementations, even if we define our own notion of map, implementations may still use a data structure for INFRA maps, as long as they use a same-value check in which the order of the map entries is irrelevant. We may also add a Note along these lines.

gkellogg commented 2 days ago

We don't need equivalence. Instead, we need a notion of sameness; i.e., what does it mean for two given maps to be the same map.

Regardless of how we define a map, how does the current description of value equality fail to meet our needs? Literal equality seems clear (other than -0 == 0 for xsd:double), array equality is defined via pairwise equality (recursively), and map equality is defined by looking for equivalent map entries in each map. Both follow the typical definition of equality for arrays and maps/dictionaries in most programming languages.
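
As a rough sketch of that comparison (my own modeling, not spec text), assuming values are represented with plain Java types (Boolean, String, Double, null, List for arrays, Map for maps):

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative only; names are not taken from the spec.
final class JsonValueEquality {

    static boolean sameValue(Object a, Object b) {
        if (a instanceof Double x && b instanceof Double y) {
            // Compare the IEEE values themselves: +0.0 and -0.0 stay distinct,
            // and NaN is the same value as NaN.
            return Double.doubleToLongBits(x) == Double.doubleToLongBits(y);
        }
        if (a instanceof List<?> xs && b instanceof List<?> ys) {
            // Arrays: pairwise, recursively.
            if (xs.size() != ys.size()) return false;
            for (int i = 0; i < xs.size(); i++) {
                if (!sameValue(xs.get(i), ys.get(i))) return false;
            }
            return true;
        }
        if (a instanceof Map<?, ?> xm && b instanceof Map<?, ?> ym) {
            // Maps: same keys and, for each key, the same value;
            // entry order plays no role in this check.
            if (!xm.keySet().equals(ym.keySet())) return false;
            for (Object key : xm.keySet()) {
                if (!sameValue(xm.get(key), ym.get(key))) return false;
            }
            return true;
        }
        // Booleans, strings, null.
        return Objects.equals(a, b);
    }
}
```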

Gregg, what do these other W3C specs do in terms of checking whether an INFRA map given at some point in a process is the same as an INFRA map given at some other point in the process? Or is this question not of concern in these other specs?

I'm not familiar with how other specifications do this. It would be useful if there were a reverse index into specifications so you could examine every use of INFRA or INFRA#list, say, across those specs, but I have no such information.

DID specifically has a callout on the fact that collections are ordered in INFRA, even though that may not be significant, but does not seem to depend on the notion of equality directly.

Regarding your worry about incompatibilities in implementations, even if we define our own notion of map, implementations may still use a data structure for INFRA maps, as long as they ensure to use a same-value check in which the order of the map entries is irrelevant. We may also add a Note along these lines.

We will need a note describing the relationship between our definition of a map and INFRA's ordered-map to give some direction to implementations that already use INFRA for modeling JSON; e.g., an rdf:JSON map is similar to an INFRA map other than order preservation. (There being no definition of INFRA map equality, I don't see why this would need to be called out separately). It might be useful to define our own rdf:JSON map using INFRA map-entry, map-key, and map-value. It could simply be the following:

A map is a specification type consisting of a finite unordered collection of tuples, each consisting of a key and a value, with no key appearing twice. Each such tuple is called an entry.

Note: the map data structure defined here differs from an ordered-map [INFRA] in that map entries are inherently unordered. Implementations using a structure based on [INFRA] ordered-map need to ensure that order is not considered when testing for equality.

msporny commented 2 days ago

@hartig wrote:

what do these other W3C specs do in terms of checking whether an INFRA map given at some point in a process is the same as an INFRA map given at some other point in the process? Or is this question not of concern in these other specs?

We just define the algorithms for performing the checks and state something to the effect of "order isn't preserved". It's a fairly simple addition on top of INFRA.

In the W3C DID Working Group and the W3C Verifiable Credentials Working Group, for maps, we ended up using this language (see highlighted text for unordered maps in DID Core):

For the purposes of this specification, unless otherwise stated, map and set ordering is not important and implementations are not expected to produce or consume deterministically ordered values.

It has been a non-issue for years, so I suggest the group doesn't try to go its own way; INFRA is a fine base to build on top of.

The reason ordered maps are defined in INFRA and unordered ones are not is a fairly arbitrary legacy decision based on how some browsers implemented maps back in the late 90s. Maps/Sets were always intended to be unordered; it's just that some of the browser vendors implemented them as sorted by insertion order. That is, the order in which map items were inserted was preserved in some browsers, and developers started basing their "for element in map" traversal code on that arbitrary decision, which then resulted in "bugs" because developers were expecting order to be preserved when that was never the intention. So INFRA changed all of that by stating that both sets and maps are ordered by default (and that you should ignore the order if order doesn't matter to you). Fundamentally, there was no real implementation advantage for the browser vendors in defining the behavior of unordered maps/sets, so they standardized on always using insertion order, even if it didn't matter.

filip26 commented 2 days ago

@pfps you are right, but this is not the only issue, as mentioned above (e.g. numbers), and I'm skeptical that a new definition covering just map ordering would be helpful.

Another argument to keep it as is, perhaps with an explanation, is that many JSON parsers preserve insertion order, and application-level processing expects it.

filip26 commented 2 days ago

@hartig java.util.LinkedHashMap, what do you think?
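
Presumably the point being that it preserves insertion order for iteration while its equals(), inherited from AbstractMap, already ignores order:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LinkedHashMapEquality {
    public static void main(String[] args) {
        Map<String, Double> m1 = new LinkedHashMap<>();
        m1.put("a", 1.0);
        m1.put("b", 2.0);

        Map<String, Double> m2 = new LinkedHashMap<>();
        m2.put("b", 2.0);
        m2.put("a", 1.0);

        // Iteration order differs, but equality does not consider order.
        System.out.println(m1.equals(m2)); // true
    }
}
```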

pfps commented 2 days ago

The problem is that RDF is not defining operations on rdf:JSON values that might or might not be sensitive to ordering. What is being defined is the set of values for rdf:JSON, and there needs to be only one element of that set that is a map mapping (only) the string "a" to the IEEE double 1.0 and the string "b" to the IEEE double 2.0. The INFRA map type includes two maps that satisfy this, [ <"a" 1.0>, <"b" 2.0> ] and [ <"b" 2.0>, <"a" 1.0> ]. So INFRA map does not satisfy the requirements for rdf:JSON, unless one wants order to be significant in rdf:JSON maps.
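
To illustrate with a throwaway modeling (not spec machinery): approximate an INFRA ordered map as a list of entries and an unordered map as a plain java.util.Map:

```java
import java.util.List;
import java.util.Map;

class OrderedVsUnordered {
    public static void main(String[] args) {
        // "Ordered maps": same entries, different order -> two distinct values.
        var ordered1 = List.of(Map.entry("a", 1.0), Map.entry("b", 2.0));
        var ordered2 = List.of(Map.entry("b", 2.0), Map.entry("a", 1.0));
        System.out.println(ordered1.equals(ordered2)); // false

        // Unordered maps: same entries -> one value.
        var unordered1 = Map.of("a", 1.0, "b", 2.0);
        var unordered2 = Map.of("b", 2.0, "a", 1.0);
        System.out.println(unordered1.equals(unordered2)); // true
    }
}
```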

To get around this, as I stated above, rdf:JSON either needs to start with unordered maps or have the right kind of equivalence sets of ordered maps in its value space.

filip26 commented 1 day ago

@pfps what about putting emphasis on the fact that maps are ordered in iteration order, i.e. insertion order? It's a question whether we can even consider insertion order a sort order, which would require defining an equivalence, as you have noted.

Insertion order is defined by a producer and is unknown to a consumer, but because of the history and the vague JSON definition (the real issue here is JSON itself), many parsers, and much application logic, treat a JSON map with respect to insertion order, preserving it when it makes sense or when there are interoperability concerns.

I would love to fix all those loose ends; I'm just skeptical that it's possible, or practical, to do that ad hoc.

pfps commented 1 day ago

@filip26 It is certainly possible to have rdf:JSON maps include order. I would vote against that as I view it as counter to both the intent of rdf:JSON and JSON itself. In any case there is no notion of insertion order to be found in rdf:JSON.