Closed jakubklimek closed 1 year ago
> leading to conversions like `1.2E1` instead of `12`
That is not accurate. In JSON-LD, `12` will still render as `12`. But it is true that `12.1` will render as `1.21E1` (as in your example), and that is indeed an issue.
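The integer/double split can be sketched as follows. This is a simplified Python sketch of how a JSON-LD processor maps JSON numbers to RDF literals, not the normative Object-to-RDF algorithm (edge cases such as magnitudes at or above 10^21 are ignored):

```python
# Simplified sketch of JSON-LD's JSON-number-to-RDF-literal handling:
# numbers without a fractional part become xsd:integer, numbers with one
# become xsd:double in canonical E notation. Not the normative algorithm.

def canonical_xsd_double(value: float) -> str:
    """Canonical xsd:double form, e.g. 12.1 -> '1.21E1', 100.0 -> '1.0E2'."""
    mantissa, _, exponent = f"{value:.15E}".partition("E")
    mantissa = mantissa.rstrip("0")
    if mantissa.endswith("."):
        mantissa += "0"
    return f"{mantissa}E{int(exponent)}"

def json_number_to_literal(value):
    """Return (lexical form, datatype) roughly the way JSON-LD does."""
    if isinstance(value, int) or float(value).is_integer():
        return (str(int(value)), "xsd:integer")
    return (canonical_xsd_double(value), "xsd:double")

print(json_number_to_literal(12))    # ('12', 'xsd:integer')
print(json_number_to_literal(12.1))  # ('1.21E1', 'xsd:double')
```

Note how `12.1` can never come out with an `xsd:decimal` lexical form: the fractional part forces the E-notation double path.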
I believe that `xsd:decimal` is superior to `xsd:double` in many respects (see this paper (pdf) for a detailed analysis), so I am in favour of option 2.
Another option would be the following: define a new RDF datatype `dcat:decimal`, identical to `xsd:decimal` except that its lexical space also allows the E notation (really, I see no reason why the E notations were not included in the lexical space of `xsd:decimal`!...).
Then change the range of all concerned DCAT properties from `xsd:decimal` to `dcat:decimal`. As stated above, this is not a breaking change (`xsd:decimal` is semantically a subtype of `dcat:decimal` -- even syntactically, any valid `xsd:decimal` value can be "cast" to a `dcat:decimal` without changing its meaning).
This way, the literals produced by a JSON-LD processor would be valid whenever the original value was any JSON number...
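For illustration, a context along these lines could declare the coercion (a hypothetical sketch, since `dcat:decimal` does not exist in the DCAT vocabulary today):

```json
{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "spatialResolutionInMeters": {
      "@id": "dcat:spatialResolutionInMeters",
      "@type": "dcat:decimal"
    }
  },
  "spatialResolutionInMeters": 12.1
}
```

A JSON-LD processor would still emit the E-notation lexical form (`1.21E1`), but typed as `dcat:decimal`, which would be a valid literal under the proposed lexical space.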
@iherman any opinion on this?
@pchampin that seems to work, see the playground. But I am not sure what exactly the requirements are for @jakubklimek. Is the 'E' notation all right, provided the datatype is also correct?
The problem, of course, is whether there is any reasoner that is properly prepared for such datatype reasoning, i.e., for datatypes that go beyond XML Schema. I have zero experience with that...
The pro aspect is that there is nothing to do for the JSON-LD spec. Although... the playground does the right thing, but does the JSON-LD spec say that the conversion of JSON numbers should happen the way it happens, or is this only an implementation side effect? @gkellogg this question is really for you...
Well, my requirement is for JSON-LD distributions of DCAT-AP to pass DCAT-AP SHACL validation, which validates the datatype of `dcat:spatialResolutionInMeters` as `xsd:decimal`, which is not achievable when the JSON-LD distribution has this property as a JSON number (which seems natural from the JSON point of view), not a JSON string.
JSON numbers are problematic. JSON-LD will treat them as integers or doubles based entirely on the presence or absence of a fractional part. To maintain fidelity with either XSD type, you should stick with string values.
A future version may change the interpretation based on JCS, which is used for JSON Literals, but it remains fundamentally problematic.
The problem with JSON numbers goes back to the over-simplified view in JavaScript.
If data type fidelity is important, stick with value objects having a string value.
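Concretely, that advice looks like this (an illustrative sketch; the context and value are only examples):

```json
{
  "@context": {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "spatialResolutionInMeters": {
      "@id": "http://www.w3.org/ns/dcat#spatialResolutionInMeters",
      "@type": "xsd:decimal"
    }
  },
  "spatialResolutionInMeters": "12.1"
}
```

Because the JSON value is a string, the processor passes the lexical form through untouched and emits `"12.1"^^xsd:decimal`.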
@iherman in my experience, many RDF implementations (libraries, triple stores, inference engines) support `xsd:decimal`. So yes, replacing it with `dcat:decimal` might disrupt some systems... More problematic: normatively, OWL 2 only supports a predefined set of datatypes (including `xsd:decimal`). Any other datatype is, by default, rejected by reasoners (although HermiT has an option to simply ignore them). So this would be a daring step. But I believe it would solve problems in the long term.
@gkellogg I agree that sticking to strings (with value objects or type coercion) is the easy way out in JSON-LD. But can we really convince people to give up on numbers in JSON? This kind of contradicts our narrative that "JSON-LD can still be used as day-to-day JSON" if we add "... except for this or that kind of JSON".
@pchampin said:
> @gkellogg I agree that sticking to strings (with value objects or type coercion) is the easy way out in JSON-LD. But can we really convince people to give up on numbers in JSON? This kind of contradicts our narrative that "JSON-LD can still be used as day-to-day JSON" if we add "... except for this or that kind of JSON".
Note that the use of decimal values, as opposed to integer or double, is really an RDF issue, which probably doesn't impact people wanting to make their existing JSON just work. We've long warned about the potential for native numbers to be misinterpreted, and to avoid built-in behavior for XSD datatypes. Generally, you can interpret arbitrary JSON as JSON-LD, but absent other data typing information, there is no other good way to handle native numbers.
If we're talking about a hypothetical normative change, then sure, I think we can do better. However, the behavior in 1.0 and 1.1 is to not manipulate native values, except as defined in the To- and From-RDF algorithms. Specifically, we were wary about adding any data-type specific behavior.
In a (hypothetical) future update, which may need to be a 1.2 release due to the impact, I could see making the following changes:
- Change step 10) of the Object to RDF Conversion Algorithm to take advantage of the case when datatype equals `xsd:decimal` and value is in an appropriate range for decimal, and define appropriate serialization rules there and in 8.6 Data Round Tripping. We apparently have a similar rule in place when datatype is `xsd:double`.
- Change step 2.4.3) of the RDF to Object Conversion Algorithm to include `xsd:decimal`, and set type to `xsd:decimal`, so that it is retained in the resulting value object.
However, as these are prospective normative changes that would affect the expected behavior of already compliant processors, there are some process steps we'd need to go through to send JSON-LD API back through CR, so I don't see how it can really affect this issue.
The more I think about this issue, the more I believe that the problem is the definition of `xsd:decimal` itself. It should allow the E notation. And the good news is... in practice, it kind of does. I've done my research, and a significant number of RDF implementations supporting `xsd:decimal` are perfectly happy with, e.g., `"1.23E1"^^xsd:decimal`.
I have created a repo describing the issue, and the current state of implementations: https://github.com/pchampin/xsd_decimal/
I thought I would share this with the semantic web mailing list to get a sense of the community's opinion about this. @iherman, @gkellogg, what do you think?
To get back to your issue, @jakubklimek, my conclusion is that you should be using strings to be on the safe side, but in many cases you may still use numbers and not run into any problem, because many implementations will recognize these non-standard `xsd:decimal` literals produced by JSON-LD...
Great discussion folks - thanks @pchampin for the research and evidence.
+1 for @jakubklimek's intent (use case) here - "semantic uplift" of typical JSON serialisations seems to be the way in which any component in a system can augment a JSON payload to provide information about its meaning - there is no reason an intermediary or client can't be aware of this context and augment the information it gets from a service - in fact that is the way the whole WoT is predicated to deal with low-power networking protocols from sensors.
This is exactly what we wish to do in the OGC to formalise GeoDCAT as a profile of DCAT, with a normative context to make it JSON-LD compatible and allow GeoSPARQL to provide richer spatial semantics. (@jakubklimek can you look into publication and reuse of a common DCAT context to be referenced by any DCAT profiles?)
If the problem is fundamentally in the `xsd:decimal` lexical rules (not semantics) - then layering in workarounds - such as requiring "non-natural" string-based serialisations - will require special code support on every server and client component.
IMHO it would be better to aim for the simplest and most interoperable option for the greatest number of users over the longer term - which would appear to be either updating serialisers to not use E notation - or updating client libraries to support it.
Perhaps the lowest total effort for the best outcome is to make parsers "future compatible" with a proposed update to `xsd:decimal`, and declare this as a formal profile for now so components have a mechanism to at least be transparent at run-time. This will not break any existing systems, but would allow new systems to be built that do not impose unreasonable burdens on clients in future.
Note that Apache Jena supports GeoSPARQL (https://jena.apache.org/documentation/geosparql/index.html) - which is just being updated to a 1.1, perhaps it would not be too big a reach to ensure this update is also factored in. @nicholascar might have some further insight into this.
> The more I think about this issue, the more I believe that the problem is the definition of `xsd:decimal` itself. It should allow the E notation.
Well, I do not have any contact with the main developers of the XSD schemas any more, so we can only guess why they created this datatype in the first place. I would think that they thought authors should use `xsd:float` or `xsd:double` if using E notation. But that is water under the bridge now; I do not think XSD will ever change.
One could also say that RDF may have been lazy by simply adopting XSD as the basis for datatypes instead of adopting something possibly simpler (how many people are there around who have read the XSD specification with all its intricacies and details?). I suspect that may also be water under the bridge...
> And the good news is... in practice, it kind of does. I've done my research, and a significant number of RDF implementations supporting `xsd:decimal` are perfectly happy with, e.g., `"1.23E1"^^xsd:decimal`.
Which reinforces what I said: even developers did not read (or possibly did but wilfully ignored) the XSD spec... 😀
> I have created a repo describing the issue, and the current state of implementations: https://github.com/pchampin/xsd_decimal/
> I thought I would share this with the semantic web mailing list to get a sense of the community's opinion about this. @iherman, @gkellogg, what do you think?
I am not sure it is worth it if the question is only how JSON-LD has to map numbers to RDF. There may be a (much) longer discussion on the whole area of datatypes for RDF, and whether, after 20 years, the choice of XSD was indeed a judicious one and whether a major simplification in that area would be worthwhile. But that discussion only makes sense if it leads to some consistent datatype specification that future RDF data can universally use; otherwise it will lead to a purely academic discussion...
(A good thing is that any change does not necessarily require a change in the RDF standard itself, although the RDF spec lists XSD explicitly. Nothing forbids proposing, and widely adopting, an alternative datatype system.)
https://www.w3.org/TR/xsd-precisionDecimal/ (via https://www.w3.org/TR/xmlschema11-2/#primitive-vs-derived)
- It is not a derived type of `xsd:decimal`.
- Systems may be implementing decimals as doubles.
- Java `BigDecimal` supports scientific notation.
- Changing numbers would also mean defining arithmetic.
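As an aside, Python's `decimal.Decimal` behaves like Java's `BigDecimal` here: E notation parses losslessly, unlike a round-trip through a binary double. A small illustration:

```python
# Python's decimal.Decimal, like Java's BigDecimal, accepts E notation
# and preserves the exact decimal value.
from decimal import Decimal

d = Decimal("1.23E1")
print(d)                     # 12.3
print(d == Decimal("12.3"))  # True
```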
@iherman
> One could also say that RDF may have been lazy
One could also say that it refrained from reinventing the wheel :-) But indeed, now we are stuck with a specification that is not likely to be updated -- or could it?
@afs, wow, I didn't know about that one. Thanks for the reference. However, I am not sure this really solves the issue here:
- it is a different datatype (it is not derived from `xsd:decimal`)
- it has additional features (precision, infinite values, NaN) that are not necessarily relevant here

Sorry it does not help. But why change `xsd:decimal`?
https://www.w3.org/TR/xpath-functions-3/#casting-to-decimal

`xsd:decimal(xsd:double("2E3"))`
@afs clearly, it is possible to convert between `xsd:double` (in E notation) and `xsd:decimal`. But the initial issue here was that a seemingly correct JSON-LD document may produce ill-formed `xsd:decimal` literals, before you get a chance to process them with SPARQL...
We could fix this by either changing JSON-LD, or changing `xsd:decimal`. The second option seems more reasonable to me, all the more so as many implementations already support E notation.
But let's re-focus the discussion on the issue raised by @jakubklimek, and on the way we can address it here, that is, in the context of DCAT 3. (I suggest the more general debate on `xsd:decimal` be moved to the repo I created for it.)
@riccardoAlbertoni @davebrowning @dr-shorthair Unless I missed something, the DCAT spec does not at all talk about how it can be serialized in JSON-LD, right? So I am not entirely sure where the warning text should go...
We could put it after the description of `dcat:spatialResolutionInMeters`, as it is currently the only property with range `xsd:decimal`, but that feels a bit ad hoc.
Furthermore, this is neither an issue of that property nor of JSON-LD in general, but an issue that might arise for some JSON-LD context and some data using this context...
So, any opinion on where this warning text would be best located?
> before you get a chance to process them with SPARQL...
SPARQL has nothing to do with this - your survey needs to test parsers and the handling of such data end-to-end. Equality of constants is only part of the picture. Try the `xsd:decimal("2E3")` cast rather than the `^^` form.
Your survey needs to cover RDF/XML. A system that uses an existing validating XML parser needs changing as well.
SPARQL was not my point. It is a legal cast in F&O - a non-RDF standard with a significant adoption. SPARQL queries would require the cast for stability across remote stores.
> We could fix this by either changing JSON-LD, or changing `xsd:decimal`. The second option seems more reasonable to me, all the more so as many implementations already support E notation.
It's a SHACL change.
@afs thanks again for these very relevant remarks. Would you mind raising them as issues on https://github.com/pchampin/xsd_decimal where we could discuss them at length? As I wrote above, I think we should focus this thread on how to address this in the short term, for DCAT (or its profiles).
@rob-metalinkage
> we wish (...) to formalise GeoDCAT as a profile of DCAT, with a normative context to make it JSON-LD compatible
The documentation of such a context would be, IMO, the right place to explain this caveat and the possible workarounds.
> requiring "non-natural" string based serialisations will require special code support on every server and client component
Using strings for conveying decimal values is not as unreasonable as it sounds, because JSON numbers can be lossy, as raised by @gkellogg above.
Maybe the most pragmatic way forward would be to extend `dcat:spatialResolutionInMeters` to allow both `xsd:double` and `xsd:decimal` in its range, with a note explaining that `xsd:decimal` is encouraged (to avoid precision loss) but that in some situations (such as "natural" serializations in JSON-LD) `xsd:double` is more convenient.
A drawback is that such a range cannot be expressed in RDFS (but it can be expressed in OWL). I don't think this is too much of a problem.
Note that JSON-LD users would still have the possibility to use either type (being more explicit for `xsd:decimal`; see example in the playground). A property alias could even be provided to make this less verbose (see example in the playground).
Even aside from any disruption a re-definition of xsd:decimal may cause, allowing xsd:decimal to contain an exponent may create a different problem. I think it is important to be able to syntactically distinguish an xsd:double from an xsd:decimal. Currently, the presence of an exponent is what signals an xsd:double: 1.23E1 must be an xsd:double. If an exponent were allowed in an xsd:decimal, then it isn't clear how an xsd:double could be syntactically distinguished: 1.23E1 would (presumably) conform to both datatypes.
Possibly one could use the number of digits in the mantissa to distinguish them, because xsd:decimal is required to support 18 digits, whereas xsd:double supports a mantissa up to 2^53 = 9,007,199,254,740,992, which is 16 digits. But if E notation is added to xsd:decimal then the digits in the exponent would probably come out of that same 18-digit budget, so you'd again be faced with not being able to syntactically distinguish them. Furthermore, it would probably be confusing to have subtly different limits on the number of digits permitted in the mantissa and/or exponent between xsd:decimal and xsd:double.
P.S. My guess is that this ability to syntactically distinguish an xsd:double from an xsd:decimal is the reason why an exponent is not allowed in an xsd:decimal in the existing standard.
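The lexical-space distinction described above can be made concrete with regexes. This is an approximation of the XSD 1.1 rules (simplified: `INF`, `-INF` and `NaN` are included for double; Unicode digit subtleties are ignored):

```python
# Approximate XSD 1.1 lexical spaces: xsd:double is xsd:decimal plus an
# optional exponent (and the special values INF/-INF/NaN). The exponent
# is thus the only syntactic marker separating the two.
import re

DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)$")
DOUBLE = re.compile(r"([+-]?(\d+(\.\d*)?|\.\d+)([Ee][+-]?\d+)?|[+-]?INF|NaN)$")

print(bool(DECIMAL.match("12.3")), bool(DOUBLE.match("12.3")))      # True True
print(bool(DECIMAL.match("1.23E1")), bool(DOUBLE.match("1.23E1")))  # False True
```

So `12.3` is ambiguous without a datatype annotation, while `1.23E1` can only be a double under the current standard.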
If we take @pchampin's idea of defining the range to be `xsd:double` or `xsd:decimal` then implementations can choose which they need - and a note saying E notation should not be used in `xsd:decimal` to be strictly compliant makes sense. Either that or make it just an `xsd:double` - there is not enough information supported by other DCAT property semantics for the precision to matter much, I suspect: spatial resolution is a complex thing depending on all sorts of map projection issues - and the subtle differences in number precision would be lost in, for example, the issues of plate tectonic shift (some jurisdictions use dynamic datums with temporal epochs). "Effective spatial resolution" would need to be calculated anyway depending on the map projection of the data, view angles of sensors and all sorts of things, and maybe an approximation to start with.
So the real solution for precision would be to fix this in GeoDCAT - allowing core DCAT to provide "general statements" and GeoDCAT to be rich enough to have all the other information you would need to make the fine distinction. Syntactic interoperability matters most for core DCAT, as the semantics probably don't allow more precise information regardless of syntactic precision preservation - so `xsd:double` might be easiest?
also
SHACL constraints are probably going to be of more practical use than OWL or RDFS - AFAICT very little OWL reasoning is done dynamically in the sort of environments concerned with data cataloguing - DCAT doesn't really support rich enough description to support much meaningful reasoning - you would need to attach other data with relevant information models for most interesting cases anyway.
Note that union definitions of XSD types are already present in DCAT: all temporal data properties allow any 'temporal' XSD type.
@jakubklimek thanks for highlighting this issue. I also think that we should consider what the objective is here: namely DCAT. The question we have to raise is what one wants to do with this property.
I might be mistaken, but so far I have not yet seen "calculations" happening with the DCAT descriptions. If the use of this property is to print the value on a webpage, then `xsd:decimal` is really fine. But then unfortunately JSON(-LD) users cannot use native JSON numeric types.
Since this issue is at the core of bridging JSON to RDF, the issue is not limited to DCAT but concerns the use of numeric values in all W3C vocabularies. I would like to see a solution that is workable for all rather than a specific approach in each vocabulary.
Observe that in some cases the representation "3.45" is the expected representation (e.g. monetary values), while in others (e.g. the quantity of CO2 in the air) the double representation is expected. For that reason I believe that the topic is beyond DCAT as such.
@rob-metalinkage I see your point about precision and `spatialResolution`, so maybe `xsd:double` only would be sufficient here. The problem is that DCAT 2 defines its range to be `xsd:decimal`. Replacing it with `xsd:double` in DCAT 3 would be a breaking change.
Re. OWL vs. SHACL: SHACL has no problem expressing that a property accepts multiple datatypes.
@dbooth-boston I think your argument about distinguishing between double and decimal, as a motivation for forbidding the E notation in xsd:decimal, is very plausible.
And it makes me realize that the problem raised by @jakubklimek in JSON-LD may also occur in Turtle! More precisely, currently, the following snippet is compliant:
[] dcat:spatialResolutionInMeters 100.0 .
but the following is not!
[] dcat:spatialResolutionInMeters 1e2 .
I think this is one more argument for allowing both datatypes for this property.
@pchampin indeed,
[] dcat:spatialResolutionInMeters 1e2 .
is currently invalid, as it indicates the `xsd:double` datatype instead of `xsd:decimal`. However, in Turtle, this is solvable by simply using the correct datatype, i.e.
[] dcat:spatialResolutionInMeters 100.0 .
The actual problem in JSON-LD is that it is currently not solvable when JSON decimal numbers are used, as those always get transformed to the `xsd:double` syntax while processing the JSON, no matter what the JSON-LD context later states as the datatype. I am simply unable to get e.g. `12.1` in the `xsd:decimal` syntax when loaded using an RDF library. I always end up with e.g. `1.21e1`, and then there is nothing I can do about that in the RDF world.
> However, in Turtle, this is solvable by simply using the correct datatype
One could argue that in JSON-LD, this is also solvable by simply using the correct datatype, i.e. a JSON string instead of a number (and leave it to the context to coerce that string into `xsd:decimal` -- see playground).
I agree that using a string to express a numeric value is counterintuitive, but IMO so is the subtle distinction that Turtle makes between `1e2` and `100.0` -- I don't know any other language that considers these as different beasts.
I also agree that, in an ideal world, JSON-LD would have better support for `xsd:decimal`, but as @gkellogg pointed out above, this is a problem inherited from JSON and the over-simplified view of numbers in JavaScript.
PR #1543 has mediated the different views and provides a non-normative indication about the issue and possible solutions.
There is a technical issue with the definition of `dcat:spatialResolutionInMeters` when used with JSON-LD. Specifically, the issue is the range being `xsd:decimal`. In JSON-LD, `xsd:decimal` is not supported for numbers, see the note in the specification. Therefore, when in JSON this number is actually a JSON number, not a JSON string, even if the JSON-LD context specifies the datatype to be explicitly `xsd:decimal`, the number is treated as `xsd:double`, leading to conversions like `1.2E1` instead of `12` when loading JSON-LD as RDF, in turn leading to the invalid `xsd:decimal` RDF literal `"1.2E1"^^xsd:decimal`. This can be seen e.g. in the JSON-LD playground.

Possible solutions:
- Change the range to `xsd:integer` or `xsd:double`, which is a breaking change