uncefact / spec-JSONschema

UN/CEFACT JSON Schema publication and examples

Decimal representations in uncefact - jsonschema #6

Open DanielBauman88 opened 1 year ago

DanielBauman88 commented 1 year ago

The regex for decimal strings in the schema is proposed as "^([+-]?(0?|[1-9][0-9]*)(\\.?\\d+))$"

This isn't compatible with JSON's numeric representations, and it isn't compatible with most languages' float/double toString implementations, which switch to E notation when the exponent is large enough.

This is just a question about whether there'd be interest in expanding this regex to allow exponent notation. The reason is that it would make it a lot easier for users of this jsonschema to convert their JSON or domain data to the appropriate types.

With exponents supported, it's a simple toString. Without them, a user first needs a library to produce a full positional string, which adds friction whenever numbers are converted to strings that match the jsonschema.
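To make the friction concrete, here is a minimal Python sketch that checks default float-to-string output against the pattern quoted above. The helper name `is_schema_decimal` is made up for illustration, and the pattern is copied from this issue with the JSON string escaping removed, so treat it as an approximation of what the schema actually ships.

```python
import re

# Pattern quoted in this issue, with JSON string escaping (\\) reduced to
# plain regex escaping (\). Assumed to mirror the schema's decimal pattern.
DECIMAL_PATTERN = re.compile(r"^([+-]?(0?|[1-9][0-9]*)(\.?\d+))$")


def is_schema_decimal(text: str) -> bool:
    """Hypothetical helper: True if text matches the quoted decimal pattern."""
    return DECIMAL_PATTERN.match(text) is not None


# Python's default float-to-string conversion switches to exponent notation
# for very small or very large magnitudes, so those strings fail the pattern.
for value in (123.45, 0.0000001, 1e22):
    text = str(value)
    print(text, is_schema_decimal(text))
```

Only the first value passes; the other two come out in exponent notation and are rejected.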

(not sure if this spec is finalized or still in draft)

AP-G commented 1 year ago

Hi Daniel,

It is on purpose that it is NOT compatible with float / double, as it is the decimal type. The difference is that float / double always stores numbers in base 2, while decimal stores the number "as it is". For instance, in Java you must use java.math.BigDecimal and not double / float. With double / float, it is (mathematically) not possible to store precise numbers, which matters most when the accuracy of a number is essential. For a business example, google the old Pentium bug, where the payslips of certain chief executives were calculated incorrectly due to a double / float problem. Compare it to XML: it is the same reason why XML supports both decimal and double.

But you do not have to use such large numbers. Just look at the discussion on the validation artefacts of the European Norm for electronic invoices for public administration (EN16931). The commonly used validation engine (Saxon) only supports float and double values, but no decimals. There are tons of workarounds being implemented, up to legal consequences of the introduction of thresholds, just because one (common) software implementation does not support decimals correctly. And all that is done there in a business document is summing up some line item amounts and calculating VAT amounts. With the support of decimal, all would be fine…

DanielBauman88 commented 1 year ago

Thanks for this clear response.

I understand the reasoning for representing the decimal as a string in jsonschema: it avoids all the accuracy problems that arise with floats or doubles.

What I'm talking about here is simply expanding the supported string regex so that people converting their business domain numbers into the appropriate string for the jsonschema decimal don't have to do extra work.

So for people who already use double/float in their business domain, it just makes it easier to convert those values into decimals using toString.

This also applies to customers using precise types. E.g., a customer using Java's BigDecimal needs to know that, for compatibility with the edifact jsonschema, they have to use toPlainString, because toString can give values that will fail the edifact jsonschema regex. This is a gotcha and adds some extra effort. See https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#toPlainString-- and https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#toString--

Other languages frequently don't have a simple function like toPlainString, which makes it a step worse.

E.g., JavaScript decimal does not (as far as I can see), and neither does bigdecimal-rs in Rust. Converting to a string with no "E" notation requires some extra work in these languages, which makes working with the jsonschema harder.
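The same gotcha shows up with Python's standard `decimal` module, used here only as an analogue of the BigDecimal case above. A rough sketch: the `to_plain_string` helper is a hypothetical name, standing in for what toPlainString does in Java.

```python
from decimal import Decimal


def to_plain_string(value: Decimal) -> str:
    """Hypothetical helper: render a Decimal positionally, without an exponent."""
    # The "f" format spec forces fixed-point notation.
    return format(value, "f")


amount = Decimal("1E+3")
print(str(amount))              # default conversion keeps the exponent
print(to_plain_string(amount))  # positional form, digits only
```

The value is exact in both forms; only the positional one matches the current regex.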

As an example, the arbitrary precision libraries I've seen (like the ones listed in the edifact jsonschema pdf) support parsing strings with exponents into decimal values.

TL;DR: using a string in the jsonschema and not a number is a good thing, because JSON parsers generally read these numbers as floats/doubles. However, not allowing exponents in the string representation can make it hard to work with, and there's nothing imprecise about representing a number correctly with exponents.
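For illustration only, one way the pattern could be relaxed is to append an optional exponent suffix. This is not a pattern taken from the spec: `EXTENDED_PATTERN` and `is_extended_decimal` are assumptions made up for this sketch, and the final check shows that an exponent string parses to exactly the same decimal value as its positional form.

```python
import re
from decimal import Decimal

# Hypothetical extension of the quoted pattern: the same mantissa rules,
# followed by an optional exponent such as E+3 or e-10.
EXTENDED_PATTERN = re.compile(
    r"^([+-]?(0?|[1-9][0-9]*)(\.?\d+))([eE][+-]?\d+)?$"
)


def is_extended_decimal(text: str) -> bool:
    """Hypothetical helper: True if text matches the relaxed pattern."""
    return EXTENDED_PATTERN.match(text) is not None


for text in ("123.45", "1E+3", "1.5e-10", "1E", "abc"):
    print(text, is_extended_decimal(text))

# An exponent string carries exactly the same value as its positional form.
print(Decimal("1.5e-10") == Decimal("0.00000000015"))
```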