json-schema-org / json-schema-vocabularies

Experimental vocabularies under consideration for standardization

better support for decimals encoded as strings #45

Open faassen opened 7 years ago

faassen commented 7 years ago

A common way to represent decimals in JSON is to serialize them as strings. This side-steps floating point precision issues during transport and validation, such as those mentioned in json-schema-org/json-schema-spec#312. A deserializer can eventually transform the string into a language-specific decimal type such as Python's Decimal.

See for instance DecimalField in Django REST Framework

http://www.django-rest-framework.org/api-guide/fields/#decimalfield

or the Marshmallow serialization library:

https://marshmallow.readthedocs.io/en/latest/api_reference.html#marshmallow.fields.Decimal

which includes extra notes on how to handle such precision issues.

This case isn't well supported by JSON Schema. multipleOf is insufficient, as there are no guarantees around the representation of numbers in JSON.

You can define a pattern with a regex that restricts string input to decimals, but the implementer needs to write this regex themselves, the error messages aren't very pretty, it's hard to restrict the input to a total number of digits, and it's not possible to use minimum or maximum.
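To illustrate the pattern workaround and its limits, here is a minimal sketch in Python (the regex shape is just one plausible choice an implementer might make):

```python
import re

# A hand-rolled pattern of the kind an implementer would have to write:
# optional sign, integer digits, optional fractional digits.
DECIMAL_RE = re.compile(r"^-?[0-9]+(\.[0-9]+)?$")

print(bool(DECIMAL_RE.match("19.99")))    # True
print(bool(DECIMAL_RE.match("1e5")))      # False: exponent form rejected
# But the pattern cannot express range constraints: "999999.99" and
# "0.01" both match, so minimum/maximum cannot be enforced this way.
```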

Should JSON Schema support this use case? Similar to how "number" and "integer" deal with the same underlying JSON type, we could have a "decimal" type that validates strings specifically and allows the minimum & maximum logic. It would make the implementation of validators more complex in languages that don't have a decimal/fixed-point type, but it does seem to be a common use case.

(I myself ran into this when writing code that converts a Django REST Framework serializer DecimalField into a JSON Schema representation; it's not really possible to support max_digits or min_value/max_value there.)
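For illustration, such a hypothetical "decimal" type might look like this (the "decimal" type and "maxDigits" keyword are invented here for the sake of the sketch; they are not part of any draft):

```json
{
  "type": "decimal",
  "minimum": "0.01",
  "maximum": "99999999.99",
  "maxDigits": 10
}
```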

handrews commented 7 years ago

@faassen I think this would be best handled as a "format" value. Those impose additional semantics on strings. They are also optional to support, so tools that are fine with basic JSON numbers would not be burdened by this.

Note that there is a proposal for "formatMaximum"/"formatMinimum"/"formatMultipleOf", which would provide support equivalent to "minimum"/"maximum"/"multipleOf" for such decimal numbers.
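Under that proposal, a schema might look something like the following (keyword names as proposed, not yet standardized):

```json
{
  "type": "string",
  "format": "decimal",
  "formatMinimum": "0.00",
  "formatMaximum": "9999.99"
}
```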

handrews commented 7 years ago

@faassen I would be happy to add this to the next draft if you (or someone familiar with the issues involved) can either write up a PR or cover exactly what is needed here.

The main thing that comes to mind is whether you need to be able to support the maximum digits concept, or if it's sufficient to just state that applications respecting the format SHOULD handle such values as fixed-point numbers, with whatever precision is apparent in the string form.

KayEss commented 6 years ago

This is a use case we have, so here is a rough summary of what I'd expect/want. A few things that are hopefully not so controversial:

I come across two broad categories of fixed point.

One is common in financial software and some databases and involves storage as strings or as an integer multiplied up by pow(10, places) (or maybe binary coded decimals if a more compact representation is needed). This is common for talking about things like weights and currency amounts. The term "decimal" is often used for these. Applications typically want to specify the number of places after the decimal point, but for storage reasons may also care about the number of places available before the point. If the type is implemented on top of an integer then the maximum value may also be very important.
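A sketch of that scaled-integer representation in Python (the helper names are mine; 2 places, as for a currency with cents):

```python
from decimal import Decimal

PLACES = 2
SCALE = 10 ** PLACES  # pow(10, places)

def to_scaled(text: str) -> int:
    # Parse the decimal string and store it as an integer multiplied up.
    return int(Decimal(text) * SCALE)

def from_scaled(n: int) -> str:
    # Divide back down to recover the exact decimal string.
    return str(Decimal(n) / SCALE)

print(to_scaled("19.99"))   # 1999
print(from_scaled(1999))    # 19.99
```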

Another type uses a fixed number of bits with the point at a given fixed bit position. For example, you may use 16-bit unsigned storage with 3 bits after the point (giving 13 bits before it). This representation is often used in games because the bit shifts are much faster than the divisions needed for working in base 10. Because these values are always exact in binary, they should be safe enough in a numeric field in JSON, at least as long as they fit in the 53 bits of significand a double provides.
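That binary form can be sketched in Python with a 16-bit value and 3 fractional bits:

```python
FRAC_BITS = 3
SCALE = 1 << FRAC_BITS   # 8; shifts replace base-10 division

def encode(value: float) -> int:
    # Pack into 16-bit unsigned fixed point (13 integer bits, 3 fractional).
    raw = round(value * SCALE)
    if not 0 <= raw < (1 << 16):
        raise ValueError("out of range for 13.3 fixed point")
    return raw

def decode(raw: int) -> float:
    # Exact in a double: the denominator is a power of two.
    return raw / SCALE

print(encode(2.5))          # 20
print(decode(encode(2.5)))  # 2.5
```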

Both of these can be multiplied up by a fixed constant to make them integers, but for various reasons this is not typically attractive for applications to do. Transporting the numbers in strings tends to be well supported.

I'll talk to our teams who use json-schema for validation and see what would have helped them for it.

ashnur commented 5 years ago

Since the last comment some things have changed. JavaScript can now handle arbitrary-precision integers via BigInt, but this requires either a non-native implementation (https://github.com/GoogleChromeLabs/jsbi) or Chrome 67+, Opera 54+, Node.js v10.4+ (Firefox is coming too, as soon as they figure out https://bugzilla.mozilla.org/show_bug.cgi?id=1366287).

lemoinem commented 5 years ago

A decimal format seems more in line with the number type than with the string type, especially given that JSON numbers aren't restricted in precision, scale, or any other property of their deserialized form.

Many JSON serializers (e.g., C#, Java/Kotlin, Swift, JS, etc.) already serialize their respective decimal types to numbers. However, they currently always parse numbers back as either some form of integer (e.g., int, long) or base-2 floating point (e.g., float, double).

Some JSON deserializers already support deserializing to their decimal types out of the box. Moreover, several (e.g., C#, Kotlin) also allow numbers to be "parsed" as strings as part of the deserialization process, making it somewhat trivial to deserialize directly to a base-10/decimal number.
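Python's standard library json module is one example of this: it can hand the raw numeric text to a decimal constructor instead of going through float first:

```python
import json
from decimal import Decimal

# parse_float receives the original string from the document,
# so no base-2 round-trip ever happens.
doc = json.loads('{"price": 19.99}', parse_float=Decimal)
print(repr(doc["price"]))   # Decimal('19.99')
```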

For all these reasons, having decimal as a format for number rather than string seems like the better alternative.

shachr commented 3 years ago

Precision and/or scale are not part of the JSON spec, so different JavaScript engines will behave differently when working with decimals; some will truncate or round. More can be found here: https://stackoverflow.com/a/38357877/1332098 and also here: https://blog.skagedal.tech/2017/12/30/decimal-decoding.html

Hence, I suggest that the JSON Schema spec use strings, which are more reliable in this case, while also introducing a new parameterized format value:

   "type": "string",
   "format": "decimal(10, 2)"
}
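A validator could implement such a parameterized format roughly like this (a sketch; the decimal(precision, scale) semantics follow the SQL convention of total and fractional digit counts):

```python
from decimal import Decimal, InvalidOperation

def check_decimal(text: str, precision: int, scale: int) -> bool:
    # Accept at most `scale` fractional digits and `precision` digits overall.
    try:
        d = Decimal(text)
    except InvalidOperation:
        return False
    if not d.is_finite():
        return False  # reject NaN/Infinity
    t = d.as_tuple()
    frac_digits = max(0, -t.exponent)
    int_digits = max(len(t.digits) + t.exponent, 0)
    return frac_digits <= scale and int_digits <= precision - scale

print(check_decimal("12345678.90", 10, 2))   # True
print(check_decimal("123456789.0", 10, 2))   # False: 9 integer digits
print(check_decimal("0.001", 10, 2))         # False: 3 fractional digits
```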
grant commented 3 years ago

(This issue seems like best place to comment, given closed issues point here)

I would expect a way to represent a string that contains int64 data, for example "1587627537231057".

Example:

"type": "string",
"format": "int64"

I don't know the best way to represent a field that is a string containing int64 data. I'd be interested in proposing this format value if that is required.
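One straightforward reading of such a format is "a string whose content fits in a signed 64-bit integer", which a validator could sketch as follows (the format name mirrors OpenAPI's int64 convention; this check is my assumption, not a spec):

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def is_int64_string(s: str) -> bool:
    # A stricter validator might also reject leading '+', whitespace, etc.
    try:
        n = int(s, 10)
    except ValueError:
        return False
    return INT64_MIN <= n <= INT64_MAX

print(is_int64_string("1587627537231057"))      # True
print(is_int64_string("99999999999999999999"))  # False: exceeds 2**63 - 1
```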

Related issue: https://github.com/googleapis/google-cloudevents-go/issues/39

handrews commented 2 years ago

The preferred way to solve this kind of problem in 2019-09 and later is now extension vocabularies (not format). I'm going to move this over to the vocabularies repo.

gregsdennis commented 2 years ago

Representing any numeric value as a string is a hack to get around poor language support. JSON is specifically designed to support arbitrarily large and precise numeric values. There should be no need to put a number in a string.

At a previous job, someone did a language survey of support in this area. The results showed that the vast majority of parsers encounter a number and automatically parse it as an IEEE 64-bit floating point type (e.g. C#'s double). Even JavaScript's built-in parser does this! The parsers are the problem, not JSON (and thus not JSON Schema).

"Some JSON deserializers already support deserializing their decimal types out-of-the-box." - @lemoinem

This isn't (exactly) right according to our findings. Most parsers go directly to IEEE 64-bit then cast to decimal types, still incurring the associated precision loss. It's a huge problem for financial firms (like my previous employer).

The correct approach would be to either

  1. parse into a "big-num" representation then attempt to cast the value as the desired type once it's known, or
  2. internally save the original textual representation so that it can be parsed when requested.
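Option 2 matters because casting after the fact cannot undo the damage; Python's Decimal makes the loss visible:

```python
from decimal import Decimal

# Going through an IEEE double first bakes its rounding error into the cast:
via_double = Decimal(float("0.1"))
direct = Decimal("0.1")

print(via_double == direct)   # False: the double never held exactly 0.1
print(via_double)             # a long approximation of 0.1, not '0.1'
```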

To get around this shortcoming, developers started storing numbers as strings so that the original representation wouldn't be lost, and the appropriate numeric type can be parsed when it's needed. However, for reasons explained above, this caused other problems.

handrews commented 2 years ago

@gregsdennis while it is something of a hack, it is common in areas like finance where exact precision is important and even IEEE can't guarantee you won't end up with weird behavior. It's the only approach that guarantees predictable behavior, sadly.

As far as parsers, whatever non-standard behavior some JSON parsers might offer is definitely outside of the scope of JSON Schema. We don't include a decimal type in our data model (and shouldn't, as we need to stick close to JSON's data model).