json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

Making `integer` a format instead of a type #1395

Closed gregsdennis closed 5 months ago

gregsdennis commented 1 year ago

This discussion was split from #1391. See also #898.

Historically, integer was added to the type keyword even though it's not distinguishable in JSON from number.

Also, some time ago, we decided that format could be applied to any value type, not just strings. If the specified format doesn't apply to that value type, then format would be ignored. For example, the section on date-time starts with:

These attributes apply to string instances.

If the instance is a number, then format is ignored.

Proposal

In an effort to more closely align with the type system ("data model") present in JSON (objects, arrays, numbers, strings, true, false, null), integer should be removed as a type and added as a format.

So instead of

{
  "type": "integer"
}

we'd have

{
  "type": "number",
  "format": "integer"
}

Caveat

Coupled with this proposal, there have been discussions elsewhere that this should be able to enforce a difference between a number being encoded as 1.0 and 1.

This is not feasible as JSON Schema currently operates.

Currently, JSON Schema is built on the JSON data model (as mentioned above), not on the text encoding of that data. This allows JSON Schema to operate on other data formats that can be mapped to the JSON data model, such as YAML. Because of this, once a value is determined to be a number, the only way to also determine whether it is an integer is to check the numeric value for a fractional part. Thus, JSON Schema can't identify a difference between 1.0 and 1 because this difference only exists in the text.

To support this, we would have to change JSON Schema to operate on JSON text, which would mean that mappable formats would not be supported unless explicitly stated.

It's also important to note that not all validators would be able to distinguish between 1.0 and 1 as the parsers they're built on may read the text into an internal data model before presenting the JSON to the validator (i.e. the text form is abstracted away from the validator).
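To illustrate the data-model point, here is a minimal Python sketch of the only integer test available once the text has been parsed: check the numeric value for a fractional part. The helper name `is_integer_value` is hypothetical and not taken from any validator.

```python
import json


def is_integer_value(value):
    """Return True if a parsed JSON value is a number with no fractional part."""
    # bool is a subclass of int in Python, but JSON true/false are not numbers.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        # float.is_integer() is False for NaN and the infinities.
        return value.is_integer()
    return False


# "1" and "1.0" differ only in the text; both pass the value-based check.
for text in ["1", "1.0", "1.5", '"1"']:
    print(text, is_integer_value(json.loads(text)))
```

Under this check, 1 and 1.0 are treated identically, which is why a schema working on the data model cannot tell them apart.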

jdesrosiers commented 1 year ago

I think this is how "integer" support should have been defined in the first place. As much as I'd like to see this fixed, I'm not sure it's worth the spec churn.

awwright commented 1 year ago

I don't think this makes much sense. { "type": "integer" } is supposed to be an authoring convenience, shorthand for

{ "type": "number", "multipleOf": 1 }

... so I think making this into a "format" would be an even more roundabout way of doing the same thing that you can already do.
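The claimed equivalence can be sketched with a toy evaluator that understands only `type` and `multipleOf`. This is an illustration under that assumption, not a real validator; `check` and `is_number` are hypothetical names, and the plain modulo test is only adequate for a divisor of 1.

```python
def is_number(value):
    # JSON numbers map to int/float in Python; bool must be excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check(schema, value):
    """Evaluate only `type` and `multipleOf`; every other keyword is ignored."""
    declared = schema.get("type")
    if declared == "number" and not is_number(value):
        return False
    if declared == "integer" and not (is_number(value) and value % 1 == 0):
        return False
    divisor = schema.get("multipleOf")
    # multipleOf only constrains numbers; float modulo is fine for divisor 1.
    if divisor is not None and is_number(value) and value % divisor != 0:
        return False
    return True


shorthand = {"type": "integer"}
longhand = {"type": "number", "multipleOf": 1}
samples = [1, 1.0, -3, 2.5, "1", None, True]
results = [(check(shorthand, v), check(longhand, v)) for v in samples]
print(results)
```

For every sample, the two schemas give the same answer.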


And for number formats in general, my point is that JSON isn't very clear about which numbers should be distinguishable by applications. For example, environments like Python will produce different results when parsing 1.0 vs. 1. Having number formats would be a way you could make these distinctions if you need them.

(And the point of making number formats its own keyword is that if you don't know what "foo" is in {"format": "foo"} you have to error, but if you see {"numberFormat": "foo"} you can at least still validate strings.)
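The Python behaviour mentioned above can be seen with the standard `json` module: the two spellings compare equal but parse to different types and serialize differently.

```python
import json

a = json.loads("1")    # parsed as int
b = json.loads("1.0")  # parsed as float

print(type(a).__name__, type(b).__name__)
print(a == b)                          # numerically equal
print(json.dumps(a), json.dumps(b))    # the original spelling survives a round trip
```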

gregsdennis commented 1 year ago

One use case I can see for this that everyone would be able to support is an integer or number string format.

Many times (and I've now written a blog post about this), when high precision is required, users will encode numeric values into strings because their parsers read numbers as IEEE floating point values, which loses any encoded precision. The parser is generally too far down into the stack to do anything about it, so they resort to encoding their high-precision numbers as JSON strings and parse the values themselves. This has come up in Slack several times, and a previous employer of mine actually held this as company-wide policy for all of their APIs.

A string-based format: integer would be able to ensure that a JSON string held an integer value.

The downside to this is that other numeric constraints like minimum wouldn't work at all.
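A string-based integer format might be sketched as a pattern check like the one below. The grammar is an assumption (JSON-style integers: optional minus sign, no leading zeros, no exponent), and `is_integer_string` is a hypothetical helper. Non-string instances pass, following the rule that a format is ignored for value types it doesn't apply to.

```python
import re

# Assumed grammar: optional "-", then "0" or a digit run without leading zeros.
INTEGER_STRING = re.compile(r"-?(0|[1-9][0-9]*)")


def is_integer_string(value):
    """Hypothetical string-based `format: integer` check."""
    if not isinstance(value, str):
        return True  # format does not apply to this instance type
    return INTEGER_STRING.fullmatch(value) is not None


print(is_integer_string("12345678901234567890"))  # beyond IEEE double precision, still fine as text
print(is_integer_string("1.0"))
print(is_integer_string("01"))
```

Because the instance is a string, keywords such as minimum never see it, which is the downside noted above.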

felixfbecker commented 1 year ago

I second what @gregsdennis said. Encoding numbers in strings is extremely common anytime you work with money or extremely large numbers (over MAX_SAFE_INTEGER). It would actually be nice to not only have an integer format for this purpose, but a number format that accepts a floating point number (decimal, numeric) too.

ajv-formats with ajv-keywords already implements a concept for applying constraint keywords on strings based on the format: https://github.com/ajv-validator/ajv-formats#keywords-to-compare-values-formatmaximum--formatminimum-and-formatexclusivemaximum--formatexclusiveminimum

For example, when the format is date, the keywords formatMinimum/formatMaximum let you specify date strings that represent the minimum and maximum dates, just like the minimum and maximum keywords for numbers. Essentially, a format may define not just a regex validation but also comparison semantics.

That same mechanism would be incredibly useful when using "format": "integer" or "format": "numeric", basically allowing all the common "type": "number" keywords to be applied to "format": "number" too.

This could either be through new format* keywords, or the existing number keywords could be redefined to work on any format that is "comparable".
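As a rough sketch of what "comparable" could mean for a numeric string format, the Python below compares string-encoded numbers against string-encoded bounds using `decimal.Decimal`. It does not reproduce the ajv-formats API; the function and its `format_minimum`/`format_maximum` parameters are hypothetical, and `Decimal` accepts a somewhat broader syntax than a real format would likely allow.

```python
from decimal import Decimal, InvalidOperation


def check_numeric_string(value, format_minimum=None, format_maximum=None):
    """Hypothetical `format: numeric` check with inclusive string bounds."""
    if not isinstance(value, str):
        return True  # format does not apply to non-string instances
    try:
        number = Decimal(value)
    except InvalidOperation:
        return False  # not parseable as a decimal number
    if not number.is_finite():
        return False  # reject "NaN" and "Infinity"
    # Decimal comparisons are exact, so no precision is lost here.
    if format_minimum is not None and number < Decimal(format_minimum):
        return False
    if format_maximum is not None and number > Decimal(format_maximum):
        return False
    return True


print(check_numeric_string("100.50", format_minimum="0", format_maximum="1000"))
print(check_numeric_string("9007199254740993", format_maximum="9007199254740992"))
```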

gregsdennis commented 1 year ago

To be clear, I'm not saying encoding numbers into strings is a good practice; in fact my blog post says quite the opposite. But the fact remains that the practice exists, and we (JSON Schema) need to decide if we're going to cater to it.

If we do, does that then mean we endorse the practice?

felixfbecker commented 1 year ago

I'd say it's simply a necessity. If you're using or building an API to be used from the browser, the JSON parser you have to work with is JSON.parse() (including fetch's Response.json(), etc.), and JSON.parse() parses numbers as IEEE floating point numbers. Even the reviver parameter only receives the already parsed value, which is why MDN explicitly states:

Note that reviver is run after the value is parsed. So, for example, numbers in JSON text will have already been converted to JavaScript numbers, and may lose precision in the process. To transfer large numbers without loss of precision, serialize them as strings, and revive them to BigInts, or other appropriate arbitrary precision formats.

It would be nonsensical to ship a custom JSON parser to clients just to parse floating point numbers into a decimal abstraction without precision loss: the memory saved and performance gained by not deserializing them as strings would be negated by the extra parser code in the bundle.

So given it's a necessity, and web APIs are arguably the most important use case for JSON Schema, I think JSON Schema ought to support it independent of whether we consider the situation a bad practice. Given it's a necessity, I don't think it would be considered an endorsement.
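The precision loss and the string workaround can be reproduced in Python. Python's own parser keeps integers exact, so this sketch passes `parse_int=float` to `json.loads` to mimic a parser that reads every number as an IEEE 754 double; that substitution is an assumption made for illustration, not a model of any particular browser.

```python
import json

text = '{"id": 9007199254740993}'  # 2**53 + 1, one past MAX_SAFE_INTEGER + 1

exact = json.loads(text)["id"]                       # arbitrary-precision int
as_double = json.loads(text, parse_int=float)["id"]  # mimics a double-only parser

# The workaround: send the number as a string and revive it yourself.
as_string = json.loads('{"id": "9007199254740993"}')["id"]
revived = int(as_string)

print(exact, int(as_double), revived)
```

The double-based parse lands on the neighbouring representable value, while the string round trip keeps every digit.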

gregsdennis commented 1 year ago

and JSON.parse() parses numbers as IEEE floating point numbers.

This is the problem that I outline in my blog. The parser should handle this better.

given it's a necessity...

The practice of encoding numbers into strings is a workaround for the parser not handling large or precise numbers. It's not a necessity if the parsers are fixed.

But this is just my soapbox. I recognize that it's not going to happen. It still bugs me, and the fact that it's not going to happen doesn't mean the workaround is good.
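As one example of a parser that can keep the precision, Python's standard `json.loads` accepts a `parse_float` hook. The sketch below uses it with `decimal.Decimal`; the sample payload is made up for illustration.

```python
import json
from decimal import Decimal

text = '{"amount": 0.12345678901234567890123}'

default = json.loads(text)["amount"]                       # IEEE 754 double
precise = json.loads(text, parse_float=Decimal)["amount"]  # keeps every digit

print(repr(default))
print(precise)
```

With the hook in place, the number stays a JSON number on the wire and no string encoding is needed.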

gregsdennis commented 5 months ago

This is a significant breaking change. It's not going to happen for the next release, so I'm going to close it.

Someone is welcome to reopen it if they'd like to see this change in a future release.