Closed (gregsdennis closed 5 months ago)
I think this is how "integer" support should have been defined in the first place. As much as I'd like to see this fixed, I'm not sure it's worth the spec churn.
I don't think this makes much sense; { "type": "integer" } is supposed to be an authoring convenience, as shorthand for { "type": "number", "multipleOf": 1 }, so I think making this into a "format" would be an even more roundabout way of doing the same thing that you can already do.
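To make the shorthand concrete, here is a minimal Python sketch of what checking { "type": "number", "multipleOf": 1 } amounts to on already-parsed data. The function name is hypothetical and this is not taken from any particular validator:

```python
def matches_integer_shorthand(value):
    """Check a parsed value against {"type": "number", "multipleOf": 1}."""
    # bool is a subclass of int in Python, but JSON true/false are not numbers.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # A number is a multiple of 1 exactly when it has no fractional part.
    return value % 1 == 0


print(matches_integer_shorthand(1))    # True
print(matches_integer_shorthand(1.0))  # True
print(matches_integer_shorthand(1.5))  # False
```

Under this reading, 1 and 1.0 are both accepted, because only the numeric value is inspected.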
And for number formats in general, my point is that JSON isn't very clear about which numbers should be distinguishable by applications. For example, environments like Python will produce different results when parsing 1.0 vs. 1. Having number formats would be a way you could make these distinctions if you need them.
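As a quick illustration of the Python behaviour mentioned here, using only the standard library json module:

```python
import json

as_float = json.loads("1.0")
as_int = json.loads("1")

# The two texts produce different Python types...
print(type(as_float).__name__)  # float
print(type(as_int).__name__)    # int

# ...even though the values compare equal.
print(as_float == as_int)       # True
```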
(And the point of making number formats its own keyword is that if you don't know what "foo" is in {"format": "foo"} you have to error, but if you see {"numberFormat": "foo"} you can at least still validate strings.)
One use case I can see for this that everyone would be able to support is an integer or number string format.
Many times (and I've now written a blog post about this), when high precision is required, users will encode numeric values into strings because their parsers read numbers as IEEE floating point values, which loses any encoded precision. The parser is generally too far down into the stack to do anything about it, so they resort to encoding their high-precision numbers as JSON strings and parse the values themselves. This has come up in Slack several times, and a previous employer of mine actually held this as company-wide policy for all of their APIs.
A string-based format: integer would be able to ensure that a JSON string held an integer value.
The downside to this is that other numeric constraints like minimum wouldn't work at all.
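A rough sketch of what such a string format check could look like. The grammar below (optional minus sign, no leading zeros, mirroring the integer part of a JSON number) is an assumption, not something defined by any specification, and the names are hypothetical:

```python
import re

# Assumed grammar: optional "-", then "0" or digits without a leading zero.
INTEGER_STRING = re.compile(r"-?(0|[1-9][0-9]*)")


def is_integer_string(instance):
    """Hypothetical check for a string-based "format": "integer"."""
    # Format checks ignore instances of other types, so non-strings pass.
    if not isinstance(instance, str):
        return True
    return INTEGER_STRING.fullmatch(instance) is not None


print(is_integer_string("12345678901234567890"))  # True
print(is_integer_string("1.5"))                   # False
```

Because the instance is still a string, a keyword such as minimum, which applies only to numbers, would not constrain it.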
I second what @gregsdennis said. Encoding numbers in strings is extremely common anytime you work with money or extremely large numbers (over MAX_SAFE_INTEGER). It would actually be nice to not only have an integer format for this purpose, but a number format that accepts a floating point number (decimal, numeric) too.
ajv-formats with ajv-keywords already implements a concept for applying constraint keywords on strings based on the format: https://github.com/ajv-validator/ajv-formats#keywords-to-compare-values-formatmaximum--formatminimum-and-formatexclusivemaximum--formatexclusiveminimum
For example, when the format is date, the keywords formatMinimum/formatMaximum allow you to specify date strings that represent the minimum and maximum dates, just like the minimum and maximum keywords for numbers. Essentially, a format may define not just a regex validation but also comparison semantics.
That same mechanism would be incredibly useful when using "format": "integer" or "format": "numeric", basically allowing all the common "type": "number" keywords to be applied to "format": "number" too.
This could either be through new format* keywords, or the existing number keywords could be redefined to work on any format that is "comparable".
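To illustrate what "comparable" could mean for numeric strings, here is a hedged Python sketch of a formatMinimum-style check. It is not the ajv-formats implementation; the function name and behaviour are assumptions, and Python's Decimal accepts a looser grammar than JSON numbers (for example underscores and surrounding whitespace), so a real implementation would validate the string format first:

```python
from decimal import Decimal, InvalidOperation


def check_format_minimum(instance, minimum):
    """Hypothetical formatMinimum for strings with "format": "number"."""
    # Like other format-related keywords, ignore non-string instances.
    if not isinstance(instance, str):
        return True
    try:
        value = Decimal(instance)
        bound = Decimal(minimum)
    except InvalidOperation:
        return False
    # Reject NaN and infinities, which Decimal parses but JSON cannot express.
    if not (value.is_finite() and bound.is_finite()):
        return False
    # Decimal comparison is exact, so no precision is lost on large values.
    return value >= bound


print(check_format_minimum("10.50", "10.5"))  # True
print(check_format_minimum("9.99", "10"))     # False
```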
To be clear, I'm not saying encoding numbers into strings is a good practice; in fact my blog post says quite the opposite. But the fact remains that the practice exists, and we (JSON Schema) need to decide if we're going to cater to it.
If we do, does that then mean we endorse the practice?
I'd say it's simply a necessity. If you're using or building an API to be used from the browser, the JSON parser you have to work with is JSON.parse() (including fetch's Response.json() etc.), and JSON.parse() parses numbers as IEEE floating point numbers. Even the reviver parameter only receives the already parsed value, which is why MDN explicitly states:
Note that reviver is run after the value is parsed. So, for example, numbers in JSON text will have already been converted to JavaScript numbers, and may lose precision in the process. To transfer large numbers without loss of precision, serialize them as strings, and revive them to BigInts, or other appropriate arbitrary precision formats.
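The precision loss described in that note can be reproduced outside the browser. A minimal Python sketch, assuming parse_int=float is an acceptable stand-in for a parser that maps every JSON number to an IEEE 754 double; the payload and field names are made up for illustration:

```python
import json

# 9007199254740993 is 2**53 + 1, one past JavaScript's MAX_SAFE_INTEGER + 1.
text = '{"asNumber": 9007199254740993, "asString": "9007199254740993"}'

# Emulate a double-only parser: every integer literal becomes a float.
doubles = json.loads(text, parse_int=float)

# The number was rounded to the nearest representable double.
print(int(doubles["asNumber"]))  # 9007199254740992

# The string survived intact and can be revived exactly.
revived = int(doubles["asString"])
print(revived)                   # 9007199254740993
```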
It would be nonsensical to ship a custom JSON parser to clients just to parse floating point numbers into a decimal abstraction without precision loss: the memory and performance gains from not deserializing it as strings would be negated by the extra parser code in the bundle.
So given it's a necessity, and web APIs are arguably the most important use case for JSON Schema, I think JSON Schema ought to support it independent of whether we consider the situation a bad practice. Given it's a necessity, I don't think it would be considered an endorsement.
and JSON.parse() parses numbers as IEEE floating point numbers.
This is the problem that I outline in my blog. The parser should handle this better.
given it's a necessity...
The practice of encoding numbers into strings is a workaround for the parser not handling large or precise numbers. It's not a necessity if the parsers are fixed.
But this is just my soap box. I recognize that it's not going to happen. It still bugs me, and that it's not going to happen doesn't mean that the workaround is good.
This is a significant breaking change. It's not going to happen for the next release, so I'm going to close it.
Someone is welcome to reopen it if they'd like to see this change in a future release.
This discussion was split from #1391. See also #898.
Historically, integer was added to the type keyword even though it's not distinguishable in JSON from number.
Also, some time ago, we decided that format could be applied to any value type, not just strings. If the specified format doesn't apply to that value type, then format would be ignored. For example, the section on date-time starts with
If the instance is a number, then format is ignored.
Proposal
In an effort to more closely align with the type system ("data model") present in JSON (objects, arrays, numbers, strings, true, false, null), integer should be removed as a type and added as a format.
So instead of
we'd have
Caveat
Coupled with this proposal, there have been discussions elsewhere that this should be able to enforce a difference between a number being encoded as 1.0 and 1.
This is not feasible as JSON Schema currently operates.
Currently, JSON Schema is built on the JSON data model (as mentioned above), not on the text encoding of that data. This allows JSON Schema to operate on other data formats that can be mapped to the JSON data model, such as YAML. Because of this, once a value is determined to be a number, the only way to also determine whether it is an integer is to check the numeric value for a fractional part. Thus, JSON Schema can't identify a difference between 1.0 and 1 because this difference only exists in the text.
To support this, we would have to change JSON Schema to operate on JSON text, which would mean that mappable formats would not be supported unless explicitly stated.
It's also important to note that not all validators would be able to distinguish between 1.0 and 1, as the parsers they're built on may read the text into an internal data model before presenting the JSON to the validator (i.e. the text form is abstracted away from the validator).
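A small Python sketch of that last point, assuming a parser with a single numeric type. The helper name is hypothetical, and parse_int=float is used only to emulate such a parser:

```python
import json


def load_as_doubles(text):
    # Emulates a parser whose data model has one numeric type (a double).
    return json.loads(text, parse_int=float)


a = load_as_doubles("1.0")
b = load_as_doubles("1")

# After parsing, nothing remains that tells the two texts apart.
print(type(a).__name__, type(b).__name__)  # float float
print(a == b)                              # True
```

A validator sitting on top of such a parser sees the same value for both inputs, so it has nothing left to base a 1.0 versus 1 distinction on.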