json.dumps() should encode float number NaN to null

10a1cbf5-763e-4aa8-bb14-e0296ea27213 commented 4 years ago

BPO	40633
Nosy	@rhettinger, @mdickinson, @ericvsmith, @alucab

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at =
created_at =
labels = ['type-feature', 'library', '3.9']
title = 'json.dumps() should encode float number NaN to null'
updated_at =
user = 'https://bugs.python.org/HaoyuSUN'
```

bugs.python.org fields:

```python
activity =
actor = 'mark.dickinson'
assignee = 'none'
closed = True
closed_date =
closer = 'mark.dickinson'
components = ['Library (Lib)']
creation =
creator = 'Haoyu SUN'
dependencies = []
files = []
hgrepos = []
issue_num = 40633
keywords = []
message_count = 21.0
messages = ['368942', '368944', '368948', '368950', '368959', '369005', '369006', '369043', '369044', '369172', '369188', '369189', '369190', '369191', '369194', '381481', '381491', '383986', '384055', '384059', '384060']
nosy_count = 6.0
nosy_names = ['rhettinger', 'mark.dickinson', 'eric.smith', 'Haoyu SUN', 'arjanstaring', 'alucab']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue40633'
versions = ['Python 3.9']
```

10a1cbf5-763e-4aa8-bb14-e0296ea27213 commented 4 years ago

Float numbers in Python can take three special values: nan, inf, and -inf, which the json module encodes as "NaN", "Infinity", and "-Infinity". These representations are not compatible with the JSON specification, RFC 7159: https://tools.ietf.org/html/rfc7159.html#page-6

These values are not parsed correctly by most JavaScript JSON parsers.

It is better to encode "NaN" to "null" which is a valid JSON keyword representing "Not a Number".

Here is an example of how json.dumps() encodes NaN as NaN in JSON:
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> dct = {'a': None, 'b' : float('nan')}
>>> dct
{'a': None, 'b': nan}
>>> import json
>>> json.dumps(dct)
'{"a": null, "b": NaN}'

ericvsmith commented 4 years ago

Since this is documented behavior (https://docs.python.org/3.8/library/json.html#infinite-and-nan-number-values), we can't change it by default without breaking code.

What JavaScript JSON encoders and decoders specifically have a problem with this behavior? The documentation says "This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders.", so if there are encoders and decoders that it doesn't work with, that would be good to know.

10a1cbf5-763e-4aa8-bb14-e0296ea27213 commented 4 years ago

Thank you for the timely reply, Eric.

How about we add an optional argument to functions like json.dumps(), similar to the "ignore_nan" argument in the simplejson package, which defaults to False? That way users can choose whether they need NaN encoded as NaN or as null, while the default behavior stays the same.

In Chromium-based browsers, JSON.parse cannot parse this output. Here is an example:

    JSON.parse('{"a": null, "b": NaN}')
    Uncaught SyntaxError: Unexpected token N in JSON at position 17
        at JSON.parse (<anonymous>)
        at <anonymous>:1:6
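For comparison, Python's own decoder accepts these tokens by default. The sketch below shows one way to make `json.loads` behave as strictly as the `JSON.parse` call above, using the documented `parse_constant` hook; the function name `reject_constant` is made up for illustration.

```python
import json

def reject_constant(name):
    # json.loads calls this hook with 'NaN', 'Infinity' or '-Infinity'
    # whenever it meets one of those non-standard tokens.
    raise ValueError(f"non-standard JSON constant: {name}")

text = '{"a": null, "b": NaN}'

# Default behaviour: the NaN token is accepted and becomes float('nan').
lenient = json.loads(text)

# Strict behaviour: the hook turns the same input into an error.
strict_error = None
try:
    json.loads(text, parse_constant=reject_constant)
except ValueError as exc:
    strict_error = exc
```

This only affects decoding on the Python side; it does not change what json.dumps() emits.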

ericvsmith commented 4 years ago

I think that's reasonable, although I could see someone objecting ("just use simplejson instead").

I suggest discussing this on the python-ideas mailing list and see what people think over there. It might help to create a PR first, if it's not a lot of work.

mdickinson commented 4 years ago

I don't think "null" in JSON is supposed to represent "Not a Number"; it's closer in meaning to Python's None. I definitely wouldn't want to see nans translated to "null" by default.

This also only seems to address a part of the issue: what's the proposed action for "Infinity" and "-Infinity"? We've written internal code to deal with float special values in JSON a few times (usually to work with databases that stick to the strict JSON definition), and that code has to find a way to deal with all three of the special values.

rhettinger commented 4 years ago

[Eric]

> this is documented behavior

[Mark]

> I definitely wouldn't want to see nans translated to "null" by default.

I concur with both of these statements.

I would support adding an option (off by default) to convert NaNs to None. While NaNs were originally intended to indicate an invalid value, they sometimes get used to denote missing values. In those situations, it would be reasonable to convert NaN to null.

rhettinger commented 4 years ago

One other issue just came to mind. While we could convert NaN to null during encoding, there isn't a reasonable way to reverse the process (a null could either be a NaN or a legitimate None). That would limit the utility of a new optional conversion.
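A small sketch of that asymmetry, using only the standard json module. The dictionary keys are invented for illustration, and the NaN-to-None step is done by hand because json.dumps() has no such option.

```python
import json

original = {"missing": None, "invalid": float("nan")}

# Python's default (non-standard) encoding keeps the two values distinct.
extended = json.loads(json.dumps(original))

# Converting NaN to None by hand before encoding makes them indistinguishable.
# (v != v is true only for NaN.)
as_null = {
    key: (None if isinstance(value, float) and value != value else value)
    for key, value in original.items()
}
restored = json.loads(json.dumps(as_null))
```

After the second round trip both fields are None, so a decoder cannot tell which one started out as NaN.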

10a1cbf5-763e-4aa8-bb14-e0296ea27213 commented 4 years ago

About using null in JSON to represent the NaN value of a float type, I prefer this logic: float is a numeric type that expects a number as its value, so "Not a Number" on a numeric type is equivalent to None (¬Number ∩ NumericValues = Empty). If we need to capture an error in calculation or input data, we can use the allow_nan option to catch it. Database connectors such as SQLAlchemy translate an empty field as float('nan') for a float number field. We can probably take that safely as a convention. I have no idea yet how to represent infinity.

Once encoded, there is no way to know whether a null originated from NaN or None without additional fields.

The direct conversion from Python data types to JSON may lose some information because of JSON's limited data types. When converting a BMP image to GIF, we have to eliminate some colors to fit the small palette, and we do not expect to restore the full information of the BMP image from its GIF counterpart.

I suggest we give the json module at least an option to generate standard-compliant JSON regardless of potential loss of information, instead of leaving each application to write its own subclass of JSONEncoder just for this corner case.
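For reference, here is roughly what such a per-application subclass might look like. This is a sketch, not an official recipe: `NullNonFiniteEncoder` and `_null_nonfinite` are made-up names, and the override relies on the undocumented `_one_shot` parameter of `JSONEncoder.iterencode`, which is an implementation detail of CPython's json module.

```python
import json
import math

def _null_nonfinite(obj):
    # Hypothetical helper: swap nan/inf/-inf for None in nested containers.
    if isinstance(obj, float):
        return obj if math.isfinite(obj) else None
    if isinstance(obj, dict):
        return {key: _null_nonfinite(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_null_nonfinite(value) for value in obj]
    return obj

class NullNonFiniteEncoder(json.JSONEncoder):
    # default() is only called for objects the encoder cannot serialize,
    # so it never sees floats; the input is rewritten before encoding instead.
    def iterencode(self, o, _one_shot=False):
        return super().iterencode(_null_nonfinite(o), _one_shot)

data = {"a": None, "b": float("nan"), "c": [1.5, float("inf"), float("-inf")]}
encoded = json.dumps(data, cls=NullNonFiniteEncoder, allow_nan=False)
```

Passing `allow_nan=False` as well acts as a safety net: any non-finite float that slips past the helper (for example one produced later by a `default()` hook) raises instead of producing invalid JSON. Dictionary keys are left untouched in this sketch.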

ericvsmith commented 4 years ago

I don't think we want to generate output no matter what. Should datetime instances become null instead of raising an exception?

Are there types other than float where some values are json serializable and others aren't?

rhettinger commented 4 years ago

We could add an option to cause NaNs to raise an error, but I don't think it would get used.

Otherwise, it's likely best to leave the module as-is.

mdickinson commented 4 years ago

> We could add an option to cause NaNs to raise an error, but I don't think it would get used.

If that option were extended to also cause infinities to raise an error, then I'd use it. We have code that's producing JSON without knowing in advance exactly who the JSON consumer will be, and in particular whether the consumer will be strict in what it accepts or not. In that situation, it's preferable for us to discover that we're producing invalid JSON early (e.g., when running our own unit tests) rather than much later, when it turns out that the customer is using the "wrong" relational database.

mdickinson commented 4 years ago

... but I'm an idiot, since that option is already there (allow_nan=False), and I've just checked that we are in fact using it.
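A minimal illustration of the existing option: with `allow_nan=False`, json.dumps() raises ValueError for nan, inf, and -inf instead of emitting non-standard tokens. The helper name `is_strict_json` is invented for this sketch.

```python
import json

def is_strict_json(obj):
    """Return True if obj encodes without NaN/Infinity extensions."""
    try:
        json.dumps(obj, allow_nan=False)
    except ValueError:
        # Raised for nan, inf and -inf when allow_nan is False.
        return False
    return True

finite_ok = is_strict_json({"x": 1.5})
nan_ok = is_strict_json({"x": float("nan")})
inf_ok = is_strict_json({"x": float("-inf")})
```

A check like this fits the unit-test use case described above: invalid output is caught when the JSON is produced rather than when a strict consumer rejects it.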

rhettinger commented 4 years ago

I missed that as well ;-)

Shall we close this now?

ericvsmith commented 4 years ago

I think it should be closed.

mdickinson commented 4 years ago

Agreed; closing.

bc0a3547-5cfd-4024-a7a2-428452134825 commented 3 years ago

Please re-evaluate; the current behaviour is incompatible with the JSON specification, in favour of giving the user/application/consumer of the resulting JSON information about the conversion process. Given what is stated in the documentation, I agree with the default behaviour, but I don't agree with supporting only "most JavaScript based encoders and decoders" and not the JSON specification. I would opt to support "most encoders and decoders" plus the JSON specification. Furthermore, allow_nan doesn't really allow anything, because no alternative is provided: setting it to False does not produce output without NaN, it makes encoding fail altogether, which forces you back to the default behaviour. I would suggest that when allow_nan is set to False, the output be made JSON compliant by using null instead (as per the specification). This way we support most JavaScript based encoders and decoders, but can also produce JSON compliant output.

mdickinson commented 3 years ago

@Arjan Staring: could you point to which part of the JSON specification you're looking at?

At https://tools.ietf.org/html/rfc7159, the only reference to NaNs that I see is:

> Numeric values that cannot be represented in the grammar below (such as Infinity and NaN) are not permitted.

At https://www.json.org/json-en.html, there's no mention of IEEE 754 special values.

I'm not seeing anything anywhere to suggest that the JSON specification says NaNs should be translated to nulls.

15e9ca4a-6328-4c6b-8c2b-775639d1aed9 commented 3 years ago

I agree with arjanstaring

This implementation is not standard compliant and breaks interoperability with every ECMA compliant JavaScript deserializer.

Technically this is awful, of course, but interoperability and standardization come before technical cleanliness, IMHO.

Regarding standardization:

If you consider https://tools.ietf.org/html/rfc7159

there is no way to represent the literal "nan" with the grammar supplied in section 6, hence the Infinity and NaN values are forbidden, and so is "nan".

For interoperability

If you consider http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

It is clearly stated in section 24.5.2 Note 4 that JSON.stringify produces null for Infinity and NaN

"Finite numbers are stringified as if by calling ToString(number). NaN and Infinity regardless of sign are represented as the String null"

It is clearly stated in section 24.5.1 that JSON.parse uses eval-like parsing as a reference for decoding. nan is not an allowed keyword at all. For interoperability NaN could be used, but it is outside the JSON standard.

So what happens is that this will break all the ECMA compliant parsers (aka browsers) in the world. Which is what is happening to my project by the way

The pandas serialization method (to_json) already adjusts for this issue, but I really think the standard should too.

mdickinson commented 3 years ago

@Luca: you might want to open a new feature request issue; it's not clear to me what exact behaviour change you're proposing for Python.

What was rejected in this issue was the proposal to *automatically* convert NaNs and infinities to nulls by default, but that still leaves open the possibility of adding an option to do such conversion, provided that a sufficiently strong case could be made for adding such an option, and that we can figure out what we want the behaviour to be (should _all_ things that JSON doesn't know how to encode be converted to null, or just infinities and nans?)

If you want standards compliance, then that's already there: you can use the existing flag allow_nan=False when generating JSON. I agree that it would have been better if that were the default, but changing it now is probably a no-go - it would break too much existing code.

I'm still confused by Arjan Staring's comments: they seem to be saying that the JSON specification states that a NaN should be converted to the string "null", but there's nothing in RFC 7159 to support that - as you point out, it explicitly says that NaNs and infinities are disallowed.

mdickinson commented 3 years ago

For the record, some helpful resources:

ECMA-404 (the ECMA standardization of JSON): http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf

RFC 8259 (current RFC for JSON): https://tools.ietf.org/html/rfc7159. (I mistakenly referred to RFC 7159 in a previous comment, but that's obsoleted by RFC 8259; however, none of the language around infinities and nans has changed, and none of the current errata to RFC 8259 have any impact on infinity or nan encoding.)

This Stack Overflow question and its answers contain some interesting discussion and links: https://stackoverflow.com/questions/1423081/json-left-out-infinity-and-nan-json-status-in-ecmascript

Essentially, there's no good answer here: standard JSON simply can't encode infinities and NaNs. Absent a fix for the standard itself, both Python and ECMAScript end up papering over that fact. Unfortunately from an interoperability point of view, they do so in different ways - Python effectively extends the JSON spec in such a way that it produces invalid JSON by default; ECMAScript converts all of Infinity, -Infinity, NaN and null to the exact same JSON string, producing valid JSON but losing the ability to restore the original values from their JSON representations.

FWIW, Python's solution to this problem is (whether by accident or design I'm not sure) forward-looking in the sense that it's compatible with JSON 5: https://spec.json5.org

mdickinson commented 3 years ago

> RFC 8259 (current RFC for JSON): https://tools.ietf.org/html/rfc7159

Argh; copy-and-paste fail. That link should have been https://tools.ietf.org/html/rfc8259, of course.

Dzeri96 commented 2 years ago

Sorry to continue a closed discussion but I think adding an option like nonfinite_to_null would solve a lot of headaches. It's clear we're in a situation where some functionality has to be sacrificed either way, but choosing not to break every browser in the market warrants implementation of this flag IMO.

Edit: This PR seems like it might fix the issue, though I'm not crazy about the naming of the new parameter.

mdickinson commented 2 years ago

@Dzeri96

> Sorry to continue a closed discussion [...]

It may be worth opening a new issue, so that the discussion doesn't get lost / hidden. Or I guess we could repurpose and re-open this one, but that doesn't seem like good issue management. (The request on this issue was for JSON encoding to turn NaNs and infinities to null by default. That's not viable, both for backwards compatibility breakage reasons and because the result would still not be compliant with JSON.)

> This https://github.com/python/cpython/pull/13233 seems like it might fix the issue

A general mechanism for overriding the float encoding behaviour (or perhaps just the encoding behaviour for nans and infinities) does seem preferable to another boolean flag. That would then support custom uses like "raise on infinities, but encode nans to null", which might fit a lot of scientific data where missing values (represented by nans) are normal and expected but infinities are not.
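Until such a mechanism exists in the json module, the "raise on infinities, but encode nans to null" policy can be approximated by preprocessing the data. The sketch below is an assumption-laden illustration, not a proposed API: `map_nonfinite`, `nan_to_null_inf_raises`, and `dumps_with_policy` are all hypothetical names.

```python
import json
import math

def map_nonfinite(obj, handler):
    """Apply handler to every non-finite float in nested dicts/lists/tuples."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return handler(obj)
    if isinstance(obj, dict):
        return {key: map_nonfinite(value, handler) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [map_nonfinite(value, handler) for value in obj]
    return obj

def nan_to_null_inf_raises(value):
    # Example policy: missing data (nan) becomes null,
    # infinities are treated as errors.
    if math.isnan(value):
        return None
    raise ValueError(f"infinite value is not JSON serializable: {value!r}")

def dumps_with_policy(obj, handler=nan_to_null_inf_raises, **kwargs):
    # allow_nan=False guards against anything the handler lets through.
    return json.dumps(map_nonfinite(obj, handler), allow_nan=False, **kwargs)

encoded = dumps_with_policy({"samples": [1.0, float("nan"), 2.5]})
```

Swapping in a different handler changes the policy without touching the traversal, which is the flexibility a single boolean flag would not offer.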

mdickinson commented 2 years ago

> A general mechanism for overriding the float encoding behaviour

Maybe repurpose allow_nan itself, extending it to allow a callable to be passed?

Dzeri96 commented 2 years ago

> Maybe repurpose allow_nan itself, extending it to allow a callable to be passed?

That would be a good solution, though the name of the argument is somewhat limiting, unfortunately. I'll just create a new issue with some alternatives so we can discuss them properly. This issue will be referenced by the new one.

dennisvang commented 1 year ago

python / cpython

json.dumps() should encode float number NaN to null #84813