python / cpython

The Python programming language
https://www.python.org

json encoder unable to handle decimal #60739

Open merwok opened 11 years ago

merwok commented 11 years ago
BPO 16535
Nosy @rhettinger, @jcea, @etrepum, @mdickinson, @pitrou, @tiran, @ezio-melotti, @merwok, @serhiy-storchaka, @lesinigo
Files
  • json_decimal.patch: Patch file generated via "hg diff"
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['type-feature', 'library', '3.10']
    title = 'json encoder unable to handle decimal'
    updated_at =
    user = 'https://github.com/merwok'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'luca.lesinigo'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'eric.araujo'
    dependencies = []
    files = ['30836']
    hgrepos = []
    issue_num = 16535
    keywords = ['patch']
    message_count = 16.0
    messages = ['176135', '176136', '176158', '176161', '176398', '179869', '192520', '213381', '214733', '214737', '224032', '224096', '274026', '289010', '349250', '383559']
    nosy_count = 16.0
    nosy_names = ['rhettinger', 'jcea', 'bob.ippolito', 'mark.dickinson', 'pitrou', 'christian.heimes', 'ezio.melotti', 'eric.araujo', 'Arfrever', 'zzzeek', 'cvrebert', 'serhiy.storchaka', 'ralhei', 'mjensen', 'risa2000', 'luca.lesinigo']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16535'
    versions = ['Python 3.10']
    ```

    merwok commented 11 years ago

    In 2.7 and other versions, the json module has incomplete support for decimals:

    >>> json.loads('0.2', parse_float=Decimal)
    Decimal('0.2')
    >>> json.dumps(json.loads('0.2', parse_float=Decimal))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
        return _default_encoder.encode(obj)
      File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
        return _iterencode(o, 0)
      File "/usr/lib/python2.7/json/encoder.py", line 178, in default
        raise TypeError(repr(o) + " is not JSON serializable")
    TypeError: Decimal('0.2') is not JSON serializable

    simplejson encodes decimals out of the box, but json can’t round-trip.

    merwok commented 11 years ago

    See the lengthy discussion that led to inclusion in simplejson here: http://code.google.com/p/simplejson/issues/detail?id=34

    serhiy-storchaka commented 11 years ago

    The json module already has too many options. There is no need for yet another one, especially one this specialized.

    >>> class number_str(float):
    ...     def __init__(self, o):
    ...         self.o = o
    ...     def __repr__(self):
    ...         return str(self.o)
    ... 
    >>> def decimal_serializer(o):
    ...     if isinstance(o, decimal.Decimal):
    ...         return number_str(o)
    ...     raise TypeError(repr(o) + " is not JSON serializable")
    ... 
    >>> print(json.dumps([decimal.Decimal('0.20000000000000001')], default=decimal_serializer))
    [0.20000000000000001]

    You can extend this to support complex numbers, fractions, date and time, and many other custom types. Having specialized options for each of these would be cumbersome.
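A minimal sketch of how one `default` hook could cover several of the types mentioned above. The function name `extended_default` and the chosen encodings (strings for Decimal and Fraction, a two-element list for complex, ISO 8601 text for dates and times) are assumptions made for illustration, not something proposed in this thread. Unlike the `number_str` recipe, this emits decimals as JSON strings rather than JSON numbers.

```python
import datetime
import decimal
import fractions
import json


def extended_default(o):
    """Hypothetical ``default`` hook; the encodings below are assumptions."""
    if isinstance(o, (decimal.Decimal, fractions.Fraction)):
        return str(o)  # exact, but emitted as a JSON string
    if isinstance(o, complex):
        return [o.real, o.imag]
    if isinstance(o, (datetime.date, datetime.time)):
        return o.isoformat()  # datetime.datetime is a subclass of date
    raise TypeError(repr(o) + " is not JSON serializable")


encoded = json.dumps(
    [
        decimal.Decimal("0.20000000000000001"),
        fractions.Fraction(1, 3),
        3 + 4j,
        datetime.date(2012, 11, 23),
    ],
    default=extended_default,
)
print(encoded)
```

The consumer has to know which strings are really numbers or dates, which is the usual cost of this kind of hook.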

    mdickinson commented 11 years ago

    Judging by the discussion that Éric points to, and by the various stackoverflow questions on the topic ([1], [2]), this is a common enough need that I think it would make sense to have some support for it in the std. lib.

    There's a sense in which Decimal is the 'right' type for json, and we shouldn't make it harder for people to do the right thing with respect to (e.g.) financial data in databases.

    [1] http://stackoverflow.com/questions/4019856/decimal-to-json [2] http://stackoverflow.com/questions/1960516/python-json-serialize-a-decimal-object

    merwok commented 11 years ago

    Thanks for the workaround Serhiy, I stole that and ran with it :)

    For 3.4 I still think something built-in would be best.

    pitrou commented 11 years ago

    Decimal numbers should simply be serializable by default. It doesn't make sense to add a specialized option.

    29ed28e0-3942-4edb-b93e-2f83e367661d commented 11 years ago

    This patch was implemented at the EuroPython 2013 sprint. It's my first contribution to Python core ever, so please bear with me if it's not perfect.

    Decimal support is implemented both in the C and Python JSON code.

    There is one peculiarity to mention about the Decimal addition in function _json.c:encoder_listencode_obj() of my patch: The addition of

        else if (PyObject_IsInstance(obj, (PyObject*)PyDecimalType)) {
            PyObject *encoded = encoder_encode_decimal(s, obj);
            if (encoded == NULL)
                return -1;
            return _steal_accumulate(acc, encoded);
        }

    has to be located AFTER lists and dicts are handled in the JSON encoder; otherwise the unit test "test_highly_nested_objects_encoding()" from test_recursion.py fails with a nasty, unrecoverable Python exception. My guess is that this is due to additional stack allocation when the stack space is almost used up by the deeply nested recursion code.

    merwok commented 10 years ago

    Patch looks good and contains tests for the C and Python code.

    Documentation is missing (a note to tell that json.dump converts decimal.Decimal instances to JSON numbers, a versionchanged directive, maybe a link to the doc that explains parse_float=decimal.Decimal).

    pitrou commented 10 years ago

    The patch isn't really ok, IMO. It forcibly imports the decimal module and then looks up the type there. The decimal module is a rather large one and it shouldn't get imported if it doesn't get used.

    I think it would be better to rely on the __float__ special method, which would also automatically accept other numberish types such as Fraction.
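To make the trade-off of a `__float__`-based approach concrete, here is a small sketch that emulates it with a `default` hook. The hook name `float_default` is hypothetical; the point is only to show that Fraction is accepted automatically, while a Decimal with more digits than a binary double can hold does not survive the trip.

```python
import json
from decimal import Decimal
from fractions import Fraction


def float_default(o):
    """Hypothetical hook: fall back to __float__ for number-like objects."""
    if hasattr(o, "__float__"):
        return float(o)
    raise TypeError(repr(o) + " is not JSON serializable")


exact = Decimal("0.20000000000000001")
text = json.dumps([exact, Fraction(1, 4)], default=float_default)
print(text)

# Reading the value back as a Decimal shows that digits were dropped.
round_tripped = json.loads(text, parse_float=Decimal)[0]
print(round_tripped == exact)
```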

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 10 years ago

    I think we should really apply bpo-19232. At least that would take care of the import issue.

    tiran commented 10 years ago

    I'm at EuroPython 2014 in Berlin. Ralph has approached me and asked me about progress on this patch. I'm reluctant to implement a special case for decimals for two reasons:

    1) JSON just supports floats, and decimals are IMHO incompatible with floats. The conversion of decimals to JSON floats is a lossy operation.

    2) Rather than having a special case I would rather go with a general implementation that uses an ABC to JSON dump some float-like objects.
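One wrinkle with an ABC-based approach, sketched below under the assumption that `numbers.Real` would be the ABC in question (the comment above does not name one): `decimal.Decimal` is registered only as `numbers.Number`, not as `numbers.Real`, so such a check would accept Fraction but still reject Decimal. The hook name `real_default` is hypothetical.

```python
import json
import numbers
from decimal import Decimal
from fractions import Fraction


def real_default(o):
    """Hypothetical hook: dump anything registered as numbers.Real via float()."""
    if isinstance(o, numbers.Real):
        return float(o)
    raise TypeError(repr(o) + " is not JSON serializable")


fraction_text = json.dumps(Fraction(1, 4), default=real_default)
print(fraction_text)

# Decimal is a Number but deliberately not a Real.
decimal_is_real = isinstance(Decimal("0.2"), numbers.Real)
decimal_is_number = isinstance(Decimal("0.2"), numbers.Number)
print(decimal_is_real, decimal_is_number)

try:
    json.dumps(Decimal("0.2"), default=real_default)
    decimal_rejected = False
except TypeError:
    decimal_rejected = True
print(decimal_rejected)
```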

    77411a08-770c-471e-ba30-9528530a8d45 commented 10 years ago

    1) JSON just support floats

    If you read the JSON standards documents, you'll see that this isn't accurate.

    Regardless, a general solution for non-built-in numeric types does seem preferable.

    1207b81f-242b-4018-bc9e-ab33a2584c56 commented 8 years ago

    Hi @cvrebert and team - do you know if this was ever implemented? It seems that it is still an issue for financial applications, and that the proposed solution would be relevant and helpful.

    serhiy-storchaka commented 7 years ago

    The trick from msg176158 no longer works since bpo-26719.
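A sketch of what this looks like in practice, reusing the `number_str` recipe from earlier in the thread. My understanding, stated as an assumption rather than taken from the linked issue, is that current encoders format float subclasses with `float.__repr__` directly, so the overridden `__repr__` is never consulted and the extra digits are silently lost.

```python
import decimal
import json


class number_str(float):
    # Same trick as the earlier recipe: a float subclass with a custom repr.
    def __init__(self, o):
        self.o = o

    def __repr__(self):
        return str(self.o)


def decimal_serializer(o):
    if isinstance(o, decimal.Decimal):
        return number_str(o)
    raise TypeError(repr(o) + " is not JSON serializable")


value = decimal.Decimal("0.20000000000000001")
encoded = json.dumps([value], default=decimal_serializer)
print(encoded)

# Compare against the original to see whether the digits survived.
decoded = json.loads(encoded, parse_float=decimal.Decimal)[0]
print(decoded == value)
```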

    0c3fbe6d-f576-44b0-9dc2-136bde14084e commented 5 years ago

    It looks like I am resurrecting an old item, but I have just been hit by this and was directed to this issue (https://mail.python.org/archives/list/python-ideas@python.org/thread/WT6Z6YJDEZXKQ6OQLGAPB3OZ4OHCTPDU/)

    I wonder if adding something similar to what simplejson uses (i.e. explicitly specifying in json.dump(s) how to serialize decimal.Decimal) could be acceptable.

    Or, the other idea would be to expose a method in JSONEncoder, which would accept "raw" textual output, i.e. string (or even bytes) and would encode it without adding additional characters to it. (as explained in my posts in the other threads).

    As it seems right now, there is no way to serialize decimal.Decimal the same way it is deserialized, i.e. while preserving the (arbitrary) precision.

    rhettinger commented 3 years ago

    I wonder if adding something similar to what simplejson uses (i.e. explicitly specifying in json.dump(s) how to serialize decimal.Decimal) could be acceptable.

    +1 for this approach. For financial applications, we need the recommended solution to be simple.

    nineteendo commented 2 months ago

    Sadly the C implementation will have to call Python to handle sNaN and NaN payloads properly:

    def encode_decimal(decimal: Decimal) -> str:
        if not decimal.is_finite():
            if decimal.is_snan():
                msg: str = f"{decimal!r} is not JSON serializable"
                raise ValueError(msg)
    
            if not allow_nan_and_infinity:
                msg = f"{decimal!r} is not allowed"
                raise ValueError(msg)
    
            if decimal.is_qnan():
                return "NaN"
    
        return decimal_str(decimal)

    joooeey commented 2 months ago

    I wonder if adding something similar to what simplejson uses (i.e. explicitly specifying in json.dump(s) how to serialize decimal.Decimal) could be acceptable.

    +1 for this approach. For financial applications, we need the recommended solution to be simple.

    Indeed, the solution should be simple. But the simplest solution is not the one quoted above but one mentioned earlier:

    Decimal numbers should simply be serializable by default. It doesn't make sense to add a specialized option.

    There is no need to explicitly specify how to serialize decimal.Decimal because there is only one obvious way to do it (str(decimal) plus some handling for a few corner cases which should be obvious too). Hence, the only required API change would be to add decimal.Decimal to the list of types that json.JSONEncoder accepts. Of course, someone would have to write that implementation.
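To illustrate the "str(decimal) plus corner cases" claim, the sketch below checks that `str()` of a few finite decimals is text the stdlib parser reads back to the same value, and lists what `str()` produces for the non-finite ones. The sample values are arbitrary choices for illustration.

```python
import json
from decimal import Decimal

finite = [
    Decimal("0.20000000000000001"),
    Decimal("-0"),
    Decimal("1E+2"),
    Decimal("1.10"),
]
round_trips = []
for d in finite:
    token = str(d)
    # parse_int is set too, because "-0" has no fraction or exponent part.
    back = json.loads(token, parse_float=Decimal, parse_int=Decimal)
    round_trips.append(str(back) == token)
    print(token, str(back) == token)

# The corner cases: none of these spellings is a valid JSON number.
special = [
    Decimal("NaN"),
    Decimal("NaN123"),
    Decimal("sNaN"),
    Decimal("Infinity"),
    Decimal("-Infinity"),
]
special_tokens = [str(d) for d in special]
print(special_tokens)
```

The NaN payload (`NaN123`) and the signalling NaN are the cases that have no counterpart in what the encoder emits for floats, which is why they need a decision of their own.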

    joooeey commented 2 months ago

    Sadly the C implementation will have to call Python to handle sNaN and NaN payloads properly:

    def encode_decimal(decimal: Decimal) -> str:
        if not decimal.is_finite():
            if decimal.is_snan():
                msg: str = f"{decimal!r} is not JSON serializable"
                raise ValueError(msg)
    
            if not allow_nan_and_infinity:
                msg = f"{decimal!r} is not allowed"
                raise ValueError(msg)
    
            if decimal.is_qnan():
                return "NaN"
    
        return decimal_str(decimal)

    I notice that this code appears equivalent to the floatstr function defined inside json.encoder.JSONEncoder.iterencode if you pass _repr=decimal.Decimal.__str__. The only difference is a less clear error for signalling NaNs (InvalidOperation: [<class 'decimal.InvalidOperation'>]).
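The observation above can be tried out with a standalone adaptation of that helper. This is written from memory of `json.encoder` and simplified (the `allow_nan` closure variable becomes a plain parameter), so treat it as an approximation rather than the stdlib source.

```python
from decimal import Decimal, InvalidOperation

INFINITY = float("inf")


def floatstr(o, allow_nan=True, _repr=Decimal.__str__,
             _inf=INFINITY, _neginf=-INFINITY):
    # Adapted sketch, not a verbatim copy of the stdlib function.
    if o != o:  # true for quiet NaNs; signalling NaNs raise here
        text = "NaN"
    elif o == _inf:
        text = "Infinity"
    elif o == _neginf:
        text = "-Infinity"
    else:
        return _repr(o)
    if not allow_nan:
        raise ValueError(
            "Out of range float values are not JSON compliant: " + repr(o))
    return text


print(floatstr(Decimal("0.20000000000000001")))
print(floatstr(Decimal("NaN123")))  # payload is dropped
print(floatstr(Decimal("-Infinity")))

try:
    floatstr(Decimal("sNaN"))
except InvalidOperation as exc:
    print("signalling NaN:", type(exc).__name__)
```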

    joooeey commented 2 months ago

    simplejson encodes decimals out of the box, but json can’t round-trip.

    simplejson does not properly support non-numeric decimals like NaN, sNaN, Infinity or -Infinity. These are not valid JSON and should raise an error, especially if allow_nan == False. But simplejson.dumps(Decimal('nan'), allow_nan=False) happily succeeds.

    So people wanting to properly write IEEE 754 decimal numbers to JSON (as JSON numbers) are out of luck with Python. There is no working solution in the standard library, no workaround recipe (since Python 3.5) and no external library that can do it correctly. EDIT: As @nineteendo points out, the jsonyx package treats Decimal correctly. But it's not available via conda.

    nineteendo commented 2 months ago

    no external library that can do it correctly

    Did you try https://pypi.org/project/jsonyx? It's fully JSON compliant under default settings.

    joooeey commented 2 months ago

    @nineteendo I just tried it. Appears to work flawlessly (No issues with a 76-digit number, NaN, -Infinity or -0). It also doesn't do funny things with non-string keys like json in the standard library.

    nineteendo commented 2 months ago

    Glad you like it. It also raises errors for

    But it's not available via conda.

    Can you create an issue? Explaining how I need to publish it there.

    joooeey commented 2 months ago

    Can you create an issue? Explaining how I need to publish it there.

    Will do. However, I've only looked into it but never tried it, so I can't help with experience.

    nineteendo commented 1 month ago

    OK, so it's possible to implement this in C:

    static PyObject *
    encoder_encode_decimal(PyEncoderObject *s, PyObject *obj)
    {
        /* Return the JSON representation of a Decimal. */
        PyObject *is_finite = PyObject_CallMethod(obj, "is_finite", NULL);
        if (is_finite == NULL) {
            return NULL;
        }
    
        /* Convert to a C truth value and release the result right away,
           so that no later path can leak it or release it twice. */
        int finite = PyObject_IsTrue(is_finite);
        Py_DECREF(is_finite);
        if (finite < 0) {
            return NULL;
        }
    
        if (!finite) {
            PyObject *is_snan = PyObject_CallMethod(obj, "is_snan", NULL);
            if (is_snan == NULL) {
                return NULL;
            }
    
            int snan = PyObject_IsTrue(is_snan);
            Py_DECREF(is_snan);
            if (snan < 0) {
                return NULL;
            }
            if (snan) {
                PyErr_Format(PyExc_ValueError, "%R is not JSON serializable", obj);
                return NULL;
            }
    
            if (!s->allow_nan_and_infinity) {
                PyErr_Format(PyExc_ValueError, "%R is not allowed", obj);
                return NULL;
            }
    
            PyObject *is_qnan = PyObject_CallMethod(obj, "is_qnan", NULL);
            if (is_qnan == NULL) {
                return NULL;
            }
    
            int qnan = PyObject_IsTrue(is_qnan);
            Py_DECREF(is_qnan);
            if (qnan < 0) {
                return NULL;
            }
            if (qnan) {
                /* Drop any NaN payload. */
                return PyUnicode_FromString("NaN");
            }
        }
    
        /* Finite values and infinities fall through to Decimal.__str__. */
        return ((PyTypeObject *)s->Decimal)->tp_str(obj);
    }