python / cpython

The Python programming language
https://www.python.org
Other
63.15k stars 30.24k forks source link

json.dump() ignores its 'default' option when serializing dictionary keys #63020

Open 64fe4661-1a60-4325-9464-32eca8026b44 opened 11 years ago

64fe4661-1a60-4325-9464-32eca8026b44 commented 11 years ago
BPO 18820
Nosy @rhettinger, @etrepum, @pitrou, @ezio-melotti, @tifv, @berkerpeksag, @serhiy-storchaka, @iritkatriel
Files
  • json-default-dict-keys.diff: applies to default, does not fix C implementation
  • json-default-tests.diff: proposed tests (for default)
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', 'library', '3.9', '3.10', '3.11'] title = "json.dump() ignores its 'default' option when serializing dictionary keys" updated_at = user = 'https://github.com/tifv' ``` bugs.python.org fields: ```python activity = actor = 'iritkatriel' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'july' dependencies = [] files = ['31437', '31450'] hgrepos = [] issue_num = 18820 keywords = ['patch'] message_count = 8.0 messages = ['195957', '195958', '196022', '196036', '263013', '263019', '263025', '415093'] nosy_count = 11.0 nosy_names = ['rhettinger', 'bob.ippolito', 'pitrou', 'ezio.melotti', 'cvrebert', 'july', 'docs@python', 'berker.peksag', 'serhiy.storchaka', 'Erwin Mayer', 'iritkatriel'] pr_nums = [] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue18820' versions = ['Python 3.9', 'Python 3.10', 'Python 3.11'] ```

    Linked PRs

    64fe4661-1a60-4325-9464-32eca8026b44 commented 11 years ago

    According to documentation of json.dump(), concerning its 'default' option:

    default(obj) is a function that should return a serializable version of obj or raise TypeError. The default simply raises TypeError.

    But this function is actually never applied to serialized dictionary keys:

    >>> def default(obj):
    ...  if isinstance(obj, bytes):
    ...   return obj.decode('ascii')
    ...  raise ValueError
    ... 
    >>> json.dumps(b'asdf')
    Traceback (most recent call last):
    ...
    TypeError: b'asdf' is not JSON serializable
    >>> json.dumps(b'asdf', default=default)
    '"asdf"'
    >>> json.dumps({b'asdf' : 1}, default=default)
    Traceback (most recent call last):
    ...
    TypeError: keys must be a string
    >>> json.dumps({1 : b'asdf'}, default=default)
    '{"1": "asdf"}'

    (bytes are used purely for the purpose of example) Such behavior should be either documented or corrected. Patch correcting python implementation of json attached.

    64fe4661-1a60-4325-9464-32eca8026b44 commented 11 years ago

    Oops, my patch disables 'skipkeys' argument of dump. Another version attached.

    ezio-melotti commented 11 years ago

    Are there already tests that cover the changes in your patch? If not, could you add them?

    64fe4661-1a60-4325-9464-32eca8026b44 commented 11 years ago

    Proposed tests attached.

    1305e495-fc8a-463a-92c1-955179a3b6c4 commented 8 years ago

    Will this be merged? I also believe it is an unexpected behavior to not serialize dictionary keys when the default option is used.

    serhiy-storchaka commented 8 years ago

    See also

    https://github.com/simplejson/simplejson/issues/42 https://github.com/simplejson/simplejson/issues/100 https://github.com/simplejson/simplejson/issues/103

    And allowing non-string keys leads to other issue:

    https://github.com/simplejson/simplejson/issues/77

    Thus there is an argument for disallowing serializing non-string keys.

    1305e495-fc8a-463a-92c1-955179a3b6c4 commented 8 years ago

    Regarding the issues mentioned in https://github.com/simplejson/simplejson/issues/77, they already apply with the current implementation anyway (true is serialized as 'true'), so users must already be careful.

    The JSONEncoder with default parameters could definitely keep rejecting complex keys, but an optional 'encode_complexkeys=False' parameter could be added to \_init__ to provide this functionality. There would be zero performance impact as the parameter check would only be done if all else has failed instead of raising the TypeError (the same way _skipkeys does).

    In that respect, the patch of this issue would need to be amended (and the C version would also need to be changed).

    I am trying to fork the json core module to achieve this (to not have to wait for the next Python release, assuming it gets implemented), if you are familiar with the packaging process of core modules into standalone modules, your advice would be much appreciated (and could well help others contribute to Python): http://stackoverflow.com/questions/36498527/how-to-create-a-customized-version-of-a-core-python-module-that-includes-c-code

    iritkatriel commented 2 years ago

    Reproduced on 3.11.

    h-mayorquin commented 1 year ago

    Is still the view of the developers the one described in https://github.com/simplejson/simplejson/issues/100? (That is, that non-strings keys in dictionaries should be disavowed for ambiguity concerns).

    Would a patch to document this exception in the documentation would be welcomed?

    aeisenbarth commented 1 year ago

    I understand the concerns regarding default behavior. But the restriction of string-only keys is on the JSON side, whereas complex object keys are perfectly valid in Python dictionaries. That is the very reason why we would need an encoder.

    Probably the encoder should not automatically try to stringify everything it encounters (as the example of 1 and "1"), but if a user explicitly provides a specific custom type in default, the intention seems quite clear and we should assume the user is aware of and responsible for the consequences?