python / cpython

The Python programming language
https://www.python.org
Other
63.21k stars 30.27k forks source link

json.dumps to check for obj.__json__ before raising TypeError #71549

Open 9e586acb-6fbb-4d42-913a-7e3addd17bc0 opened 8 years ago

9e586acb-6fbb-4d42-913a-7e3addd17bc0 commented 8 years ago
BPO 27362
Nosy @rhettinger, @etrepum, @stevendaprano, @bitdancer, @berkerpeksag, @serhiy-storchaka, @Vgr255
Files
  • json-customize.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/etrepum' closed_at = None created_at = labels = ['3.7', 'type-feature', 'library'] title = 'json.dumps to check for obj.__json__ before raising TypeError' updated_at = user = 'https://bugs.python.org/DanielWard' ``` bugs.python.org fields: ```python activity = actor = 'ulope' assignee = 'bob.ippolito' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'Daniel Ward' dependencies = [] files = ['46701'] hgrepos = [] issue_num = 27362 keywords = ['patch'] message_count = 11.0 messages = ['268991', '268993', '268995', '268999', '269002', '269003', '269021', '269091', '289001', '289006', '289007'] nosy_count = 10.0 nosy_names = ['rhettinger', 'bob.ippolito', 'steven.daprano', 'r.david.murray', 'berker.peksag', 'serhiy.storchaka', 'abarry', 'ulope', 'Daniel Ward', 'Ollie Ford'] pr_nums = [] priority = 'normal' resolution = None stage = 'needs patch' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue27362' versions = ['Python 3.7'] ```

    9e586acb-6fbb-4d42-913a-7e3addd17bc0 commented 8 years ago

    To help prevent retrospective JSONEncoder overrides when failing to serialize a given object, the intention of this issue is to propose that the JSON encoder checks if a given object has a __json__ attribute, using that rather than raising a TypeError.

    This will help in maintaining easier-to-follow code and keeps the responsibility of determining how an object should be represented in JSON objects firmly within the object itself.

    The obj.__json callable/attribute should behave in the same way as __repr or __str__, for example.

    I'm happy to look in to contributing this enhancement myself if that's preferred. Any pointers as to how I go about contributing are greatly appreciated.

    cd293b3e-6d38-412b-8370-a46a9aaee518 commented 8 years ago

    I'm not too familiar with the json package, but what should __json__ return when called?

    9e586acb-6fbb-4d42-913a-7e3addd17bc0 commented 8 years ago

    Sure, so for example:

    \=========

    import json
    
    class ObjectCounter:
    
        def __init__(self, name, count):
            self.name = name
            self.count = count
    
        def __json__(self):
           return '[{name}] {count}'.format(name=self.name, count=self.count)
    
    object_counter = ObjectCounter('DC1', 3789)
    my_json_string = json.dumps({'success': True, 'counter': object_counter})

    ============

    In the above example, the value stored in my_json_string would be:

    '{"success": true, "counter": "[DC1] 3789"}'

    This is an untested and quick example, but I hope it explains what I'm aiming to achieve. Without the __json__ method, the json.dumps call would raise an exception along the lines of the below message, unless we create a new JSONEncoder object and call json.dumps(..., cls=MyJSONEncoder), which becomes difficult to manage and follow on larger projects.

    TypeError: \<ObjectCounter instance at XXX> is not JSON serializable

    cd293b3e-6d38-412b-8370-a46a9aaee518 commented 8 years ago

    So __json__ returns a string meant to be serializable. I'm not too keen on using a dunder name (although my word doesn't weigh anything ;) and I'd personally prefer something like as_json_string(). I think the idea in general is good, though. Mind submitting a patch?

    stevendaprano commented 8 years ago

    For starters, dunder names like __json__ are reserved for Python's own use, so you would have to get the core developers to officially bless this use.

    But... I'm not really sure that "the responsibility of determining how an object should be represented in JSON objects firmly within the object itself" is a good idea. For a general purpose protocol, I don't think you can trust any object to return valid JSON. What if my object.__json returned "}key='c" or some other invalid string? Whose responsibility is it to check that __json returns valid JSON?

    I don't think there is any need to make this an official protocol. You know your own objects, you know if you can trust them, and you can call any method you like. So your example becomes:

        my_json_string = json.dumps(
            {'success': True, 'counter': object_counter.to_json()})

    which is okay because that's clearly *your* responsibility to make sure that your object's to_json method returns a valid string. If you make it an official language wide protocol, it's unclear whose responsibility it is: the object (dangerous!), the caller (difficult), the Python interpreter (unlikely), json.dumps (unlikely).

    9e586acb-6fbb-4d42-913a-7e3addd17bc0 commented 8 years ago

    I don't think I explained the response very well, effectively the __json call would return an object which is JSON-serializable. This would include dict objects containing JSON-serializable objects albeit natively-supporting JSON serialisation or by means of subsequent obj.__json calls.

    The reason I gave it __json__ is purely for easily-remembered implementation, separating it out from calls which may potentially clash with existing codebases, because let's face it, people don't often get to start again ;)

    I'm not adverse to changing the method name at all, but I do believe this is a progressive way to go regarding JSON-serialization.

    berkerpeksag commented 8 years ago

    This was discussed on python-ideas before:

    I don't think there was an agreement on the idea so I suggest to send your proposal to python-ideas first.

    bitdancer commented 8 years ago

    Pretty much any project that makes non-trivial use of json ends up implementing a jsonification protocol, usually by creating either a __json__ method or (more commonly, I think) a to_json method.

    But, yeah, this is python-ideas material and would get into the stdlib only as an officially blessed protocol, in which case using __json__ would make sense. So I'm going to close the issue pending a consensus on python-ideas. If it gets accepted the issue can be reopened.

    serhiy-storchaka commented 7 years ago

    This could fix other issues:

    Currently the blessed way of JSON encoder customization is to implement the default method in JSONEncoder subclass or pass the default argument to dump(). But that requires changing every JSON serialization call and handling all non-standard types in one function. I think it would be handly to pick the type-specific serialization function from: 1) per-encoder dispatch table, 2) global dispatch type (registry), 3) __json__ method. This can be done after the default function fails or be included in the default default method.

    This will add JSON support of standard library types (e.g. collections other than list, tuple and dict, numbers other than int and float) and will help to implement task specific serialization of user classes.

    rhettinger commented 7 years ago

    I concur with David Murray that this should be kicked around on python-dev or python-ideas first. Also, we should ask Bob Ippolito for his thoughts.

    serhiy-storchaka commented 7 years ago

    This feature already was proposed for simplejson (https://github.com/simplejson/simplejson/issues/52). Special __json__ method is used in wild in a number of projects for exactly this purpose. It looks to me the main disagreement in the past Python-Idea discussion (https://mail.python.org/pipermail/python-ideas/2010-July/007811.html) was about whether implement the customization as a special method or as a registry. I suggest to implement both. Special methods are good for standard collection and numeric classes, global registry is good for application-wide serialization, local dispatch table or the default method are good for more specific task-specific customization.

    Here is a draft implementation. It follows the design of pickle and copy modules.

    There are few design questions.

    1. What is the order of using different customization methods? Should registries and __json__ be checked before calling the default method, after calling the default method (if it fails), or inside the default implementation of the default method?

    2. For Decimal we need to customize raw JSON representation. In the past it was possible to implement an intermediate float or int subclass with __str or __repr returning raw JSON representation. But this hack no longer works. Needed to add explicit support of special JSON representation objects. Other way -- add yet one special method (raw_json_ or \_json_str).

    3. Do we need the json.registry() function for global registration, or it is enough to expose the json.dispatch_table mapping?