python / cpython

The Python programming language
https://www.python.org
Other
62.28k stars 29.93k forks source link

json C vs pure-python implementation difference #59091

Open 01e27b45-90f2-4c74-9e5e-7e7e54c3d78e opened 12 years ago

01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 12 years ago
BPO 14886
Nosy @pitrou, @scoder, @ezio-melotti, @socketpair, @matrixise

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.7', '3.8', 'type-bug', 'library'] title = 'json C vs pure-python implementation difference' updated_at = user = 'https://github.com/socketpair' ``` bugs.python.org fields: ```python activity = actor = 'scoder' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'socketpair' dependencies = [] files = [] hgrepos = [] issue_num = 14886 keywords = [] message_count = 10.0 messages = ['161395', '161399', '161474', '161496', '161536', '161538', '161540', '171409', '323649', '323654'] nosy_count = 8.0 nosy_names = ['pitrou', 'scoder', 'thomaslee', 'ezio.melotti', 'cvrebert', 'socketpair', 'alexey-smirnov', 'matrixise'] pr_nums = [] priority = 'normal' resolution = None stage = 'needs patch' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue14886' versions = ['Python 3.7', 'Python 3.8'] ```

01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 12 years ago

Pure-python implementation: if isinstance(o, (list, tuple)):

C implementation: if (PyList_Check(obj) || PyTuple_Check(obj))

This make real difference (!) in my code.

So, please change pure-python implementation to: if type(o) in (list, tuple): Or, fix C implementation to: / intentionally forgot (only for this example) to check if return value is -1 \/ if (PyObject_IsInstance(obj, PyList_Type) || PyObject_IsInstance(obj, PyTuple_Type)

pitrou commented 12 years ago

What difference does it make? Are you using __instancecheck__ perhaps?

01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 12 years ago

!/usr/bin/python2.7

import json

class pseudo_list(object):
    __class__ = list # fake isinstance

    def __init__(self, iterator):
        self._saved_iterator = iterator

    def __iter__(self):
        return self._saved_iterator

class myenc(json.JSONEncoder):
    def default(self, o):
        try:
            return pseudo_list(iter(o))
        except TypeError:
            return super(myenc, self).default(o)

# works (pure-python implementation) print json.dumps({1:xrange(10), 2:[5,6,7,8]}, cls=myenc, indent=1)

# does not work (C implementation) print json.dumps({1:xrange(10), 2:[5,6,7,8]}, cls=myenc, indent=None)

pitrou commented 12 years ago

class pseudolist(object): \_class__ = list # fake isinstance

Why not inherit from list directly? Setting __class__ to something else isn't widely supported in the Python code base. It may work or may not work, depending on the API, but it's not something we design or test for.

01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 12 years ago

Well, __class_ = list is my problem, but python's problem is that it uses different approaches in C and python implementation.

P.S. I don't want to subclass list, as I don't want things like this: x = pseudo_list(iter(xrange(10)) x.append('test') print len(x)

pitrou commented 12 years ago

Well, __class_ = list is my problem, but python's problem is that it uses different approaches in C and python implementation.

Well, by construction a C accelerator will use the fastest method available within what the API's specification allows. The json API doesn't specify whether isinstance() or a more concrete type check is used when dispatching over argument types, so I'd classify this as an implementation detail.

01e27b45-90f2-4c74-9e5e-7e7e54c3d78e commented 12 years ago

Inconsistency is bother me. If I specify indent in dumps(), I will have one semantics, else other ones.

Why not to fix pure-python implementation using "type(o) in (list, tuple)" ? This is faster too (as I think).

2ba56af2-62ea-4996-b0d1-725349dcf07d commented 11 years ago

FWIW, I think Mark's right here. I'm +1 on the implementations being consistent.

Seems like a potentially nasty surprise if you move from one implementation to the other and, lacking awareness of this quirk, design your algorithm around semantics. I think this was Mark's original point.

If the json API doesn't care how the type check is performed, then we get a (probably very small :)) win from the type(o) in (list, tuple) for the Python impl in addition to bringing consistency to the two implementations.

matrixise commented 6 years ago

We have received a notification about this bug for 3.5

scoder commented 6 years ago

FWIW, the C implementation of the sequence encoder uses PySequence_Fast(), so adding a lower priority instance check that calls the same encoding function would solve this.

https://github.com/python/cpython/blob/cfa797c0681b7fef47cf93955fd06b54ddd09a7f/Modules/_json.c#L1730

Probably not something to fix in Py3.5/6 anymore, though.