zopefoundation / zodbpickle

Fork of Python's pickle module to work with ZODB
Other
17 stars 15 forks source link

Status of 'noload' in Python 3 stdlib ? (i.e. let's try to kill this module for Python 3) #35

Open jmuchemb opened 6 years ago

jmuchemb commented 6 years ago

zodbpickle has recently entered Debian. And since it forks code from Python, it led to the following discussion: https://lists.debian.org/debian-security-tracker/2018/04/msg00021.html

In particular, I discovered that https://bugs.python.org/issue6784 is fixed and if we manage to get back the noload operation in Python 3 (it existed in Python 2), we could stop forking the stdlib modules.

  1. About issue 6784, there's apparently only a small difference between upstream and zodbpickle. Upstream does not have errors='bytes' (used in ZODB._compat), but I guess we can achieve the same result with:
    try:
    return loads(s)
    except UnicodeDecodeError:
    return loads(s, encoding='bytes')
  2. What's the status 'noload' in Python 3 stdlib ? I could not find anything in bugs.python.org.
jamadden commented 6 years ago

As I understand it, ‘noload’ is dead in the python stdlib. However, that is a core requirement of ZODB and hence this library. (If you haven’t tried to GC a large database in an environment without ‘noload’, be prepared to wait. A lot )

jmuchemb commented 6 years ago

Well, I know all that. But there isn't any mention of the removal in the doc or the source code. In the Git history, cPickle.c was removed in 2007, and _pickle.c was added in 2008 (as part of https://bugs.python.org/issue2917), with useless commit messages.

Have we at least asked to get it back in Python 3 ?

jamadden commented 6 years ago

Even in Python 2.7, noload is broken for ZODB. The code here uses its Python 2.6 behaviour (see #9), but that has Python bug 1101399. That bug doesn't matter for the ZODB use case, but it makes noload less generic, which is a concern for the stdlib.

Given that Python 2.7 has a noload, but it's broken for ZODB (and the commit that broke it refers to it as an "undocumented, obscure function" and didn't bother to add a test case), and Python 3.x has never had a noload, I would imagine the barrier to get it added to Python 3 would be fairly high. It would have to fix the same bug that 2.7 did but without breaking in the more general case (as well as be documented and tested, presumably).

The window for feature additions to Python 3.7 has closed, so at the earliest such an addition would be part of is 3.8 and we would need this library as long as ZODB supported any versions < 3.8.

I'm not saying it can't happen, but it sounds like a lot of work 😄 I would guess that a starting point would be posting to python-ideas.

jmuchemb commented 6 years ago

Thank you. I didn't know that the status was already so bad for Python 2.7.

In any case, we should also have a clear view of what we change, which is not easy when our code is based on old versions of the stdlib. A first step would be to sanitize our code, by having an upstream branch containing unmodified code of pickle from Python 3, and merge it from time to time to get new features and bugfixes. This way:

vstinner commented 2 years ago

Recent discussion: https://github.com/zopefoundation/zodbpickle/pull/64#issuecomment-1025732450

tseaver commented 4 months ago

My current plan is to have three versions of the stdlib _python.c fork in zodbpickle:

This change will also need to re-fork the pickletools modules, to account for the new protocol support, and involves adjusting a bunch of tests.

I'm most of the way through the changes to the 3.8 version in C code, but wanted to get some discussion going on the approach before I go much further.

icemac commented 4 months ago

See also https://github.com/zopefoundation/zodbpickle/pull/92 and the comment I made there about probably no longer needing _pickle_33.c.

@tseaver Thank you for investing time into updating this library to recent changes in the pickle universe.