python / cpython

The Python programming language
https://www.python.org
Other
62.9k stars 30.13k forks source link

cPickle.loads is not thread safe due to non-thread-safe imports #56889

Closed 2af238be-380c-498d-8e57-038365ce803a closed 5 years ago

2af238be-380c-498d-8e57-038365ce803a commented 13 years ago
BPO 12680
Nosy @brettcannon, @ncoghlan, @pitrou, @taleinat, @ericsnowcurrently, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = created_at = labels = ['extension-modules', 'type-bug'] title = 'cPickle.loads is not thread safe due to non-thread-safe imports' updated_at = user = 'https://bugs.python.org/SagivMalihi' ``` bugs.python.org fields: ```python activity = actor = 'taleinat' assignee = 'none' closed = True closed_date = closer = 'taleinat' components = ['Extension Modules'] creation = creator = 'Sagiv.Malihi' dependencies = [] files = [] hgrepos = [] issue_num = 12680 keywords = [] message_count = 6.0 messages = ['141548', '141549', '141652', '141811', '170375', '349036'] nosy_count = 8.0 nosy_names = ['brett.cannon', 'ncoghlan', 'pitrou', 'taleinat', 'Sagiv.Malihi', 'eric.snow', 'serhiy.storchaka', 'DragonFireCK'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue12680' versions = ['Python 2.7', 'Python 3.2'] ```

2af238be-380c-498d-8e57-038365ce803a commented 13 years ago

When trying to cPickle.loads() from several threads at once, there is a race condition when threads try to import modules.

An example will explain it best: suppose I have module foo.py which takes some time to load:

import time
class A(object):
    def __setstate__(self, state):
        self.x = x
time.sleep(1)
x = 5

and a pickled version of an A() object stored in 'A.pkl'. the following code, when run for the first time, will raise a NameError about 'x':

>> p = open('A.pkl','rb').read() >> [thread.start_new(cPickle.loads, (p,)) for x in xrange(2)]

Unhandled exception in thread started by <built-in function loads>
Traceback (most recent call last):
  File "foo.py", line 7, in __setstate__
    self.x = x
NameError: global name 'x' is not defined

since the module is now loaded, subsequent calls to cPickle.loads will work as expected.

This was tested on 2.5.2, 2.7.1, and 3.2 on Ubuntu and on Windows 7.

please note that this bug was discovered when unpickling the standard 'decimal.Decimal' class (decimal.py is quite long and takes some time to import), and this is not some corner case.

2af238be-380c-498d-8e57-038365ce803a commented 13 years ago

OK, digging deeper reveals that there are actually two bugs here, one is conceptual in the python importing mechanism, and the other is technical in cPickle.

The first bug: PyImport_ExecCodeModuleEx adds the module to sys.modules *before* actually executing the code. This is a design flaw (can it really be changed? ) Demonstrating this bug is easy using the foo.py module from the previous comment:

def f():
    if 'bla' in sys.modules:
        bla = sys.modules['bla']
    else:
        import bla
    return bla.A()
running two instances of f in two threads results in the same error.

The second bug: in cPickle.c: func_class() cPickle 'manually' checks if a module is in sys.modules instead of letting the import mechanism do it for him (hence breaking the import lock's defense here).

pitrou commented 13 years ago

PyImport_ExecCodeModuleEx adds the module to sys.modules *before* actually executing the code. This is a design flaw (can it really be changed? )

I guess it is done so to allow for circular imports.

The second bug: in cPickle.c: func_class() cPickle 'manually' checks if a module is in sys.modules instead of letting the import mechanism do it for him (hence breaking the import lock's defense here).

I would guess it is an optimization shortcut. A solution (while keeping the optimization) would be to take the import lock before checking sys.modules.

Note that the _pickle module in 3.x has the same kind of logic, and therefore probably the same issue too (haven't tested).

2af238be-380c-498d-8e57-038365ce803a commented 13 years ago

As I said - it certainly happenes on 3.2 (tested).

@pitrou - what you suggested will not work since the actual import will block on the import lock. The optimization there is not needed, it's already implemented in __import__.

brettcannon commented 12 years ago

I just checked and this is no longer an issue in Python 3.3.

The sys.modules "bug" isn't a bug as that's how it is supposed to work to prevent partially initialized modules. As for how pickle is doing stuff, that could change if it wouldn't break backwards-compatibility.

taleinat commented 5 years ago

I am able to reproduce this with Python 3.6.3 and 3.7.0, but not with 3.7.4, 3.8.0b3 or current master (to be 3.9).

This has been fixed since 3.7.3; see issue bpo-34572.