netzkolchose / django-computedfields

Provides autogenerated autoupdated database fields for model methods.
MIT License
94 stars 14 forks source link

Premature thread local attribute access: possibly breaking isolation #154

Open verschmelzen opened 3 days ago

verschmelzen commented 3 days ago

There might be a serious bug here. Afaiu local() is that resolves to the correct context when it's attribute is accessed from inside the running thread. Here you access the members possibly before other threads fork, thus sharing all the data between them.

https://github.com/netzkolchose/django-computedfields/blob/master/computedfields/handlers.py#L31

verschmelzen commented 3 days ago

Here is showcase

from threading import local, Thread

# isolated
print('Isolated, so we will see errors')
l = local()
l.a = 1 # is not available to threads

def print_l():
    try:
        print('try access to non-existent attribute', l.a)
    except:
        print('isolation works')
        import traceback
        traceback.print_exc()
    l.a = 2 # is not available to "next" thread

t = Thread(target=print_l)
t.start()
t.join()

t2 = Thread(target=print_l)
t2.start()
t2.join()

print('\n')
print('\n')

# shared
print('Value shared between threads')
l = local()
class B: pass
A = l.a = B()
A.a = 1

def print_A():
    print('state is shared', A.a)
    A.a = 2 # new value is available to "next" thread

t = Thread(target=print_A)
t.start()
t.join()

t2 = Thread(target=print_A)
t2.start()
t2.join()
jerch commented 2 days ago

@verschmelzen Thx for finding this one - yes you are right, it currently does not decouple data across threads correctly. The issue is line 31-34, were the dict objects get bound back into to module level sharing all stuff across all threads.

A quickfix would be to reshape the data parts and put them behind a guarded access, something like this:

STORAGE = local()

def get_DATA():
    try:
        return STORAGE.DATA
    except AttributeError:
        STORAGE.DATA = {'DELETES': {}, 'M2M_REMOVE': {}, 'M2M_CLEAR': {}, 'UPDATE_OLD': {}}
    return STORAGE.DATA

# access in data handlers:
    # instead of
    DELETES...
    # it is now:
    get_DATA()['DELETES']

(No clue, if the dict abstraction is good enough or if real attributes on STORAGE would be better here ...)

Do you want to write a fix for it?

jerch commented 2 days ago

Just benchmarked it - fully decoupling is actually the fastest here:

STORAGE = local()

def get_DELETES():
    try:
        return STORAGE.DELETES
    except AttributeError:
        STORAGE.DELETES = {}
    return STORAGE.DELETES

def get_M2M_REMOVE():
    try:
        return STORAGE.M2M_REMOVE
    except AttributeError:
        STORAGE.M2M_REMOVE = {}
    return STORAGE.M2M_REMOVE
...

Which seems plausible, as it only cares for the datapoint actually needed.

Edit: Fixed SHARED vs. STORAGE naming. :sweat_smile: