uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

Problem pickling lambda function #196

Closed. btlvr closed this issue 3 years ago.

btlvr commented 4 years ago

I have this code:

from pathos.multiprocessing import ProcessingPool

items = [1,2,3]

def expand(func):
    expected_result = map(func, items)
    pathos_result = ProcessingPool().map(func, items)

    return list(pathos_result), list(expected_result)

for power in [3,2,1]:
    print(expand(lambda x : x**power))

Which outputs:

([1, 8, 27], [1, 8, 27])
([1, 8, 27], [1, 4, 9])
([1, 8, 27], [1, 2, 3])

Why doesn't it produce the following output?

([1, 8, 27], [1, 8, 27])
([1, 4, 9], [1, 4, 9])
([1, 2, 3], [1, 2, 3])

I've tried pathos 0.2.3 and 0.2.6.

emorice commented 4 years ago

I think what you want is:

from pathos.multiprocessing import ProcessingPool

items = [1,2,3]

def expand(func):
    expected_result = map(func, items)
    pathos_result = ProcessingPool().map(func, items)

    return list(pathos_result), list(expected_result)

for power in [3,2,1]:
    print(expand(lambda x, power=power: x**power))

Which works as expected for me:

([1, 8, 27], [1, 8, 27])
([1, 4, 9], [1, 4, 9])
([1, 2, 3], [1, 2, 3])

This is a classic pattern when dealing with closures in Python. Compare:

>>> funcs = [(lambda : x) for x in [1, 2, 3]]
>>> [f() for f in funcs]
[3, 3, 3]

and:

>>> funcs = [(lambda value=x: value) for x in [1, 2, 3]]
>>> [f() for f in funcs]
[1, 2, 3]

See the Python FAQ entry "Why do lambdas defined in a loop with different values all return the same result?" for a detailed explanation.

In short, your lambdas all refer to the same shared variable from the enclosing scope instead of each having its own copy. That behaves as expected when you evaluate them right away in the same environment, but breaks down when you do more exotic things with them, such as sending them to other workers.
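The late binding described above can be checked without pathos at all. The sketch below compares a plain lambda with two ways of binding the value at definition time: the default-argument trick from the fix, and `functools.partial`. The helper `raise_to` and the result names are hypothetical, introduced only for this illustration.

```python
from functools import partial

items = [1, 2, 3]

def raise_to(x, power):
    """Hypothetical helper: a named function that takes the exponent explicitly."""
    return x ** power

# Late binding: every lambda looks up `power` when it is called,
# so all three see the last value the loop variable took (1).
late = [lambda x: x ** power for power in [3, 2, 1]]

# Early binding: the current value is stored in each function's defaults.
default_bound = [lambda x, power=power: x ** power for power in [3, 2, 1]]

# Early binding: the current value is stored on each partial object.
partial_bound = [partial(raise_to, power=power) for power in [3, 2, 1]]

late_results = [[f(x) for x in items] for f in late]
default_results = [[f(x) for x in items] for f in default_bound]
partial_results = [[f(x) for x in items] for f in partial_bound]

print(late_results)     # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(default_results)  # [[1, 8, 27], [1, 4, 9], [1, 2, 3]]
print(partial_results)  # [[1, 8, 27], [1, 4, 9], [1, 2, 3]]
```

Both early-binding forms carry the value inside the callable itself, so the result no longer depends on what `power` happens to be wherever the function ends up running.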

More precisely, in this specific case, my guess is that when using closures, the value of power is sent to the workers when they are created, on the first call to expand, and is therefore set to 3. On the following iterations, identical lambdas are sent, and they all use the value of 3 they find in the workers. That value was never updated, because the loop only modified the original power and not the copies in the workers.
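The guess above can be mimicked in a single process. This is only a rough sketch of the suspected mechanism, not of what pathos or dill actually do internally: `worker_globals` is a hypothetical stand-in for a worker's stale copy of the module namespace, and `rebind` is a hypothetical helper that makes a function look up global names there. It assumes the code runs at module level, as in the original script, so that `power` is a global.

```python
import types

items = [1, 2, 3]

def rebind(func, namespace):
    """Hypothetical helper: copy `func` so it resolves global names in `namespace`."""
    return types.FunctionType(
        func.__code__, namespace, func.__name__, func.__defaults__
    )

# Hypothetical stand-in for a worker's globals, snapshotted when power was 3.
worker_globals = {"power": 3}

results_late = []
results_bound = []
for power in [3, 2, 1]:
    # Looks up the global `power` at call time, in whatever namespace it runs in.
    late = lambda x: x ** power
    # Carries the current value with it in its defaults.
    bound = lambda x, power=power: x ** power

    results_late.append([rebind(late, worker_globals)(x) for x in items])
    results_bound.append([rebind(bound, worker_globals)(x) for x in items])

print(results_late)   # [[1, 8, 27], [1, 8, 27], [1, 8, 27]]
print(results_bound)  # [[1, 8, 27], [1, 4, 9], [1, 2, 3]]
```

The first list matches the pathos column in the reported output, and the second matches the output after the default-argument fix, which is consistent with the explanation but does not prove it.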

mmckerns commented 4 years ago

Thanks @emorice for the nice response. @btlvr if this doesn't answer your question, please reopen the ticket.