swansonk14 / p_tqdm

Parallel processing with progress bars
MIT License

Numpy array iteration problem #13

Closed izkgao closed 4 years ago

izkgao commented 4 years ago

I tried to use p_map to iterate over a 3D NumPy array, but the result differed from what a for loop and map produce. Below is a simple example.

import numpy as np

a = np.arange(12).reshape([2,2,3])
result = []
for i in a:
    result.append(np.sum(i))  # sum each 2x3 sub-array

result is [15, 51].

a = np.arange(12).reshape([2,2,3])
result = list(map(lambda x: np.sum(x), a))

result is [15, 51].

from p_tqdm import p_map

a = np.arange(12).reshape([2,2,3])
result = p_map(lambda x: np.sum(x), a)

result is [66].

swansonk14 commented 4 years ago

Hi @izkgao, thank you for pointing that out! I previously had code in p_tqdm that required the iterables to be lists and treated other objects (like numpy arrays) as single element lists. I've now fixed this so that all iterables, including numpy arrays, are processed correctly: https://github.com/swansonk14/p_tqdm/releases/tag/v_1.3.3. Please install p_tqdm version 1.3.3 to get the fix.
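For anyone wondering where [66] came from, here is a minimal sketch of the behavior described above. It is not p_tqdm's actual code: `old_style_normalize` is a hypothetical helper that only mimics "treat anything that is not a list as a single-element list". With that rule the whole array becomes one work item, so np.sum adds up all twelve values.

```python
import numpy as np


def old_style_normalize(iterable):
    # Hypothetical helper, not taken from p_tqdm: anything that is not
    # a list is wrapped as a single-element list.
    return iterable if isinstance(iterable, list) else [iterable]


a = np.arange(12).reshape([2, 2, 3])

# The whole array is one item, so np.sum sees all 12 values at once.
wrapped = [int(np.sum(x)) for x in old_style_normalize(a)]

# Iterating over the array directly yields its two 2x3 sub-arrays.
direct = [int(np.sum(x)) for x in a]

print(wrapped)  # [66]
print(direct)   # [15, 51]
```

On versions before 1.3.3, passing list(a) instead of a should presumably have avoided the problem, since the input would then already be a list, though I have not verified this against the old release.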

Two notes for your interest: 1) To save some typing, you can do p_map(np.sum, a) instead of p_map(lambda x: np.sum(x), a), though both are valid. 2) I believe numpy automatically does efficient vectorization of many of its operations, so it's possible that using the appropriate numpy operations (possibly using np.vectorize) may be faster than p_tqdm, though I'm not sure. It could be worth comparing the two. (And I would be interested to hear your results!)
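As a rough sketch of the pure-numpy route for this particular example: summing over the last two axes gives one value per sub-array without any Python-level loop. This assumes the per-item function can be expressed as a built-in reduction like this one. Note that the NumPy documentation describes np.vectorize as a convenience wrapper that is essentially a for loop, so it is unlikely to give a speedup by itself.

```python
import numpy as np

a = np.arange(12).reshape([2, 2, 3])

# Reduce over axes 1 and 2, leaving one sum per 2x3 sub-array.
vectorized = a.sum(axis=(1, 2))

# Same computation as the for loop from the original report.
looped = np.array([np.sum(x) for x in a])

print(vectorized.tolist())  # [15, 51]
print(np.array_equal(vectorized, looped))  # True
```

Which approach is faster will depend on the array size and on how expensive the per-item function is, so it is worth timing both on real data.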