swansonk14 / p_tqdm

Parallel processing with progress bars
MIT License

p_map() very slow compared to multiprocess.Pool.map() #40

Open FlorinAndrei opened 3 years ago

FlorinAndrei commented 3 years ago

I'm trying to accelerate Pandas df.apply(), and also get a progress bar. The problem is, p_map is orders of magnitude slower than plain multiprocess.Pool.map() for a job where most of the processing is done by nltk.sentiment.vader.SentimentIntensityAnalyzer().

This notebook is self-explanatory:

https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb

p_map() is orders of magnitude slower.

However, the same function seems to work fine, and fast enough, for another task: reading 25k files off the disk.

Windows 10, Python 3.8.8, Jupyter Notebook

nuttyartist commented 1 year ago

From my testing, it seems like tqdm is the culprit. If I use tqdm with a regular multiprocessing.Pool(), it slows it down significantly. Has anyone else experienced this?
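One way to check this is to time the three variants side by side. The sketch below is a guess at a possible cause, not a confirmed diagnosis: putting a progress bar on a pool usually means switching from `Pool.map()` to `Pool.imap()`, and `imap()` defaults to `chunksize=1`, so every item becomes its own round trip to a worker, while `map()` picks a larger chunk size automatically. The `score` function and the helper names are hypothetical stand-ins for a cheap per-row task, and tqdm is treated as optional so the script runs without it.

```python
import time
from multiprocessing import Pool

try:
    from tqdm import tqdm  # optional: only used to draw the progress bar
except ImportError:
    def tqdm(iterable, **kwargs):  # fallback so the sketch runs without tqdm
        return iterable


def score(n):
    """Hypothetical stand-in for a cheap per-row task."""
    return n * n


def run_map(data, processes=2):
    # Pool.map() chooses a chunk size itself and returns all results at once,
    # so there is nothing to tick a progress bar on.
    with Pool(processes) as pool:
        return pool.map(score, data)


def run_imap(data, processes=2, chunksize=1):
    # Pool.imap() yields results lazily, which lets tqdm advance per item.
    # With the default chunksize of 1, each item is dispatched separately.
    with Pool(processes) as pool:
        results = pool.imap(score, data, chunksize=chunksize)
        return list(tqdm(results, total=len(data)))


def timed(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    data = list(range(10_000))
    _, t_map = timed(run_map, data)
    _, t_imap_1 = timed(run_imap, data, chunksize=1)
    _, t_imap_500 = timed(run_imap, data, chunksize=500)
    print(f"map:                 {t_map:.3f}s")
    print(f"imap, chunksize=1:   {t_imap_1:.3f}s")
    print(f"imap, chunksize=500: {t_imap_500:.3f}s")
```

If the `chunksize=1` run is the slow one and a larger chunk size closes most of the gap, the overhead is more likely in per-item dispatch than in tqdm's drawing code. Timings vary by machine and workload, so none are quoted here.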

AeroTH310 commented 1 year ago

I have experienced this too. It appears that the pool is actually processing serially. I see as many processes started in my system monitor as the number of cores I set, but only one of them seems to be doing anything at any given time.
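The observation above can be checked from inside Python rather than from a system monitor. The following is a minimal sketch using the standard `multiprocessing.Pool`, not p_tqdm itself: each task reports the PID of the process that ran it, and the counts show how the tasks were spread across workers. The names `worker_pid` and `tasks_per_worker` are made up for illustration.

```python
import os
from collections import Counter
from multiprocessing import Pool


def worker_pid(_):
    """Do a little CPU work, then report which process handled the task."""
    sum(i * i for i in range(10_000))
    return os.getpid()


def tasks_per_worker(n_tasks=200, processes=2, chunksize=1):
    # Count how many tasks each worker process handled.
    with Pool(processes) as pool:
        pids = list(pool.imap(worker_pid, range(n_tasks), chunksize=chunksize))
    return Counter(pids)


if __name__ == "__main__":
    for pid, count in tasks_per_worker().items():
        print(f"worker {pid}: {count} tasks")
```

A roughly even split only rules out the case where a single worker receives all the tasks. It does not prove that the workers ran at the same time, since workers that sit idle waiting for the next item would produce the same counts.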

BenjaminHoegh commented 1 year ago

It also seems to be very slow compared to joblib's Parallel.