FlorinAndrei opened 3 years ago
From my testing, it seems like tqdm is the culprit. If I use tqdm on a regular `multiprocessing.Pool()`, it slows it down significantly. Has anyone else experienced this?
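For reference, here is a minimal sketch of the usual way to put tqdm on a plain `multiprocessing.Pool`: iterate `Pool.imap` so results arrive one at a time and the bar can advance. One plausible source of a slowdown (an assumption, not something confirmed in this thread) is that `imap` defaults to `chunksize=1`, meaning one inter-process round trip per item, while `Pool.map` picks a larger chunk size on its own. The helper `pool_map_chunksize` is hypothetical; it mirrors the heuristic CPython's `Pool.map` uses when no chunksize is passed (about four chunks per worker). If tqdm is not installed, the sketch runs without a progress bar.

```python
import multiprocessing
import os

try:
    from tqdm import tqdm
except ImportError:  # fallback so the sketch still runs without tqdm
    def tqdm(iterable, **kwargs):
        return iterable


def pool_map_chunksize(n_items, n_workers):
    """Hypothetical helper: mirrors the chunk size Pool.map picks when
    chunksize is None, clamped to >= 1 because imap needs a positive value."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return max(1, chunksize)


def map_with_progress(func, items, processes=None, chunksize=None):
    """Apply func to every item in a process pool while showing a tqdm bar.

    Results come back in input order. An explicit chunksize batches items
    so that each round trip to a worker carries many of them."""
    items = list(items)
    n_workers = processes or os.cpu_count() or 1
    if chunksize is None:
        chunksize = pool_map_chunksize(len(items), n_workers)
    with multiprocessing.Pool(processes=n_workers) as pool:
        # Consume the iterator fully before the pool is torn down.
        return list(tqdm(pool.imap(func, items, chunksize=chunksize),
                         total=len(items)))


def square(x):
    return x * x


if __name__ == "__main__":
    print(map_with_progress(square, range(10), processes=2))
```

Comparing `chunksize=1` against the computed value on the same workload is a quick way to see how much of the gap is per-item overhead.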
I have experienced this too. The pool appears to actually be processing serially: my system monitor shows as many processes starting as the number of cores I set, but only one of them seems to be doing anything at any given time.
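One way to check whether the work really is serial is to have each task report which worker process ran it. A minimal stdlib-only sketch; `tag_with_pid` and `count_tasks_per_process` are hypothetical names, and the small CPU-bound loop is there so that tasks are not so short that a single worker grabs nearly all of them anyway.

```python
import collections
import multiprocessing
import os


def tag_with_pid(n):
    """Do some CPU-bound work, then report which process did it."""
    total = sum(i * i for i in range(n))
    return total, os.getpid()


def count_tasks_per_process(results):
    """Count how many tasks each worker PID handled."""
    return collections.Counter(pid for _, pid in results)


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(tag_with_pid, [200_000] * 40, chunksize=1)
    # With real parallelism, several PIDs should each show a share of tasks.
    print(count_tasks_per_process(results))
```

If nearly every task lands on one PID, the pool is not spreading the work, which would match what the system monitor shows.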
It also seems to be very slow compared to joblib's `Parallel`.
I'm trying to accelerate Pandas `df.apply()` and also get a progress bar. The problem is, `p_map` is orders of magnitude slower than plain `multiprocess.Pool.map()` for a job where most of the processing is done by `nltk.sentiment.vader.SentimentIntensityAnalyzer()`. This notebook is self-explanatory:
https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb
`p_map()` is orders of magnitude slower. However, the same function seems to work fine, and fast enough, for another task: reading 25k files off the disk.
Windows 10, Python 3.8.8, Jupyter Notebook