nalepae / pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs
https://nalepae.github.io/pandarallel
BSD 3-Clause "New" or "Revised" License

Some workers stuck while others finish 100% #244

Closed SysuJayce closed 1 year ago

SysuJayce commented 1 year ago

General

Bug description

I started a parallel_apply job with 80 workers to decode and clean a large amount of data (about 50 GB). After nearly 8 minutes, most of the workers had reached 100%, but some got stuck. After 20 minutes, the progress bars were still frozen.

import os
from pandarallel import pandarallel
pandarallel.initialize(nb_workers=os.cpu_count(), progress_bar=True)
df["text"] = df["text"].parallel_apply(decode_clean)

Observed behavior

The progress bars freeze and CPU usage is 0%.

Expected behavior

Progress should be nearly linear; according to the progress bars, the program should finish after around 10 minutes.

Minimal but working code sample to ease bug fix for pandarallel team

Write here the minimal code sample to ease bug fix for pandarallel team

till-m commented 1 year ago

Hi @SysuJayce, I appreciate the bug report. Sadly, if I remember correctly, this is not the first time this specific issue has come up. Unfortunately, it is very hard to solve without code that consistently reproduces the problem, so I am not sure what I can do here.

till-m commented 1 year ago

Presumably this should take less than 20 minutes, yes? Could you try running the task with progress bars disabled?

SysuJayce commented 1 year ago

Hi @till-m, I found that disabling progress_bar is a workaround. Thanks.
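
For readers landing on this thread, the workaround can be sketched roughly as follows. This is a minimal sketch, not the reporter's actual code: `decode_clean` is a hypothetical stand-in (the real function is never shown in the thread), `clean_text_column` is an illustrative helper name, and the only change from the original snippet is `progress_bar=False`.

```python
import os


def decode_clean(raw):
    """Hypothetical stand-in for the reporter's function (not shown in the thread)."""
    # Decode bytes leniently, then collapse runs of whitespace.
    text = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else str(raw)
    return " ".join(text.split())


def clean_text_column(df, column="text"):
    """Apply decode_clean in parallel with the progress bars switched off."""
    # Imported lazily so decode_clean stays usable without pandarallel installed.
    from pandarallel import pandarallel

    # Same initialization as in the report, except progress_bar is False.
    pandarallel.initialize(nb_workers=os.cpu_count(), progress_bar=False)
    df[column] = df[column].parallel_apply(decode_clean)
    return df
```

Calling `clean_text_column(df)` on a DataFrame with a `text` column mirrors the original two-line snippet. Whether this avoids the hang in other setups is not established here; the thread only reports that it worked for this one case.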

till-m commented 1 year ago

Hi @SysuJayce, could you check the following?

  1. Do the progress bars freeze consistently, i.e. if you rerun the original code, do they freeze again and at the same position?
  2. If you're running in Jupyter, can you try running in a terminal and check whether they freeze there as well (and vice versa if you're currently running in a terminal)?

till-m commented 1 year ago

Closing for now, feel free to update with more information.