nalepae / pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs
https://nalepae.github.io/pandarallel
BSD 3-Clause "New" or "Revised" License
3.59k stars, 208 forks

Fix ValueError for Empty DataFrames: Ensure Process Count is at Least 1 #245

Closed by Mithil467 4 months ago

Mithil467 commented 1 year ago

Since nb_item comes out as 0 for empty dataframes and series, we were returning an empty list from the chunk function. As a result, our DataType.get_chunks method yielded nothing, which left our chunks list empty and gave nb_workers = len(chunks) = 0.
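The failure mode described above can be sketched roughly as follows. This is a simplified illustration, not the actual pandarallel code: `chunk_sketch` and `worker_count_sketch` are hypothetical names, and clamping the count with `max(1, ...)` is just one way to keep the process count at least 1.

```python
from typing import List


def chunk_sketch(nb_item: int, nb_chunks: int) -> List[slice]:
    """Split range(nb_item) into at most nb_chunks contiguous slices.

    Hypothetical stand-in for a chunking helper; returns an empty list
    when there is nothing to split.
    """
    if nb_item == 0:
        return []  # empty input: no chunks at all
    nb_chunks = min(nb_chunks, nb_item)
    quotient, remainder = divmod(nb_item, nb_chunks)
    slices = []
    start = 0
    for i in range(nb_chunks):
        # The first `remainder` chunks get one extra item.
        size = quotient + (1 if i < remainder else 0)
        slices.append(slice(start, start + size))
        start += size
    return slices


def worker_count_sketch(nb_item: int, nb_requested: int) -> int:
    """Derive a worker count from the chunks, never going below 1.

    Without the clamp, an empty input gives len(chunks) == 0, and
    multiprocessing.Pool raises ValueError when asked for fewer than
    one process.
    """
    chunks = chunk_sketch(nb_item, nb_requested)
    return max(1, len(chunks))


print(worker_count_sketch(0, 4))   # empty input still gets one worker
print(worker_count_sketch(10, 4))  # normal case: one worker per chunk
```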

Let me know if this fix seems good enough, and also if we need to add any new tests.

Fixes #115, fixes #141.

codecov[bot] commented 1 year ago

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (4666931) 86.72% compared to head (7c98541) 91.28%.

| Files | Patch % | Lines |
|---|---|---|
| pandarallel/progress_bars.py | 50.00% | 3 Missing |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #245      +/-   ##
==========================================
+ Coverage   86.72%   91.28%   +4.55%
==========================================
  Files          12       12
  Lines         580      585       +5
==========================================
+ Hits          503      534      +31
+ Misses         77       51      -26
```


till-m commented 1 year ago

Hi @Mithil467,

thanks for the PR. Could you please add a test for this case?

Mithil467 commented 1 year ago

Sure. I noticed that when progress_bar is True, we get a ZeroDivisionError. To fix that, I need to know what we expect UI-wise. I personally prefer [2] over [1] because it gives a sense of success, but I would like to hear your opinions, or whether you want to handle this differently. I have added [2] for now; let me know if it needs changes.
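To illustrate where the ZeroDivisionError comes from, here is a minimal text-bar sketch. It is not pandarallel's progress bar code: `render_bar_sketch` is a hypothetical helper, and rendering an empty (neutral) bar for a zero total is only one of the two behaviours under discussion.

```python
def render_bar_sketch(done: int, total: int, width: int = 20) -> str:
    """Render a plain-text progress bar such as '[#####     ] 5/10'.

    Hypothetical helper: with total == 0 the ratio done / total would
    raise ZeroDivisionError, so that case is handled explicitly.
    """
    if total == 0:
        # Neutral choice: an empty bar, neither success nor failure.
        # A "success" style would fill the bar completely instead.
        return "[" + " " * width + "] 0/0"
    filled = int(width * done / total)
    return "[" + "#" * filled + " " * (width - filled) + f"] {done}/{total}"


print(render_bar_sketch(5, 10, width=10))
print(render_bar_sketch(0, 0, width=10))
```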

Consoles:

  1. image
  2. image

Notebook:

  1. image
  2. image

till-m commented 1 year ago

Personally, I prefer option [1]: the grey bar signals neither failure (red) nor success (green), and likewise the processing of the empty DataFrame didn't really succeed or fail. @nalepae do you have an opinion on the matter?

Mithil467 commented 7 months ago

@till-m @nalepae If you say so, I can make the necessary changes to implement option [1]. Should I go ahead?

SiRumCz commented 5 months ago

Hi dev team, any updates on this one?

till-m commented 5 months ago

My apologies, I am no longer maintaining this project, which is why I haven't responded. But I can make an exception and see this PR through.

I would suggest going with [1]. @Mithil467 could you kindly ping me when/if you've implemented that?

Mithil467 commented 5 months ago

@till-m @nalepae I have implemented the progress bar change to type [1]. Please review.

nalepae commented 5 months ago

Pandaral·lel is looking for a maintainer! If you are interested, please open a GitHub issue.

till-m commented 4 months ago

Thanks for the contribution :)

Mithil467 commented 4 months ago

Thanks for the help! @till-m