nalepae / pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs
https://nalepae.github.io/pandarallel
BSD 3-Clause "New" or "Revised" License

UnpicklingError: pickle data was truncated #238

Closed by ywsyws 1 year ago

ywsyws commented 1 year ago


Bug description

I tried to compute a DateTimeRange from two columns of dtype datetime64[ns, UTC] with pandarallel. Towards the end of the run, it raised UnpicklingError: pickle data was truncated.

/opt/conda/lib/python3.8/site-packages/pandarallel/core.py in closure(data, user_defined_function, *user_defined_function_args, **user_defined_function_kwargs)
    323 
    324             try:
--> 325                 return wrapped_reduce_function(
    326                     (Path(output_file.name) for output_file in output_files),
    327                     reduce_extra,

/opt/conda/lib/python3.8/site-packages/pandarallel/core.py in closure(output_file_paths, extra)
    197         )
    198 
--> 199         return reduce_function(dfs, extra)
    200 
    201     return closure

/opt/conda/lib/python3.8/site-packages/pandarallel/data_types/dataframe.py in reduce(datas, extra)
     47         ) -> pd.DataFrame:
     48             if isinstance(datas, GeneratorType):
---> 49                 datas = list(datas)
     50             axis = 0 if isinstance(datas[0], pd.Series) else 1 - extra["axis"]
     51             return pd.concat(datas, copy=False, axis=axis)

/opt/conda/lib/python3.8/site-packages/pandarallel/core.py in <genexpr>(.0)
    193 
    194         dfs = (
--> 195             get_dataframe_and_delete_file(output_file_path)
    196             for output_file_path in output_file_paths
    197         )

/opt/conda/lib/python3.8/site-packages/pandarallel/core.py in get_dataframe_and_delete_file(file_path)
    187         def get_dataframe_and_delete_file(file_path: Path) -> Any:
    188             with file_path.open("rb") as file_descriptor:
--> 189                 data = pickle.load(file_descriptor)
    190 
    191             file_path.unlink()

UnpicklingError: pickle data was truncated
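The last frame of the traceback shows pickle.load reading a file written by a worker. As a minimal sketch of how that message can arise, the standard-library example below writes a pickle to disk, cuts the file short, and loads it again. The file name and the helper load_truncated_pickle are hypothetical and are not part of pandarallel. It only illustrates the symptom of an incomplete file, not why the file was incomplete in this report.

```python
import pickle
import tempfile
from pathlib import Path


def load_truncated_pickle(keep_fraction: float = 0.5):
    """Write a pickle to disk, keep only part of the file, then try to load it.

    Returns the exception raised by pickle.load, or None if loading succeeded.
    """
    # Protocol 4 is pinned so the behaviour does not depend on the default.
    payload = pickle.dumps(list(range(1000)), protocol=4)
    cut = int(len(payload) * keep_fraction)

    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name standing in for a worker's output file.
        file_path = Path(tmp_dir) / "worker_output.pickle"
        file_path.write_bytes(payload[:cut])

        try:
            with file_path.open("rb") as file_descriptor:
                pickle.load(file_descriptor)
        except (pickle.UnpicklingError, EOFError) as error:
            # Depending on where the cut falls, Python may raise either
            # UnpicklingError or EOFError for incomplete pickle data.
            return error
    return None


if __name__ == "__main__":
    print(repr(load_truncated_pickle(0.5)))  # incomplete file: an exception
    print(repr(load_truncated_pickle(1.0)))  # complete file: None
```

If the same pattern applies here, the reader side is working as intended and the thing to investigate is why a worker's output file ended up shorter than expected.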

Observed behavior

Same as Bug description

Expected behavior

The computation finishes without error.

Minimal but working code sample to ease bug fix for pandarallel team

As a workaround, I simply switched back to progress_apply.
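For readers who want to keep the parallel path but still get a result when it fails, the workaround could be wrapped roughly as follows. This is a sketch under assumptions: apply_with_fallback is a hypothetical helper, and the two callables are expected to be bound methods such as df.parallel_apply (pandarallel) and df.progress_apply (tqdm), which the code below does not import.

```python
import pickle
from typing import Any, Callable


def apply_with_fallback(
    parallel_apply: Callable[..., Any],
    serial_apply: Callable[..., Any],
    func: Callable[[Any], Any],
    **kwargs: Any,
) -> Any:
    """Try the parallel apply first; rerun serially if unpickling fails."""
    try:
        return parallel_apply(func, **kwargs)
    except (pickle.UnpicklingError, EOFError):
        # The parallel run produced unreadable results, so redo the work
        # in the current process. This costs time but avoids the file transfer.
        return serial_apply(func, **kwargs)


# Intended usage (assumes pandarallel.initialize() and tqdm.pandas() were called):
# result = apply_with_fallback(df.parallel_apply, df.progress_apply, func, axis=1)
```

The fallback repeats the whole computation, so it only hides the symptom. It does not address whatever left the worker output incomplete.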

matteoettam09 commented 11 months ago

Hello, I'm currently getting the same issue. Any idea what the root cause may be? Thank you.

ywsyws commented 11 months ago

Hello, it was due to a lack of resources on my machine. The solutions proposed in another thread could not release the resources. I finally rebooted my pod and the problem went away.
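The thread does not say which resource ran out. One possibility, and it is only an assumption, is the file system that holds the workers' output files: the traceback shows results being read back from files, and on Linux pandarallel can place them on a memory-backed file system such as /dev/shm, which is often small inside containers. A quick standard-library check of free space might look like this sketch. The helper names and the threshold are hypothetical.

```python
import os
import shutil


def free_megabytes(path: str) -> float:
    """Return the free space, in MiB, of the file system that holds `path`."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def has_room_for(path: str, required_mb: float) -> bool:
    """Return True if at least `required_mb` MiB are free under `path`."""
    return free_megabytes(path) >= required_mb


if __name__ == "__main__":
    # Assumption: /dev/shm is the location to inspect. It exists on most
    # Linux systems but not on every platform, so check before querying it.
    shm_path = "/dev/shm"
    if os.path.exists(shm_path):
        print(f"{shm_path}: {free_megabytes(shm_path):.0f} MiB free")
        # 2000 MiB is an arbitrary illustrative threshold, not a pandarallel value.
        print("enough room:", has_room_for(shm_path, 2000))
```

If free space turns out to be the bottleneck, enlarging that file system for the pod, or restarting it as described above, are the kinds of remedies to consider.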

matteoettam09 commented 11 months ago

Thanks!