map_sync with pandas operation function does not finish.

ipython / ipyparallel

IPython Parallel: Interactive Parallel Computing in Python

I have a very long dataframe, so I split it into 40 sub-dataframes and apply a pandas operation to them in parallel using map_sync. The pandas operation is just a groupby followed by an apply.

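My code is like this:

    import numpy as np
    import ipyparallel as ipp

    PEN = 40  # number of engines and number of sub-dataframes
    dfs = np.array_split(target_df, PEN)
    c = ipp.Cluster(n=PEN)
    with c as rc:
        e_all = rc[:]  # DirectView over all engines
        results = e_all.map_sync(FUNCTION, dfs)
    results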

I have 30 target_dfs. For the first 10 of them map_sync worked fine, but after that map_sync never completed. Without parallelism, the pandas job applied to a target_df completes in under 2 hours. I am on Windows, and my ipyparallel version is the latest.
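
One way to tell a real hang from a slow run might be to replace the blocking map_sync call with map_async plus a bounded wait. The sketch below is only an illustration: map_with_timeout is a hypothetical helper name, not part of ipyparallel, and it assumes the view's map_async returns an AsyncResult that offers wait(timeout), ready(), and get().

```python
# Sketch only: map_with_timeout is a hypothetical helper, not an ipyparallel API.
# Assumption: view.map_async() returns an AsyncResult-like object that
# provides wait(timeout), ready(), and get().

def map_with_timeout(view, func, chunks, timeout_s):
    """Map func over chunks on the view, raising TimeoutError instead of blocking forever."""
    ar = view.map_async(func, chunks)  # submits the tasks and returns immediately
    ar.wait(timeout_s)                 # blocks for at most timeout_s seconds
    if not ar.ready():
        raise TimeoutError(f"map did not finish within {timeout_s} s")
    return ar.get()                    # collects results; remote errors surface here

# Possible usage with the names from the post (not executed here):
# with ipp.Cluster(n=PEN) as rc:
#     results = map_with_timeout(rc[:], your_function, dfs, timeout_s=3 * 3600)
```

Since the serial job finishes in under 2 hours, a timeout somewhat above that would flag a run that is stuck rather than merely slow, and the exception gives a point at which to inspect the engines.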

map_sync with pandas operation function does not finish. #844