ssec-jhu / dplutils

Distributed (Data) Pipeline Utilities
BSD 3-Clause "New" or "Revised" License

Futureproof streaming dataframe split #87

Closed amitschang closed 2 months ago

amitschang commented 3 months ago

The following warnings come from using np.array_split on a DataFrame:

tests/pipeline/test_stream_executor.py: 15 warnings
  /Users/arik/ws/ssec/dplutils/.tox/test/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
    return bound(*args, **kwds)

Potential solution:

# assign each row a chunk label in [0, number_of_splits), then group rows by label
chunks = np.linspace(0, number_of_splits, num=rows_in_df, endpoint=False, dtype=np.int32)
splits = [chunk for _, chunk in df.groupby(chunks)]
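A self-contained sketch of the same idea, for reference. The helper name `split_dataframe`, its parameter names, and the sample data are hypothetical and not part of dplutils. It also swaps `np.linspace` for integer arithmetic when computing the labels, which is an assumption on my part intended to sidestep floating-point rounding at chunk boundaries; the grouping step is the one proposed above.

```python
import numpy as np
import pandas as pd


def split_dataframe(df: pd.DataFrame, n_splits: int) -> list[pd.DataFrame]:
    """Split df row-wise into at most n_splits contiguous, near-equal chunks.

    Hypothetical helper for illustration; it never calls np.array_split on
    the DataFrame, so DataFrame.swapaxes is not involved.
    """
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    n_rows = len(df)
    if n_rows == 0:
        return []
    # Label row i with chunk floor(i * n_splits / n_rows), using integer math.
    labels = np.arange(n_rows) * n_splits // n_rows
    # groupby sorts by label by default and keeps row order within each group.
    return [chunk for _, chunk in df.groupby(labels)]


# Sample data (made up for the example).
df = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})
chunks = split_dataframe(df, 3)
sizes = [len(c) for c in chunks]
print(sizes)  # [4, 3, 3]
```

Two behaviours differ from np.array_split and may matter to callers: when n_splits exceeds the number of rows, groupby yields one chunk per row rather than padding with empty chunks, and the larger chunks are not guaranteed to come first (sizes still differ by at most one row).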