When processing zarr in weather-mv, we work with small chunks of a dataset. These changes convert each chunk into a dataframe and then extract rows directly from that dataframe. Because the chunk size is under our control, we can control memory consumption during the pipeline.
Points to Consider
Works only with zarr datasets for now.
Support for other dataset types will be added in the future.
Users can pass CLI arguments to open datasets with a specified chunking scheme.
Example
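The chunk-to-dataframe-to-rows idea described above can be sketched as follows. This is a minimal illustration, not the code in this PR and not the weather-mv CLI: it builds a small in-memory xarray dataset instead of opening a zarr store, and `iter_chunks`, `chunk_to_rows`, and the `temperature` variable are hypothetical names chosen for the example.

```python
from itertools import product

import numpy as np
import xarray as xr


def iter_chunks(ds, chunk_sizes):
    """Yield sub-datasets with at most chunk_sizes[dim] elements per dimension.

    Hypothetical helper for illustration; not part of weather-mv.
    """
    dims = list(chunk_sizes)
    # For every dimension, build the list of slices that tile it.
    slices_per_dim = [
        [slice(start, start + chunk_sizes[dim])
         for start in range(0, ds.sizes[dim], chunk_sizes[dim])]
        for dim in dims
    ]
    # The cartesian product of those slices enumerates every chunk.
    for combo in product(*slices_per_dim):
        yield ds.isel(dict(zip(dims, combo)))


def chunk_to_rows(chunk):
    """Convert one chunk to a dataframe, then to a list of row dicts."""
    # reset_index() turns the dimension coordinates into ordinary columns.
    df = chunk.to_dataframe().reset_index()
    return df.to_dict(orient="records")


# Small stand-in dataset: 2 times x 3 latitudes x 4 longitudes = 24 cells.
ds = xr.Dataset(
    {
        "temperature": (
            ("time", "latitude", "longitude"),
            np.arange(24, dtype="float64").reshape(2, 3, 4),
        )
    },
    coords={
        "time": [0, 1],
        "latitude": [10.0, 20.0, 30.0],
        "longitude": [0.0, 90.0, 180.0, 270.0],
    },
)

# Assumed chunk scheme for the example only.
chunk_scheme = {"time": 1, "latitude": 2, "longitude": 4}

# Only one chunk's dataframe is held in memory at a time.
rows_per_chunk = [len(chunk_to_rows(c)) for c in iter_chunks(ds, chunk_scheme)]
```

With this scheme no chunk produces more than 1 x 2 x 4 = 8 rows, so the largest dataframe held in memory at once is bounded by the chunk size rather than by the size of the whole dataset.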
Partially Solved: #414