Open atmalagon opened 7 years ago
CC @mrocklin @jcrist
This is a good use case for dask collection duck typing: https://github.com/dask/dask/pull/1068
I'm not sure if I follow how this is a duck typing use case. I'd write this as a method, following your suggestion on SO:
Toward this end, it would be nice if xarray had something like dask.array's to_delayed method for converting a Dataset into an array of delayed datasets, which you could then lazily convert into DataFrame objects and do your computation.
Can you explain why you think this could benefit from collection duck typing?
Then we could use xarray's normal indexing operations to create new sub-datasets, wrap them with dask.delayed, and start chaining on delayed method calls like to_dataframe. The duck typing is necessary so that dask.delayed knows how to pull the dask graph out from the input Dataset.
The other component that would help for this is some utility function inside xarray to split a Dataset (or DataArray) into sub-datasets for each chunk. Something like:
import itertools

def split_by_chunks(dataset):
    # Build the list of slices covering each chunk along every dimension.
    chunk_slices = {}
    for dim, chunks in dataset.chunks.items():
        slices = []
        start = 0
        for chunk in chunks:
            stop = start + chunk
            slices.append(slice(start, stop))
            start = stop
        chunk_slices[dim] = slices
    # Yield one (selection, sub-dataset) pair per combination of chunks.
    for slices in itertools.product(*chunk_slices.values()):
        selection = dict(zip(chunk_slices.keys(), slices))
        yield (selection, dataset[selection])
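To make it easier to see which selections the generator above produces, here is a minimal pure-Python sketch of just the slicing step. It assumes a plain mapping from dimension names to tuples of chunk lengths, the shape of mapping that dataset.chunks is used as in the snippet. The function name selections_from_chunks and the example chunk layout are hypothetical and only for illustration.

```python
import itertools


def selections_from_chunks(chunks):
    """Yield one {dim: slice} selection per chunk combination.

    `chunks` maps dimension names to tuples of chunk lengths
    (an assumed stand-in for `dataset.chunks`).
    """
    chunk_slices = {}
    for dim, sizes in chunks.items():
        slices = []
        start = 0
        for size in sizes:
            stop = start + size
            slices.append(slice(start, stop))
            start = stop
        chunk_slices[dim] = slices
    # The Cartesian product pairs every chunk along one dimension
    # with every chunk along the others.
    for combo in itertools.product(*chunk_slices.values()):
        yield dict(zip(chunk_slices.keys(), combo))


# Hypothetical layout: "time" split into chunks of 2, 2, 1 and "x" into 3, 3.
example_chunks = {"time": (2, 2, 1), "x": (3, 3)}
selections = list(selections_from_chunks(example_chunks))
print(len(selections))   # 3 chunks along "time" times 2 along "x"
print(selections[0])
```

Each yielded dict can be passed to dataset[selection] as in split_by_chunks, or equivalently to dataset.isel(selection), to get the sub-dataset backed by a single chunk.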
I think this was closed by mistake. Is there a way to split up Dataset chunks into dask delayed objects where each object is a Dataset?
It would be great to have a function like dask's to_delayed in order to take xarray datasets and convert them to pandas dataframes chunkily.
http://stackoverflow.com/questions/40475884/how-to-convert-an-xarray-dataset-to-pandas-dataframes-inside-a-dask-dataframe