pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.63k stars 1.09k forks source link

implement `dask` methods on `DataTree` #9670

Closed keewis closed 4 weeks ago

keewis commented 4 weeks ago

For now this only contains compute and load, but in implementing those I've found a bug in copy: we don't actually (shallow) copy the variables, which means that my implementation of compute would modify the original tree.

keewis commented 4 weeks ago

we're still missing an implementation for persist, which I skipped mostly because I wasn't sure how to test that. @TomNicholas, would you like to help me with that? If so, feel free to directly push to this PR.

TomNicholas commented 4 weeks ago

we're still missing an implementation for persist, which I skipped mostly because I wasn't sure how to test that. @TomNicholas, would you like to help me with that? If so, feel free to directly push to this PR.

I'm happy to help, but I think that (a) persist is much more niche than load and compute, so isn't as important to be in there before release, and (b) it's a separate method rather than a bugfix to this PR so it would make more sense for me to add that in a follow-up PR and we just merge this one without persist.

keewis commented 4 weeks ago

that would be fine with me.

That leaves us with the question whether we should guard against chunk with non-existing dims, and after resolving that we should be ready for a final review.

TomNicholas commented 4 weeks ago

whether we should guard against chunk with non-existing dims

We should, see https://github.com/pydata/xarray/pull/9670#discussion_r1815486043

keewis commented 4 weeks ago

We should, see #9670 (comment)

yeah, I saw that one after posting. Should be done now, though.

keewis commented 4 weeks ago

actually, I just tried to compute a DataTree object without chunked arrays (while trying to investigated your comment on chunksizes), and I got an ugly error. I'm investigating, but should not take too long to fix.

keewis commented 4 weeks ago

Tell me what you think about the reworded docstrings of chunksizes / Dataset.chunks, but otherwise this should be good to go, too.

TomNicholas commented 4 weeks ago

Looks good! Let's merge.

keewis commented 4 weeks ago

(I just noticed that DataArray.chunksizes has the same issue, so I'll change that, too)

TomNicholas commented 4 weeks ago

NamedArray.chunksizes too hahaha

keewis commented 4 weeks ago

should be good now, hopefully