pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Task naming for general chunkmanagers #7813

Open TomNicholas opened 1 year ago

TomNicholas commented 1 year ago

What is your issue?

(Follow-up to #7019)

When you create a dask graph of xarray operations, the tasks in the graph get useful names according to the name of the DataArray they operate on, or whether they represent an open_dataset call.

Currently this doesn't work for cubed. See, for example, this graph from https://github.com/pangeo-data/distributed-array-examples/issues/2#issuecomment-1533852877:

[image: the graph from the linked comment]

cc @tomwhite @dcherian

tomwhite commented 1 year ago

If you hover over a node in the SVG representation you'll get a tooltip that shows the call stack and the line number of the top-level user function that invoked the computation. Does that help at all? (That said, I'm open to changing the way it is displayed, or how tasks are named in general.)

BTW should this be moved to a cubed issue?

TomNicholas commented 1 year ago

> If you hover over a node in the SVG representation you'll get a tooltip that shows the call stack and the line number of the top-level user function that invoked the computation. Does that help at all?

That's neat!

> When you create a dask graph of xarray operations, the tasks in the graph get useful names according to the name of the DataArray they operate on

I realise now that this is not true, but can we make it true for cubed in xarray? Using cubed with xarray creates arrays with names like array-002, but couldn't we use the DataArray's .name attribute to give this node of the graph the name "U", for example?

> BTW should this be moved to a cubed issue?

I raised it here because it relates to an unfinished part of #7019, where there is still dask-specific logic for naming individual tasks. I think that to solve this we will need to alter xarray code so that ChunkManager objects can decide how to name their tasks, using information passed from xarray (i.e. DataArray.name).
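To make the proposal concrete, here is a minimal sketch of what such a hook could look like. Everything in it is an assumption for illustration: `ChunkManagerSketch`, `task_name`, `DaskLikeManager`, `CubedLikeManager`, and the `token` argument are hypothetical names, not xarray's, dask's, or cubed's actual API, and the `xarray-` prefix is only a placeholder naming style. The idea is that xarray passes along what it knows (the DataArray's name) and each manager decides what to do with it.

```python
from abc import ABC, abstractmethod
from itertools import count
from typing import Optional


class ChunkManagerSketch(ABC):
    """Hypothetical stand-in for a chunk manager; NOT xarray's real interface."""

    @abstractmethod
    def task_name(self, variable_name: Optional[str], token: str) -> str:
        """Return the graph-node name for an array, given info passed from xarray."""


class DaskLikeManager(ChunkManagerSketch):
    """Illustrative 'prefix-name-token' style; the prefix is an assumption."""

    def task_name(self, variable_name: Optional[str], token: str) -> str:
        label = variable_name if variable_name is not None else "array"
        return f"xarray-{label}-{token}"


class CubedLikeManager(ChunkManagerSketch):
    """Uses the DataArray name when given, else a counter-based fallback."""

    def __init__(self) -> None:
        self._counter = count(1)

    def task_name(self, variable_name: Optional[str], token: str) -> str:
        if variable_name is not None:
            return str(variable_name)
        # Fall back to names in the style of "array-002" when no name is known.
        return f"array-{next(self._counter):03d}"


cubed_like = CubedLikeManager()
print(cubed_like.task_name("U", "abc123"))         # U
print(cubed_like.task_name(None, "abc123"))        # array-001
print(DaskLikeManager().task_name("U", "abc123"))  # xarray-U-abc123
```

With a hook of this shape, xarray itself would not need to know any backend's naming convention. It would only call the hook with `DataArray.name` at the point where the chunked array is created.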

tomwhite commented 1 year ago

Ah I understand better now. This makes sense - if ChunkManager has a name then the implementation could use that to name tasks.