ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS
https://dask-ms.readthedocs.io

Improve task key names #97

Closed sjperkins closed 4 years ago

sjperkins commented 4 years ago

Description

Currently, task key names are structured as follows.

1527016443_sdp_l0.full_1284.full_pol.ms-[0,0]-UVW-737853cfb364adb48768aac0e708ceb4
1527016443_sdp_l0.full_1284.full_pol.ms-[1,0]-UVW-54821e2f596bcc357d3661627c064d86
1527016443_sdp_l0.full_1284.full_pol.ms-write-MODEL_DATA-7d63acdd1ac63873f3912ec0a821ea9d
1527016443_sdp_l0.full_1284.full_pol.ms-write-MODEL_DATA-ecb06f9adfc1cc5686ee5e1766e62236

Because dask splits key names on "-", these are often displayed in graphviz as:

1527016443_sdp_l0.full_1284.full_pol.ms
1527016443_sdp_l0.full_1284.full_pol.ms
1527016443_sdp_l0.full_1284.full_pol.ms
1527016443_sdp_l0.full_1284.full_pol.ms

which are difficult to distinguish from each other. We should improve this situation for diagnostic purposes.
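
The collapse can be illustrated with a rough sketch. It assumes a simplified rule that keeps only the text before the first "-"; dask's real key-splitting logic is more involved, and `leading_label` is a hypothetical stand-in, not a dask function.

```python
def leading_label(key: str) -> str:
    """Hypothetical stand-in for a display label: text before the first '-'."""
    return key.split("-", 1)[0]


# The four task keys quoted above.
keys = [
    "1527016443_sdp_l0.full_1284.full_pol.ms-[0,0]-UVW-737853cfb364adb48768aac0e708ceb4",
    "1527016443_sdp_l0.full_1284.full_pol.ms-[1,0]-UVW-54821e2f596bcc357d3661627c064d86",
    "1527016443_sdp_l0.full_1284.full_pol.ms-write-MODEL_DATA-7d63acdd1ac63873f3912ec0a821ea9d",
    "1527016443_sdp_l0.full_1284.full_pol.ms-write-MODEL_DATA-ecb06f9adfc1cc5686ee5e1766e62236",
]

labels = [leading_label(k) for k in keys]
distinct_labels = set(labels)

# Every key starts with the MS name, so under this rule all four tasks
# share a single label.
print(distinct_labels)
```

Under this assumption, the column name and chunk index never reach the label because they sit behind the MS name.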

sjperkins commented 4 years ago

/cc @JSKenyon

Any thoughts on how key names should be constructed? Including the MS in the key name tends to make the key somewhat long, but I like having it.

JSKenyon commented 4 years ago

I think that the MS name is useful, but as you say it muddies the waters when looking at diagnostic plots. My preference would be to structure the keys so that more specific, useful information comes before less useful global information (such as the MS name). For instance, in your example above, the column names should really take priority.
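
One possible shape for such keys, as a minimal sketch only: `make_key`, its parameters, and the part ordering are all hypothetical, and this is not necessarily the scheme that was eventually adopted. The token is passed in rather than computed, since how tokens are generated is not discussed here.

```python
def make_key(column, ms_name, token, chunk=None, operation=None):
    """Hypothetical key builder: specific parts first, MS name near the end."""
    parts = [column]
    if operation is not None:
        parts.append(operation)
    if chunk is not None:
        # Render the chunk index as "[i,j]", matching the existing keys.
        parts.append("[" + ",".join(str(c) for c in chunk) + "]")
    parts.extend([ms_name, token])
    return "-".join(parts)


ms = "1527016443_sdp_l0.full_1284.full_pol.ms"

read_key = make_key(
    "UVW", ms, "737853cfb364adb48768aac0e708ceb4", chunk=(0, 0)
)
write_key = make_key(
    "MODEL_DATA", ms, "7d63acdd1ac63873f3912ec0a821ea9d", operation="write"
)

print(read_key)
print(write_key)
```

With the column name leading, anything that truncates a key at the first "-" would show UVW or MODEL_DATA rather than the MS name, while the full key still carries the MS name for reference.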

sjperkins commented 4 years ago

Closed by #102