JSKenyon opened this issue 4 years ago
A temporary work-around is to use `xarray.Dataset.reindex`.
Just a brief follow-up here. `xarray.Dataset.reindex` does work (and is awesome), but there is a slight problem: `reindex` uses a fill value for the added elements. This can lead to on-disk data being overwritten (e.g. writing two correlations, reindexed to four, back to a column will overwrite values with the fill value). There are other approaches:
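To make the fill-value hazard concrete, here is a minimal sketch (the dataset shape, correlation positions and fill value are illustrative, not taken from the issue):

```python
import numpy as np
import xarray as xr

# Toy stand-in for a dask-ms dataset holding only two correlations
# (say XX and YY at positions 0 and 3 of the full axis).
xds = xr.Dataset(
    {"DATA": (("row", "chan", "corr"), np.ones((10, 64, 2), dtype=np.complex64))},
    coords={"corr": [0, 3]},
)

# Reindexing back to the full four-correlation axis inserts fill values
# at positions 1 and 2, so writing this back to a four-correlation MS
# column would clobber the on-disk XY/YX values with the fill value.
full = xds.reindex(corr=[0, 1, 2, 3], fill_value=0)
```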
Just checking in here - this is becoming increasingly necessary. I might have to start considering option two above, which is really inelegant. @sjperkins How much work do you think would be involved in making the writes aware of the correlation axis and using `putcolslice` appropriately?
Following on from our online conversation:

- Writes are currently blocked along the `row` dimension only.
- It would be useful to also block writes along other dimensions, `corr` for instance. `chan` might also be useful.
- This would involve the `putcolslice(column, data[rr:rr + rl], blc, trc, startrow=rs, nrow=rl)` command, where `blc` and `trc` describe the slice along the secondary dimensions (e.g. `corr`).
- A way forward may be possible by passing `blc`, `trc` ranges into the blockwise call, rather than individual values, as is presently done here: https://github.com/ska-sa/dask-ms/blob/55c987e5f00c24a82c16363f681a966e45590e06/daskms/writes.py#L629-L635
This is a feature request, not a bug.
xarray offers the following awesome functionality:
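(The original snippet is not preserved in this extract; a plausible reconstruction of the kind of selection meant, with a toy dataset standing in for one returned by `xds_from_ms` and illustrative shapes and indices, is:)

```python
import numpy as np
import xarray as xr

# Toy stand-in for a dask-ms dataset with two fields sharing a corr axis.
xds = xr.Dataset(
    {
        "DATA": (("row", "chan", "corr"), np.zeros((10, 64, 4), dtype=np.complex64)),
        "FLAG": (("row", "chan", "corr"), np.zeros((10, 64, 4), dtype=bool)),
    }
)

# Keep only the first and last correlations (e.g. XX and YY).
sub_xds = xds.isel(corr=[0, 3])
```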
which in this example instantly makes a new xds using only some of the correlations. This is great because it is applied to the entire xds, which means all the fields remain consistent. The only drawback comes when attempting to write MS columns, as a field with two correlations on the xds cannot be written to a column containing four correlations. It would be really cool if dask-ms could support this.
My instinct is that this might become simple if the corr dimension is given a coordinate. That way there is a paper trail showing the correlations which have been selected out, and consequently a way to determine how to store them.
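As a rough illustration of the "paper trail" idea (a sketch under the assumption that `corr` is given an integer coordinate; this does not reflect current dask-ms behaviour):

```python
import numpy as np
import xarray as xr

# Toy dataset where the corr dimension carries its original indices
# as a coordinate (names and shapes illustrative).
xds = xr.Dataset(
    {"DATA": (("row", "chan", "corr"), np.zeros((10, 64, 4), dtype=np.complex64))},
    coords={"corr": np.arange(4)},
)

sub_xds = xds.isel(corr=[0, 3])
print(sub_xds.corr.values)  # [0 3] -- records which correlations survived,
                            # i.e. where they belong when written back to disk.
```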