Closed bennahugo closed 1 year ago
I think I will try to fix the REST issue in another PR... I'm not sure where it comes from exactly -- maybe deep in daskms land
Hmm... tests are not actually triggering because they are Travis remnants... I will try to put in an Actions-based CI for this in this PR as well
Actually no... the test cases need casacore-data... I will configure a Jenkins job for this
Struggling to get the tests to pass in a clean environment. The latest failure is:
# We only need to pass in dimension extent arrays if
# there is more than one chunk in any of the non-row columns.
# In that case, we can putcol, otherwise putcolslice is required
inlinable_arrays = [row_order]
if (row_order.shape[0] != array.shape[0] or
row_order.chunks[0] != array.chunks[0]):
> raise ValueError(f"ROWID shape and/or chunking does "
f"not match that of {column}")
E ValueError: ROWID shape and/or chunking does not match that of ANTENNA1
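For context, the check that raises here compares the ROWID ordering's shape and row chunking against each column being written. A minimal, standalone sketch of the failure mode (hypothetical arrays, not xova's actual data) and the rechunking that would satisfy the check:

```python
import numpy as np
import dask.array as da

# Hypothetical stand-ins: a ROWID-like ordering and a data column
# whose row chunking disagrees with it.
row_order = da.from_array(np.arange(10), chunks=5)  # row chunks (5, 5)
antenna1 = da.from_array(np.zeros(10), chunks=4)    # row chunks (4, 4, 2)

# Same shape, mismatched chunking: this is the condition the
# daskms check above rejects with a ValueError.
assert row_order.shape[0] == antenna1.shape[0]
assert row_order.chunks[0] != antenna1.chunks[0]

# Rechunking the column onto the ordering's row chunks satisfies it.
antenna1 = antenna1.rechunk({0: row_order.chunks[0]})
assert antenna1.chunks[0] == row_order.chunks[0] == (5, 5)
```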
This will need to wait until next week while I focus on my PhD.
I'm really not sure how I got this passing on my production machine
xova/apps/xova/app.py:107: in execute
main_writes = xds_to_table(output_ds, args.output, "ALL",
../venvxova/lib/python3.8/site-packages/daskms/dask_ms.py:96: in xds_to_table
out_ds = write_datasets(table_name, xds, columns,
../venvxova/lib/python3.8/site-packages/daskms/writes.py:725: in write_datasets
write_datasets = _write_datasets(table, tp, datasets, columns,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
table = '/tmp/tmpud07fd4u_averaged.ms'
table_proxy = TableProxy[/tmp/tmpud07fd4u_averaged.ms](table, /tmp/tmpud07fd4u_averaged.ms, ack=False,readonly=False,lockoptions=user,__executor_key__=/tmp/tmpud07fd4u_averaged.ms)
datasets = []
columns = ['ANTENNA1', 'ANTENNA2', 'ARRAY_ID', 'DATA', 'DATA_DESC_ID', 'EXPOSURE', ...]
descriptor = 'ms(False)', table_keywords = None, column_keywords = None
def _write_datasets(table, table_proxy, datasets, columns, descriptor,
table_keywords, column_keywords):
_, table_name, subtable = table_path_split(table)
table_name = '::'.join((table_name, subtable)) if subtable else table_name
row_orders = []
# Put table and column keywords
table_proxy.submit(_put_keywords, WRITELOCK,
table_keywords, column_keywords).result()
# Sort datasets on (not has "ROWID", index) such that
# datasets with ROWID's are handled first, while
# those without (which imply appends to the MS)
# are handled last
sorted_datasets = sorted(enumerate(datasets),
key=lambda t: ("ROWID" not in t[1].data_vars,
t[0]))
# Establish row orders for each dataset
for di, ds in sorted_datasets:
try:
rowid = ds.ROWID.data
except AttributeError:
# Add operation
# No ROWID's, assume they're missing from the table
# and remaining datasets. Generate addrows
# NOTE(sjperkins)
# This could be somewhat brittle, but exists to
# update MS empty subtables once they've been
# created along with the main MS by a call to default_ms.
# Users could also use it to append rows to an existing table.
# An xds_append_to_table may be a better solution...
last_datasets = datasets[di:]
last_row_orders = add_row_order_factory(table_proxy, last_datasets)
# We don't inline the row ordering if it is derived
# from the row sizes of provided arrays.
# The range of possible dependencies are far too large to inline
row_orders.extend([(False, lro) for lro in last_row_orders])
# We have established row orders for all datasets
# at this point, quit the loop
break
else:
# Update operation
# Generate row orderings from existing row IDs
row_order = cached_row_order(rowid)
# Inline the row ordering in the graph
row_orders.append((True, row_order))
assert len(row_orders) == len(datasets)
datasets = []
for (di, ds), (inline, row_order) in zip(sorted_datasets, row_orders):
# Hold the variables representing array writes
write_vars = {}
# Generate a dask array for each column
for column in columns:
try:
variable = ds.data_vars[column]
except KeyError:
log.warning("Ignoring '%s' not present "
"on dataset %d" % (column, di))
continue
else:
full_dims = variable.dims
array = variable.data
if not isinstance(array, da.Array):
raise TypeError("%s on dataset %d is not a dask Array "
"but a %s" % (column, di, type(array)))
args = [row_order, ("row",)]
# We only need to pass in dimension extent arrays if
# there is more than one chunk in any of the non-row columns.
# In that case, we can putcol, otherwise putcolslice is required
inlinable_arrays = [row_order]
if (row_order.shape[0] != array.shape[0] or
row_order.chunks[0] != array.chunks[0]):
> raise ValueError(f"ROWID shape and/or chunking does "
f"not match that of {column}")
E ValueError: ROWID shape and/or chunking does not match that of ANTENNA1
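As an aside on the quoted code: the sort in `_write_datasets` orders datasets so that ROWID-bearing ones (updates) are processed before ROWID-less ones (appends). A minimal sketch of that sort key, using plain dicts in place of datasets (hypothetical, for illustration only):

```python
# Plain dicts stand in for datasets; the membership test mimics
# '"ROWID" not in t[1].data_vars' in the quoted source.
datasets = [
    {"DATA": None},                 # no ROWID: an append, sorts last
    {"ROWID": None, "DATA": None},  # has ROWID: an update, sorts first
]

sorted_datasets = sorted(enumerate(datasets),
                         key=lambda t: ("ROWID" not in t[1], t[0]))

# The ROWID-bearing dataset (original index 1) now comes first,
# since False < True; ties fall back to the original position.
assert [i for i, _ in sorted_datasets] == [1, 0]
```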
It seems the error stems from deep within daskms land, possibly due to dask mishandling chunk shapes. The issue is not the same as with previous dask[array] versions, where a reduction over axis 1 of UVW fails (the uv-distance computation).
A bit at wit's end with this, so I will commit what I have right now and get back to it when I'm at my desktop -- maybe pip freeze will reveal which dask versions to pin to.
Actually, I have it working. There is breakage that either @sjperkins or @JSKenyon introduced inside daskms since version 0.2.6 was released (I note there have been a lot of changes to MS output and chunking which could have caused this). I don't really have time to dig into daskms right now, but it suffices to work around the upstream issue by pinning to dask-ms==0.2.6.
Working daskms versions: 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10, 0.2.11
Specifically, breakage started appearing in 0.2.12.
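If other users hit the same upstream regression, one way to express the workaround is a dependency pin covering the version range reported above (a hypothetical requirements fragment, for illustration):

```
# requirements.txt fragment: allow the known-good dask-ms series
# and exclude 0.2.12, where the breakage reportedly starts
dask-ms>=0.2.6,<0.2.12
```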
retest this please
Alright, as discussed with @sjperkins, we are going to keep only Jenkins testing for now. I will open a separate PR just to test the install.
The long-running plan is to put in full qualification testing on this, with both real and simulated data (essentially automating what I've done for the memo), on both axes.
Fixes many failing test cases and makes the UVW recompute fail gracefully when the test case's ephemeris polynomial is > 1.