Issue 176 - Githubissues

ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS

https://dask-ms.readthedocs.io

Other

19 stars 7 forks source link

Issue 176 #194

Closed JSKenyon closed 2 years ago

JSKenyon commented 2 years ago

[ ] Tests added / passed
```
$ py.test -v -s daskms/tests
```
If the pep8 tests fail, the quickest way to correct this is to run autopep8 and then flake8 and pycodestyle to fix the remaining issues.
```
$ pip install -U autopep8 flake8 pycodestyle
$ autopep8 -r -i daskms
$ flake8 daskms
$ pycodestyle daskms
```
[ ] Fully documented, including HISTORY.rst for all changes and one of the docs/*-api.rst files for new API

To build the docs locally:
```
pip install -r requirements.readthedocs.txt
cd docs
READTHEDOCS=True make html
```

JSKenyon commented 2 years ago

This PR adds support for index_columns during conversion from measurement set to parquet/zarr. This means that output data can be written with a different ordering than input data. This is not necessarily efficient as it will use TAQL and fragmented reads to accomplish the reordering.

In addition to exposing this feature, this PR fixes incorrect conversion of boolean values when using parquet. This is because pyarrow represents boolean values as bits whereas python represents them as bytes. The fix in this PR works but is a prelude to a slight rewrite of the arrow extensions. Currently, extensions use the from_buffer to convert from arrow to numpy. This relies on memory layout (hence the aforementioned bug) and may be unnecessary. A future PR will instead use array.storage.flatten in conjunction with to_numpy and reshape to move between representations. This should be less vulnerable to discrepancies in memory layout.

JSKenyon commented 2 years ago

I changed my mind - I imagine that the use of buffers if future-proofing against ragged data. I figured out how to use buffers with booleans and have implemented a fix. Note that this is still not a zero-copy operation in the case of booleans as we have to move from bit to byte representation.