tmolteno closed this issue 5 months ago.
Thanks for the report @tmolteno. I tried this in a clean virtual environment:
simon@simon-t14:~/tmp$ virtualenv -p python3.10 venv
created virtual environment CPython3.10.12.final.0-64 in 55ms
creator CPython3Posix(dest=/home/simon/tmp/venv, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/simon/.local/share/virtualenv)
added seed packages: pip==24.0, setuptools==69.2.0, wheel==0.43.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
and the following resulted in an install of fsspec==2024.3.1:
simon@simon-t14:~/tmp$ source venv/bin/activate
(venv) simon@simon-t14:~/tmp$ pip install dask-ms[zarr] --upgrade
Collecting dask-ms[zarr]
Using cached dask_ms-0.2.20-py3-none-any.whl.metadata (6.4 kB)
Collecting appdirs<2.0.0,>=1.4.4 (from dask-ms[zarr])
Using cached appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting dask>=2023.1.1 (from dask[array]>=2023.1.1->dask-ms[zarr])
Using cached dask-2024.5.0-py3-none-any.whl.metadata (3.8 kB)
Collecting donfig<0.8.0,>=0.7.0 (from dask-ms[zarr])
Using cached donfig-0.7.0-py2.py3-none-any.whl
Collecting python-casacore<4.0.0,>=3.5.1 (from dask-ms[zarr])
Using cached python_casacore-3.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting zarr<3.0.0,>=2.12.0 (from dask-ms[zarr])
Using cached zarr-2.17.2-py3-none-any.whl.metadata (5.7 kB)
Collecting click>=8.1 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting cloudpickle>=1.5.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached cloudpickle-3.0.0-py3-none-any.whl.metadata (7.0 kB)
Collecting fsspec>=2021.09.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached fsspec-2024.3.1-py3-none-any.whl.metadata (6.8 kB)
Collecting packaging>=20.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
Collecting partd>=1.2.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached partd-1.4.1-py3-none-any.whl.metadata (4.6 kB)
Collecting pyyaml>=5.3.1 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting toolz>=0.10.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached toolz-0.12.1-py3-none-any.whl.metadata (5.1 kB)
Collecting importlib-metadata>=4.13.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached importlib_metadata-7.1.0-py3-none-any.whl.metadata (4.7 kB)
Collecting numpy>=1.21 (from dask[array]>=2023.1.1->dask-ms[zarr])
Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting six (from python-casacore<4.0.0,>=3.5.1->dask-ms[zarr])
Using cached six-1.16.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting asciitree (from zarr<3.0.0,>=2.12.0->dask-ms[zarr])
Using cached asciitree-0.3.3-py3-none-any.whl
Collecting numcodecs>=0.10.0 (from zarr<3.0.0,>=2.12.0->dask-ms[zarr])
Using cached numcodecs-0.12.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.8 kB)
Collecting fasteners (from zarr<3.0.0,>=2.12.0->dask-ms[zarr])
Using cached fasteners-0.19-py3-none-any.whl.metadata (4.9 kB)
Collecting zipp>=0.5 (from importlib-metadata>=4.13.0->dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached zipp-3.18.1-py3-none-any.whl.metadata (3.5 kB)
Collecting locket (from partd>=1.2.0->dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
Using cached locket-1.0.0-py2.py3-none-any.whl.metadata (2.8 kB)
Using cached appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Using cached dask-2024.5.0-py3-none-any.whl (1.2 MB)
Using cached python_casacore-3.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (35.3 MB)
Using cached zarr-2.17.2-py3-none-any.whl (208 kB)
Using cached dask_ms-0.2.20-py3-none-any.whl (138 kB)
Using cached click-8.1.7-py3-none-any.whl (97 kB)
Using cached cloudpickle-3.0.0-py3-none-any.whl (20 kB)
Using cached fsspec-2024.3.1-py3-none-any.whl (171 kB)
Using cached importlib_metadata-7.1.0-py3-none-any.whl (24 kB)
Using cached numcodecs-0.12.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.7 MB)
Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Using cached packaging-24.0-py3-none-any.whl (53 kB)
Using cached partd-1.4.1-py3-none-any.whl (18 kB)
Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
Using cached toolz-0.12.1-py3-none-any.whl (56 kB)
Using cached fasteners-0.19-py3-none-any.whl (18 kB)
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Using cached zipp-3.18.1-py3-none-any.whl (8.2 kB)
Using cached locket-1.0.0-py2.py3-none-any.whl (4.4 kB)
Installing collected packages: asciitree, appdirs, zipp, toolz, six, pyyaml, packaging, numpy, locket, fsspec, fasteners, cloudpickle, click, python-casacore, partd, numcodecs, importlib-metadata, donfig, zarr, dask, dask-ms
Successfully installed appdirs-1.4.4 asciitree-0.3.3 click-8.1.7 cloudpickle-3.0.0 dask-2024.5.0 dask-ms-0.2.20 donfig-0.7.0 fasteners-0.19 fsspec-2024.3.1 importlib-metadata-7.1.0 locket-1.0.0 numcodecs-0.12.1 numpy-1.26.4 packaging-24.0 partd-1.4.1 python-casacore-3.5.2 pyyaml-6.0.1 six-1.16.0 toolz-0.12.1 zarr-2.17.2 zipp-3.18.1
Also, how was the zarr dataset created, and how are you trying to open it? At present, there's an undocumented, subject-to-change converter that converts v2 Measurement Sets to a Measurement Set-like zarr format:
(venv) simon@simon-t14:~/tmp$ dask-ms convert ~/data/C147_unflagged.MS/ -f zarr -o C147.zarr
...
The result should then be accessible using xds_from_storage_ms (also undocumented and subject to change):
(venv) simon@simon-t14:~/tmp$ python -c "from daskms import xds_from_storage_ms; print(xds_from_storage_ms('C147.zarr'))"
[<daskms.dataset.Dataset object at 0x7ad582ce4fd0>]
It should also be noted that some packages in your pip output appear to be outside the virtual environment, @tmolteno. That can indicate something unwholesome with paths, or the use of --system-site-packages, which is best avoided where possible.
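One quick way to see which packages are leaking in from outside the environment is to ask Python where it would import each one from. The following is a minimal sketch using only the standard library; the helper names `import_origin` and `in_virtualenv` are hypothetical, and it assumes a modern venv or virtualenv (20+), where `sys.base_prefix` differs from `sys.prefix` inside the environment.

```python
import importlib.util
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv (sys.prefix points at the env, base_prefix at the base interpreter)."""
    return sys.prefix != sys.base_prefix


def import_origin(name: str) -> dict:
    """Report where top-level module `name` would be imported from, and whether that is inside sys.prefix."""
    spec = importlib.util.find_spec(name)
    # Missing modules, built-ins and namespace packages have no single file location.
    if spec is None or not spec.has_location or spec.origin is None:
        return {"name": name, "origin": None, "in_env": False}
    origin = Path(spec.origin).resolve()
    env_prefix = Path(sys.prefix).resolve()
    return {
        "name": name,
        "origin": str(origin),
        # False here usually means the module came from system site-packages.
        "in_env": env_prefix in origin.parents,
    }


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # "daskms" is the import name of the dask-ms distribution.
    for pkg in ("fsspec", "dask", "zarr", "daskms"):
        info = import_origin(pkg)
        if info["origin"] is None:
            print(f"{pkg:>8}: not importable")
        else:
            where = "inside env" if info["in_env"] else "OUTSIDE env"
            print(f"{pkg:>8}: {info['origin']} ({where})")
```

With --system-site-packages enabled, a module resolved from the system's site-packages directory is reported as outside the environment, which is the situation described above.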
Yep, I'm using --system-site-packages. I do this by default because on non-x86-64 systems, such as 64-bit ARM, packages often lack prebuilt wheels or don't build at all (astropy, here's looking at you). The zarr is from here (BUCKET=s3://ratt-public-data DATA=ESO137/ms1_primary_subset.zarr).
I did find the converter yesterday (as it's mentioned in a few bug reports), but haven't played with that yet :)
Anyway, it's useful for requirements to specify minimum versions where possible.
It's true that dask-ms doesn't specify fsspec as a direct dependency: it's a transitive dependency included via dask, currently specified as fsspec >= 2021.09.0 in master.
https://github.com/dask/dask/blob/bc6f42b867cdf9d415485a844fc7fa53c64f32c2/pyproject.toml#L35
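Because the bound comes from dask rather than dask-ms, it can be read from the installed dask metadata. A minimal sketch with `importlib.metadata` follows; the helper names are hypothetical, and the name matching is a simplified prefix check rather than full PEP 508 parsing.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, requires


def filter_requirements(reqs: list[str], dependency: str) -> list[str]:
    """Keep only requirement strings that name `dependency` exactly.

    Simplified matching (not full PEP 508): the name must be followed by whitespace,
    an extras bracket, a version operator, a marker or end of string, so "fsspec"
    does not match "fsspec-extra".
    """
    pattern = re.compile(rf"^{re.escape(dependency)}(?=[\s\[<>=!~;(]|$)", re.IGNORECASE)
    return [r for r in reqs if pattern.match(r)]


def declared_requirements(dist: str, dependency: str) -> list[str]:
    """Return what installed distribution `dist` declares for `dependency` ([] if `dist` is not installed)."""
    try:
        reqs = requires(dist) or []
    except PackageNotFoundError:
        return []
    return filter_requirements(reqs, dependency)


if __name__ == "__main__":
    # Prints the fsspec requirement(s) declared by the installed dask, if any.
    print(declared_requirements("dask", "fsspec"))
```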
Yep, I'm using --system-site-packages. I do this by default as non x86-64 systems like 64-bit arm often do not build from wheels, or build at all (astropy here's looking at you).
Ah, that makes things a bit more understandable. I do tend to take the "install it in a virtual environment" line, so I don't have much experience with hybrid installs.
Also, as @JSKenyon correctly spotted, it's finding your system install of fsspec and, since it falls within the bounds specified by dask, that version is considered acceptable.
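To make the bound arithmetic concrete: the fsspec 2022.1.0 reported in the issue sits above dask's floor of 2021.09.0, so pip accepts the already-installed system copy, whereas a floor at the 2022.7 release would reject it. The sketch below assumes plain dotted-integer CalVer strings and takes 2022.7.0 as a hypothetical floor; real tooling should use `packaging.version.Version`, which implements full PEP 440 ordering.

```python
from __future__ import annotations


def parse_calver(version: str) -> tuple[int, ...]:
    """Parse a dotted-integer version such as '2021.09.0' into a comparable tuple.

    Assumption: no pre-release, post-release or local segments (not full PEP 440).
    """
    parts = tuple(int(p) for p in version.split("."))
    # Pad to three components so '2022.7' and '2022.7.0' compare equal.
    return parts + (0,) * (3 - len(parts))


def satisfies_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` >= `minimum` under the simplified ordering above."""
    return parse_calver(installed) >= parse_calver(minimum)


DASK_FSSPEC_FLOOR = "2021.09.0"  # lower bound in dask's pyproject.toml, per the link above
PROPOSED_FSSPEC_FLOOR = "2022.7.0"  # hypothetical floor based on "greater than 2022.7"

if __name__ == "__main__":
    for installed in ("2022.1.0", "2024.3.1"):
        print(
            installed,
            "dask floor ok:", satisfies_minimum(installed, DASK_FSSPEC_FLOOR),
            "proposed floor ok:", satisfies_minimum(installed, PROPOSED_FSSPEC_FLOOR),
        )
```

The 2024.3.1 case corresponds to the clean-virtualenv install shown earlier, which passes both floors.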
So this report is to suggest that the fsspec requirement should be specified. I think anything later than the 2022.7 release would do (based on a rudimentary git blame on fsspec).
Thanks for investigating here. As dask-ms has its own DaskMSStore class that uses fsspec to reason about paths, I'm leaning towards adding a versioned fsspec requirement to dask-ms.
Note: the installed fsspec was 2022.1.0.
Description
Opening a zarr causes
This is present in more recent versions of fsspec.
Fix
So this report is to suggest that the fsspec requirement should be specified. I think anything later than the 2022.7 release would do (based on a rudimentary git blame on fsspec).