ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS
https://dask-ms.readthedocs.io

Update fsspec version requirements to avoid an error opening zarr #327

Closed tmolteno closed 5 months ago

tmolteno commented 6 months ago

$ pip install dask-ms[zarr] --upgrade

Requirement already satisfied: dask-ms[zarr] in /home/tim/.tartvenv/lib/python3.10/site-packages (0.2.20)
Requirement already satisfied: python-casacore<4.0.0,>=3.5.1 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask-ms[zarr]) (3.5.2)
Requirement already satisfied: appdirs<2.0.0,>=1.4.4 in /usr/lib/python3/dist-packages (from dask-ms[zarr]) (1.4.4)
Requirement already satisfied: donfig<0.8.0,>=0.7.0 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask-ms[zarr]) (0.7.0)
Requirement already satisfied: dask[array]>=2023.1.1 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask-ms[zarr]) (2024.4.2)
Requirement already satisfied: zarr<3.0.0,>=2.12.0 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask-ms[zarr]) (2.17.2)
Requirement already satisfied: fsspec>=2021.09.0 in /usr/lib/python3/dist-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (2022.1.0)
Requirement already satisfied: partd>=1.2.0 in /usr/lib/python3/dist-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (1.2.0)
Requirement already satisfied: click>=8.1 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (8.1.7)
Requirement already satisfied: pyyaml>=5.3.1 in /usr/lib/python3/dist-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (5.4.1)
Requirement already satisfied: toolz>=0.10.0 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (0.12.0)
Requirement already satisfied: cloudpickle>=1.5.0 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (3.0.0)
Requirement already satisfied: packaging>=20.0 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (23.2)
Requirement already satisfied: importlib-metadata>=4.13.0 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (7.1.0)
Requirement already satisfied: numpy>=1.21 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (1.25.2)
Requirement already satisfied: six in /home/tim/.tartvenv/lib/python3.10/site-packages (from python-casacore<4.0.0,>=3.5.1->dask-ms[zarr]) (1.16.0)
Requirement already satisfied: fasteners in /usr/lib/python3/dist-packages (from zarr<3.0.0,>=2.12.0->dask-ms[zarr]) (0.14.1)
Requirement already satisfied: asciitree in /usr/lib/python3/dist-packages (from zarr<3.0.0,>=2.12.0->dask-ms[zarr]) (0.3.3)
Requirement already satisfied: numcodecs>=0.10.0 in /home/tim/.tartvenv/lib/python3.10/site-packages (from zarr<3.0.0,>=2.12.0->dask-ms[zarr]) (0.12.1)
Requirement already satisfied: zipp>=0.5 in /usr/lib/python3/dist-packages (from importlib-metadata>=4.13.0->dask[array]>=2023.1.1->dask-ms[zarr]) (1.0.0)

Note fsspec was 2022.1.0.

Description

Opening a zarr dataset raises:

AttributeError: 'LocalFileSystem' object has no attribute 'unstrip_protocol'. Did you mean: '_strip_protocol'?

The unstrip_protocol method is only present in more recent versions of fsspec.

Fix

pip install fsspec --upgrade

So this report is to suggest that the fsspec requirement should be specified. I think something greater than 2022.7 release would do (based on a rudimentary git blame on fsspec).
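
To confirm which fsspec an environment actually picks up, something like the following may help. It is only a sketch: describe_package is a hypothetical helper name, not part of dask-ms or fsspec, and it relies solely on the standard library.

```python
from importlib import metadata
from importlib.util import find_spec


def describe_package(dist_name, module_name=None):
    """Return (version, origin) for an installed package, or None if absent.

    dist_name is the name pip knows the distribution by; module_name is
    the importable name, assumed to be the same unless given.
    """
    module_name = module_name or dist_name
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # find_spec locates a top-level module without importing it.
    spec = find_spec(module_name)
    origin = spec.origin if spec is not None else None
    return version, origin


# Prints None when fsspec is not installed in the current environment.
print(describe_package("fsspec"))
```

An origin under /usr/lib/python3/dist-packages rather than the virtual environment would suggest that a system-wide copy is being used.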

sjperkins commented 6 months ago

Thanks for the report @tmolteno. I tried this in a clean virtual environment:

simon@simon-t14:~/tmp$ virtualenv -p python3.10 venv
created virtual environment CPython3.10.12.final.0-64 in 55ms
  creator CPython3Posix(dest=/home/simon/tmp/venv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/simon/.local/share/virtualenv)
    added seed packages: pip==24.0, setuptools==69.2.0, wheel==0.43.0
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

and the following seemed to result in an install of fsspec==2024.3.1.

simon@simon-t14:~/tmp$ source venv/bin/activate
(venv) simon@simon-t14:~/tmp$ pip install dask-ms[zarr] --upgrade
Collecting dask-ms[zarr]
  Using cached dask_ms-0.2.20-py3-none-any.whl.metadata (6.4 kB)
Collecting appdirs<2.0.0,>=1.4.4 (from dask-ms[zarr])
  Using cached appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting dask>=2023.1.1 (from dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached dask-2024.5.0-py3-none-any.whl.metadata (3.8 kB)
Collecting donfig<0.8.0,>=0.7.0 (from dask-ms[zarr])
  Using cached donfig-0.7.0-py2.py3-none-any.whl
Collecting python-casacore<4.0.0,>=3.5.1 (from dask-ms[zarr])
  Using cached python_casacore-3.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting zarr<3.0.0,>=2.12.0 (from dask-ms[zarr])
  Using cached zarr-2.17.2-py3-none-any.whl.metadata (5.7 kB)
Collecting click>=8.1 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting cloudpickle>=1.5.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached cloudpickle-3.0.0-py3-none-any.whl.metadata (7.0 kB)
Collecting fsspec>=2021.09.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached fsspec-2024.3.1-py3-none-any.whl.metadata (6.8 kB)
Collecting packaging>=20.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
Collecting partd>=1.2.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached partd-1.4.1-py3-none-any.whl.metadata (4.6 kB)
Collecting pyyaml>=5.3.1 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting toolz>=0.10.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached toolz-0.12.1-py3-none-any.whl.metadata (5.1 kB)
Collecting importlib-metadata>=4.13.0 (from dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached importlib_metadata-7.1.0-py3-none-any.whl.metadata (4.7 kB)
Collecting numpy>=1.21 (from dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting six (from python-casacore<4.0.0,>=3.5.1->dask-ms[zarr])
  Using cached six-1.16.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting asciitree (from zarr<3.0.0,>=2.12.0->dask-ms[zarr])
  Using cached asciitree-0.3.3-py3-none-any.whl
Collecting numcodecs>=0.10.0 (from zarr<3.0.0,>=2.12.0->dask-ms[zarr])
  Using cached numcodecs-0.12.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.8 kB)
Collecting fasteners (from zarr<3.0.0,>=2.12.0->dask-ms[zarr])
  Using cached fasteners-0.19-py3-none-any.whl.metadata (4.9 kB)
Collecting zipp>=0.5 (from importlib-metadata>=4.13.0->dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached zipp-3.18.1-py3-none-any.whl.metadata (3.5 kB)
Collecting locket (from partd>=1.2.0->dask>=2023.1.1->dask[array]>=2023.1.1->dask-ms[zarr])
  Using cached locket-1.0.0-py2.py3-none-any.whl.metadata (2.8 kB)
Using cached appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Using cached dask-2024.5.0-py3-none-any.whl (1.2 MB)
Using cached python_casacore-3.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (35.3 MB)
Using cached zarr-2.17.2-py3-none-any.whl (208 kB)
Using cached dask_ms-0.2.20-py3-none-any.whl (138 kB)
Using cached click-8.1.7-py3-none-any.whl (97 kB)
Using cached cloudpickle-3.0.0-py3-none-any.whl (20 kB)
Using cached fsspec-2024.3.1-py3-none-any.whl (171 kB)
Using cached importlib_metadata-7.1.0-py3-none-any.whl (24 kB)
Using cached numcodecs-0.12.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.7 MB)
Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Using cached packaging-24.0-py3-none-any.whl (53 kB)
Using cached partd-1.4.1-py3-none-any.whl (18 kB)
Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
Using cached toolz-0.12.1-py3-none-any.whl (56 kB)
Using cached fasteners-0.19-py3-none-any.whl (18 kB)
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Using cached zipp-3.18.1-py3-none-any.whl (8.2 kB)
Using cached locket-1.0.0-py2.py3-none-any.whl (4.4 kB)
Installing collected packages: asciitree, appdirs, zipp, toolz, six, pyyaml, packaging, numpy, locket, fsspec, fasteners, cloudpickle, click, python-casacore, partd, numcodecs, importlib-metadata, donfig, zarr, dask, dask-ms
Successfully installed appdirs-1.4.4 asciitree-0.3.3 click-8.1.7 cloudpickle-3.0.0 dask-2024.5.0 dask-ms-0.2.20 donfig-0.7.0 fasteners-0.19 fsspec-2024.3.1 importlib-metadata-7.1.0 locket-1.0.0 numcodecs-0.12.1 numpy-1.26.4 packaging-24.0 partd-1.4.1 python-casacore-3.5.2 pyyaml-6.0.1 six-1.16.0 toolz-0.12.1 zarr-2.17.2 zipp-3.18.1

Also, how was the zarr dataset created and how are you trying to open it? At present, there's an undocumented, subject-to-change converter that turns v2 Measurement Sets into a Measurement Set-like zarr format:

(venv) simon@simon-t14:~/tmp$ dask-ms convert ~/data/C147_unflagged.MS/ -f zarr -o C147.zarr
...

The result should then be accessible using xds_from_storage_ms (also undocumented and subject to change):

(venv) simon@simon-t14:~/tmp$ python -c "from daskms import xds_from_storage_ms; print(xds_from_storage_ms('C147.zarr'))"
[<daskms.dataset.Dataset object at 0x7ad582ce4fd0>]

JSKenyon commented 6 months ago

It should also be noted that some packages in your pip output appear to be outside the virtual environment @tmolteno. That can indicate something unwholesome with paths, or using --system-site-packages, which is best avoided where possible.
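
As a rough illustration of how to spot this, the "Requirement already satisfied" lines in pip's output can be scanned for locations outside the environment. This is a sketch only: packages_outside_env is a hypothetical helper, the regular expression assumes the line format shown in the output above, and the prefix /home/tim/.tartvenv is inferred from the paths in that output.

```python
import re
from pathlib import PurePosixPath

# Matches lines such as:
#   Requirement already satisfied: fsspec>=2021.09.0 in /usr/lib/... (from ...) (2022.1.0)
_LINE = re.compile(r"^Requirement already satisfied: (?P<req>\S+) in (?P<path>\S+)")


def packages_outside_env(pip_output, env_prefix):
    """List (requirement, location) pairs whose location is not under env_prefix."""
    prefix = PurePosixPath(env_prefix)
    outside = []
    for line in pip_output.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue
        location = PurePosixPath(match.group("path"))
        # A location is inside the environment if the prefix is the
        # location itself or one of its parent directories.
        if location != prefix and prefix not in location.parents:
            outside.append((match.group("req"), str(location)))
    return outside


sample = """\
Requirement already satisfied: zarr<3.0.0,>=2.12.0 in /home/tim/.tartvenv/lib/python3.10/site-packages (from dask-ms[zarr]) (2.17.2)
Requirement already satisfied: fsspec>=2021.09.0 in /usr/lib/python3/dist-packages (from dask[array]>=2023.1.1->dask-ms[zarr]) (2022.1.0)
"""
print(packages_outside_env(sample, "/home/tim/.tartvenv"))
```

Only the fsspec line is reported for this sample, since its location sits under /usr/lib rather than under the environment prefix.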

tmolteno commented 6 months ago

Yep, I'm using --system-site-packages. I do this by default as non x86-64 systems like 64-bit arm often do not build from wheels, or build at all (astropy here's looking at you). The zarr is from here (BUCKET=s3://ratt-public-data DATA=ESO137/ms1_primary_subset.zarr).

I did find the converter yesterday (as it's mentioned in a few bug reports), but haven't played with that yet :)

Anyway, useful for requirements to specify min versions where possible

sjperkins commented 6 months ago

Anyway, useful for requirements to specify min versions where possible

It's true that dask-ms doesn't specify fsspec as a direct dependency: it's a transitive dependency included via dask, currently specified as fsspec >= 2021.09.0 in master.

https://github.com/dask/dask/blob/bc6f42b867cdf9d415485a844fc7fa53c64f32c2/pyproject.toml#L35

Yep, I'm using --system-site-packages. I do this by default as non x86-64 systems like 64-bit arm often do not build from wheels, or build at all (astropy here's looking at you).

Ah, that makes things a bit more understandable -- I do tend to take the "install it in a virtual environment" line so I don't have much experience with hybrid installs.

Also, as @JSKenyon correctly spotted, it's finding your system install of fsspec and, since it falls within the bounds specified by dask, that version is considered acceptable.
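
To make that concrete, here is a simplified sketch of the bound check. It is not pip's actual resolver logic: real tools follow PEP 440 (for example through the packaging library), whereas the hypothetical satisfies_minimum helper below only handles plain dotted numeric versions like the ones in this thread. The 2022.7.0 bound is the value suggested earlier in the thread, not a verified minimum.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2022.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """Return True if installed >= minimum, comparing component by component."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '2022.7' equals '2022.7.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# The system fsspec (2022.1.0) passes dask's bound, so pip keeps it ...
print(satisfies_minimum("2022.1.0", "2021.09.0"))
# ... but it would fail a stricter bound such as the suggested 2022.7.0.
print(satisfies_minimum("2022.1.0", "2022.7.0"))
```

With a stricter bound declared by dask-ms itself, pip would be expected to upgrade or reject the system copy rather than silently accept it.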

So this report is to suggest that the fsspec requirement should be specified. I think something greater than 2022.7 release would do (based on a rudimentary git blame on fsspec).

Thanks for investigating here. As dask-ms does have its own DaskMSStore class that uses fsspec to reason about paths, I'm leaning towards versioning fsspec in dask-ms.