zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

2.18.1: pytest fails with `numcodecs` 0.12.1 #1891

Open kloczek opened 6 months ago

kloczek commented 6 months ago

Zarr version

2.18.1

Numcodecs version

0.12.1

Python Version

3.10.14

Operating System

Linux

Installation

From the tarball autogenerated from the git tag.

Description

Looks like the zarr test suite needs to be updated for the latest numcodecs, 0.12.1.

Steps to reproduce

I'm packaging your module as an rpm package, so I'm using the typical PEP517-based build, install and test cycle used when building packages from a non-root account.

Here is pytest output:

```console
+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-zarr-2.18.1-2.fc37.x86_64/usr/lib64/python3.10/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-zarr-2.18.1-2.fc37.x86_64/usr/lib/python3.10/site-packages
+ /usr/bin/pytest -ra -m 'not network'
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/tkloczko/rpmbuild/BUILD/zarr-python-2.18.1
configfile: pyproject.toml
collected 2621 items / 2 errors

==================================== ERRORS ====================================
__________________ ERROR collecting zarr/tests/test_core.py ___________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/zarr-python-2.18.1/zarr/tests/test_core.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
zarr/tests/test_core.py:31: in <module>
    from numcodecs.tests.common import greetings
E   ModuleNotFoundError: No module named 'numcodecs.tests'
__________________ ERROR collecting zarr/tests/test_sync.py ___________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/zarr-python-2.18.1/zarr/tests/test_sync.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
zarr/tests/test_sync.py:20: in <module>
    from zarr.tests.test_core import TestArray
zarr/tests/test_core.py:31: in <module>
    from numcodecs.tests.common import greetings
E   ModuleNotFoundError: No module named 'numcodecs.tests'
=========================== short test summary info ============================
ERROR zarr/tests/test_core.py
ERROR zarr/tests/test_sync.py
!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!
============================== 2 errors in 3.79s ===============================
```
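Both collection errors come from the same line: `zarr/tests/test_core.py` imports `greetings` from `numcodecs.tests.common`, and the installed numcodecs does not provide a `numcodecs.tests` package. A minimal sketch of how one might check this in the build environment before running the suite; the helper name `module_available` is hypothetical and is not part of zarr or numcodecs:

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located on the current sys.path.

    find_spec() imports parent packages while resolving a dotted name, so a
    missing parent raises ModuleNotFoundError; that is reported as False.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Module names taken from the traceback above.
    for name in ("numcodecs", "numcodecs.tests", "numcodecs.tests.common"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

If `numcodecs` is reported as found but `numcodecs.tests` as missing, the installed numcodecs was most likely packaged without its test modules, which would match the `ModuleNotFoundError` above.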

Additional output

List of installed modules in build env:

```console
Package                       Version
----------------------------- -----------
alabaster                     0.7.16
asciitree                     0.3.3
Babel                         2.15.0
build                         1.2.1
charset-normalizer            3.3.2
defusedxml                    0.7.1
docutils                      0.20.1
exceptiongroup                1.1.3
idna                          3.7
imagesize                     1.4.1
importlib_metadata            7.1.0
iniconfig                     2.0.0
installer                     0.7.0
Jinja2                        3.1.4
MarkupSafe                    2.1.5
msgpack                       1.0.8
numcodecs                     0.12.1
numpy                         1.26.5
numpydoc                      1.7.0
packaging                     24.0
pluggy                        1.4.0
Pygments                      2.18.0
pyproject_hooks               1.0.0
pytest                        8.1.1
python-dateutil               2.9.0.post0
requests                      2.31.0
setuptools                    69.4.0
setuptools-scm                8.1.0
snowballstemmer               2.2.0
Sphinx                        7.3.7
sphinx-automodapi             0.17.0
sphinx-copybutton             0.5.2
sphinx_design                 0.5.0
sphinx-issues                 3.0.1
sphinxcontrib-applehelp       1.0.8
sphinxcontrib-devhelp         1.0.6
sphinxcontrib-htmlhelp        2.0.5
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.7
sphinxcontrib-serializinghtml 1.1.10
tabulate                      0.9.0
tokenize_rt                   5.2.0
tomli                         2.0.1
urllib3                       2.2.1
wheel                         0.43.0
zipp                          3.18.2
```

Please let me know if you need more details or want me to perform some diagnostics.

kloczek commented 6 months ago

BTW it could be another issue as well https://github.com/zarr-developers/zarr-python/issues/1891

toloudis commented 4 months ago

I've seen this issue too, and I'm having to pin zarr < 2.18. In my case the problem is a dask array not passing an is_ndarray_like test deep inside set_items... https://github.com/bioio-devs/bioio/actions/runs/9812320870

joshmoore commented 4 months ago

@madsbk, does this ring a bell?

FAILED bioio/tests/writers/test_ome_zarr_writer_2.py::test_write_ome_zarr[e.zarr-shape0-3-scaling0-da_random_from_shape] - TypeError: memoryview: a bytes-like object is required, not 'Array'

        if not is_ndarray_like(buf):
            if isinstance(buf, array.array) and buf.typecode in "cu":
                # Guard condition, do not support array.array with unicode type, this is
                # problematic because numpy does not support it on all platforms. Also do not
                # support char as it was removed in Python 3.
                raise TypeError("array.array with char or unicode type is not supported")
            else:
                # N.B., first take a memoryview to make sure that we subsequently create a
                # numpy array from a memory buffer with no copy
>               mem = memoryview(buf)
E               TypeError: memoryview: a bytes-like object is required, not 'Array'

from https://github.com/zarr-developers/numcodecs/commit/bedb8b0981388b1a2292c0d78e044bd36c2aacd8

madsbk commented 4 months ago

Sorry no, I am not able to reproduce any of the issues locally :/

If I could reproduce, I would check the type of buf and see why is_ndarray_like(buf) is False. If the type name is 'Array', I would guess is_ndarray_like(buf) should be True
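A rough sketch of the kind of check described above, for anyone who can reproduce the failure. It records the fully qualified type of `buf` and whether `memoryview(buf)` would succeed. The helper `describe_buffer` and the stand-in class `Array` are hypothetical, and the attribute test is only an approximation for illustration; it is not numcodecs' `is_ndarray_like`:

```python
def describe_buffer(buf) -> dict:
    """Collect the facts one would want to see when memoryview(buf) fails."""
    cls = type(buf)
    info = {
        # The module path tells apart e.g. a dask Array from a zarr Array.
        "type": f"{cls.__module__}.{cls.__qualname__}",
        # Assumption: a crude stand-in for an "array-like" test, checking
        # only for a few common array attributes.
        "has_array_attrs": all(hasattr(buf, a) for a in ("dtype", "shape", "ndim")),
    }
    try:
        memoryview(buf)
        info["supports_buffer_protocol"] = True
    except TypeError:
        info["supports_buffer_protocol"] = False
    return info


class Array:
    """Hypothetical stand-in: has array attributes but no buffer protocol."""

    dtype = "float64"
    shape = (2, 2)
    ndim = 2


if __name__ == "__main__":
    print(describe_buffer(b"abc"))
    print(describe_buffer(Array()))
```

Calling it on the real `buf` just before the failing `memoryview(buf)` line would show which library's `Array` class is reaching that branch.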

dstansby commented 1 month ago

Is this still an issue? I think we're running tests with the latest version of numcodecs on CI, which don't seem to be failing.

kloczek commented 1 month ago

The reported issue is still present.

There is also another issue: the contents of zarr/tests/ are added to the .whl archive.

+ /usr/bin/python3 -sBm build -w --no-isolation
* Getting build dependencies for wheel...
running egg_info
creating zarr.egg-info
writing zarr.egg-info/PKG-INFO
[..]
creating build/lib/zarr/tests
copying zarr/tests/__init__.py -> build/lib/zarr/tests
copying zarr/tests/conftest.py -> build/lib/zarr/tests
copying zarr/tests/test_attrs.py -> build/lib/zarr/tests
copying zarr/tests/test_convenience.py -> build/lib/zarr/tests
copying zarr/tests/test_core.py -> build/lib/zarr/tests
copying zarr/tests/test_creation.py -> build/lib/zarr/tests
copying zarr/tests/test_dim_separator.py -> build/lib/zarr/tests
copying zarr/tests/test_filters.py -> build/lib/zarr/tests
copying zarr/tests/test_hierarchy.py -> build/lib/zarr/tests
copying zarr/tests/test_indexing.py -> build/lib/zarr/tests
copying zarr/tests/test_info.py -> build/lib/zarr/tests
copying zarr/tests/test_meta.py -> build/lib/zarr/tests
copying zarr/tests/test_meta_array.py -> build/lib/zarr/tests
copying zarr/tests/test_n5.py -> build/lib/zarr/tests
copying zarr/tests/test_storage.py -> build/lib/zarr/tests
copying zarr/tests/test_storage_v3.py -> build/lib/zarr/tests
copying zarr/tests/test_sync.py -> build/lib/zarr/tests
copying zarr/tests/test_util.py -> build/lib/zarr/tests
copying zarr/tests/util.py -> build/lib/zarr/tests
[..]
adding 'zarr/tests/__init__.py'
adding 'zarr/tests/conftest.py'
adding 'zarr/tests/test_attrs.py'
adding 'zarr/tests/test_convenience.py'
adding 'zarr/tests/test_core.py'
adding 'zarr/tests/test_creation.py'
adding 'zarr/tests/test_dim_separator.py'
adding 'zarr/tests/test_filters.py'
adding 'zarr/tests/test_hierarchy.py'
adding 'zarr/tests/test_indexing.py'
adding 'zarr/tests/test_info.py'
adding 'zarr/tests/test_meta.py'
adding 'zarr/tests/test_meta_array.py'
adding 'zarr/tests/test_n5.py'
adding 'zarr/tests/test_storage.py'
adding 'zarr/tests/test_storage_v3.py'
adding 'zarr/tests/test_sync.py'
adding 'zarr/tests/test_util.py'
adding 'zarr/tests/util.py'

Probably the easiest way to solve that would be to move zarr/tests/ to tests/.
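To confirm whether a given wheel ships the test modules, one could list the archive members under `zarr/tests/`. This is a minimal sketch using only the standard library; the helper name `packaged_test_files` is hypothetical, and the wheel path is whatever `build` produced locally:

```python
import sys
import zipfile


def packaged_test_files(wheel, prefix="zarr/tests/"):
    """Return the sorted member names under `prefix` in a wheel.

    A wheel is a zip archive, so `wheel` can be a path or a binary
    file-like object accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(wheel) as archive:
        return sorted(n for n in archive.namelist() if n.startswith(prefix))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_wheel.py <path-to-wheel>
    for name in packaged_test_files(sys.argv[1]):
        print(name)
```

An empty result would mean the tests are no longer packaged, for example after moving them to a top-level tests/ directory as suggested above.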

arkanoid87 commented 4 weeks ago

hitting this right now when using xarray map_blocks with dask (local cluster, 1 worker, 1 thread)

zarr==2.18.3
numcodecs==0.13.1
dask==2024.10.0
dask-expr==1.1.16
rioxarray==0.17.0
xarray==2024.9.0
...
{
    "name": "TypeError",
    "message": "memoryview: a bytes-like object is required, not 'Array'",
    "stack": "---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[11], line 13
     10 print(f\"Processing row slice: {row_slice}\")
     12 # preproc_block_ds.isel(y=row_slice).to_zarr(ZARR_PATH / f\"T32TMK_3D_rows.{start_row}.{end_row}.zarr\", mode=\"w\", consolidated=True)
---> 13 preproc_block_ds.isel(x=slice(512*14, 512*15), y=slice(512*5, 512*6)).to_zarr(ZARR_PATH / f\"T32TMK_3D_rows.{start_row}.{end_row}.zarr\", mode=\"w\")

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/xarray/core/dataset.py:2562, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   2415 \"\"\"Write dataset contents to a zarr group.
   2416 
   2417 Zarr chunks are determined in the following way:
   (...)
   2558     The I/O user guide, with more details and examples.
   2559 \"\"\"
   2560 from xarray.backends.api import to_zarr
-> 2562 return to_zarr(  # type: ignore[call-overload,misc]
   2563     self,
   2564     store=store,
   2565     chunk_store=chunk_store,
   2566     storage_options=storage_options,
   2567     mode=mode,
   2568     synchronizer=synchronizer,
   2569     group=group,
   2570     encoding=encoding,
   2571     compute=compute,
   2572     consolidated=consolidated,
   2573     append_dim=append_dim,
   2574     region=region,
   2575     safe_chunks=safe_chunks,
   2576     zarr_version=zarr_version,
   2577     write_empty_chunks=write_empty_chunks,
   2578     chunkmanager_store_kwargs=chunkmanager_store_kwargs,
   2579 )

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/xarray/backends/api.py:1785, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   1783 # TODO: figure out how to properly handle unlimited_dims
   1784 dump_to_store(dataset, zstore, writer, encoding=encoding)
-> 1785 writes = writer.sync(
   1786     compute=compute, chunkmanager_store_kwargs=chunkmanager_store_kwargs
   1787 )
   1789 if compute:
   1790     _finalize_store(writes, zstore)

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/xarray/backends/common.py:268, in ArrayWriter.sync(self, compute, chunkmanager_store_kwargs)
    265 if chunkmanager_store_kwargs is None:
    266     chunkmanager_store_kwargs = {}
--> 268 delayed_store = chunkmanager.store(
    269     self.sources,
    270     self.targets,
    271     lock=self.lock,
    272     compute=compute,
    273     flush=True,
    274     regions=self.regions,
    275     **chunkmanager_store_kwargs,
    276 )
    277 self.sources = []
    278 self.targets = []

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py:249, in DaskManager.store(self, sources, targets, **kwargs)
    241 def store(
    242     self,
    243     sources: Any | Sequence[Any],
    244     targets: Any,
    245     **kwargs: Any,
    246 ) -> Any:
    247     from dask.array import store
--> 249     return store(
    250         sources=sources,
    251         targets=targets,
    252         **kwargs,
    253     )

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/dask/array/core.py:1233, in store(***failed resolving arguments***)
   1231 elif compute:
   1232     store_dsk = HighLevelGraph(layers, dependencies)
-> 1233     compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
   1234     return None
   1236 else:

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/dask/base.py:397, in compute_as_if_collection(cls, dsk, keys, scheduler, get, **kwargs)
    395 schedule = get_scheduler(scheduler=scheduler, cls=cls, get=get)
    396 dsk2 = optimization_function(cls)(dsk, keys, **kwargs)
--> 397 return schedule(dsk2, keys, **kwargs)

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/distributed/client.py:3483, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   3481         should_rejoin = False
   3482 try:
-> 3483     results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   3484 finally:
   3485     for f in futures.values():

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/distributed/client.py:2556, in Client.gather(self, futures, errors, direct, asynchronous)
   2553     local_worker = None
   2555 with shorten_traceback():
-> 2556     return self.sync(
   2557         self._gather,
   2558         futures,
   2559         errors=errors,
   2560         direct=direct,
   2561         local_worker=local_worker,
   2562         asynchronous=asynchronous,
   2563     )

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/zarr/core.py:1447, in __setitem__()
   1445     self.vindex[selection] = value
   1446 elif is_pure_orthogonal_indexing(pure_selection, self.ndim):
-> 1447     self.set_orthogonal_selection(pure_selection, value, fields=fields)
   1448 else:
   1449     self.set_basic_selection(pure_selection, value, fields=fields)

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/zarr/core.py:1636, in set_orthogonal_selection()
   1633 # setup indexer
   1634 indexer = OrthogonalIndexer(selection, self)
-> 1636 self._set_selection(indexer, value, fields=fields)

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/zarr/core.py:1988, in _set_selection()
   1985                 chunk_value = chunk_value[item]
   1987         # put data
-> 1988         self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
   1989 else:
   1990     lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/zarr/core.py:2261, in _chunk_setitem()
   2258     lock = self._synchronizer[ckey]
   2260 with lock:
-> 2261     self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, fields=fields)

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/zarr/core.py:2271, in _chunk_setitem_nosync()
   2269     self._chunk_delitem(ckey)
   2270 else:
-> 2271     self.chunk_store[ckey] = self._encode_chunk(cdata)

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/zarr/core.py:2390, in _encode_chunk()
   2387         chunk = f.encode(chunk)
   2389 # check object encoding
-> 2390 if ensure_ndarray_like(chunk).dtype == object:
   2391     raise RuntimeError(\"cannot write object array without object codec\")
   2393 # compress

File ~/fireurisk_safe/venv/lib/python3.11/site-packages/numcodecs/compat.py:40, in ensure_ndarray_like()
     36     raise TypeError(\"array.array with char or unicode type is not supported\")
     37 else:
     38     # N.B., first take a memoryview to make sure that we subsequently create a
     39     # numpy array from a memory buffer with no copy
---> 40     mem = memoryview(buf)
     41     # instantiate array from memoryview, ensures no copy
     42     buf = np.array(mem, copy=False)

TypeError: memoryview: a bytes-like object is required, not 'Array'"