zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License
1.5k stars 278 forks source link

Error when using attribute keys with mixed types #1037

Closed malmans2 closed 2 years ago

malmans2 commented 2 years ago

Minimal, reproducible code sample, a copy-pastable example if possible

import zarr
z1 = zarr.open('data/example.zarr', mode='w', shape=(10, 10), dtype='i4')
z1.attrs.put({1: "bar"})  # OK
z1.attrs.put({"foo": "bar"})  # OK
z1.attrs.put({1: "bar", "foo": "bar"})  # Error
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 z1.attrs.put({1: "bar", "foo": "bar"})

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:109, in Attributes.put(self, d)
    106 def put(self, d):
    107     """Overwrite all attributes with the key/value pairs in the provided dictionary
    108     `d` in a single operation."""
--> 109     self._write_op(self._put_nosync, d)

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:73, in Attributes._write_op(self, f, *args, **kwargs)
     71 # synchronization
     72 if self.synchronizer is None:
---> 73     return f(*args, **kwargs)
     74 else:
     75     with self.synchronizer[self.key]:

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/attrs.py:112, in Attributes._put_nosync(self, d)
    111 def _put_nosync(self, d):
--> 112     self.store[self.key] = json_dumps(d)
    113     if self.cache:
    114         self._cached_asdict = d

File ~/mambaforge/envs/xarray/lib/python3.9/site-packages/zarr/util.py:38, in json_dumps(o)
     36 def json_dumps(o: Any) -> bytes:
     37     """Write JSON in a consistent, human-readable way."""
---> 38     return json.dumps(o, indent=4, sort_keys=True, ensure_ascii=True,
     39                       separators=(',', ': ')).encode('ascii')

File ~/mambaforge/envs/xarray/lib/python3.9/json/__init__.py:234, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232 if cls is None:
    233     cls = JSONEncoder
--> 234 return cls(
    235     skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236     check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237     separators=separators, default=default, sort_keys=sort_keys,
    238     **kw).encode(obj)

File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:201, in JSONEncoder.encode(self, o)
    199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
--> 201     chunks = list(chunks)
    202 return ''.join(chunks)

File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:431, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    429     yield from _iterencode_list(o, _current_indent_level)
    430 elif isinstance(o, dict):
--> 431     yield from _iterencode_dict(o, _current_indent_level)
    432 else:
    433     if markers is not None:

File ~/mambaforge/envs/xarray/lib/python3.9/json/encoder.py:353, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
    351 first = True
    352 if _sort_keys:
--> 353     items = sorted(dct.items())
    354 else:
    355     items = dct.items()

TypeError: '<' not supported between instances of 'str' and 'int'

Problem description

I'm not able to assign attributes using keys with different types. Looks like the issue is coming from json that is trying to sort the keys. I guess there are 3 options:

Version and installation information

conda env export name: xarray channels: - conda-forge dependencies: - affine=2.3.1=pyhd8ed1ab_0 - aiobotocore=2.3.2=pyhd8ed1ab_0 - aiohttp=3.8.1=py39h63b48b0_1 - aioitertools=0.10.0=pyhd8ed1ab_0 - aiosignal=1.2.0=pyhd8ed1ab_0 - antlr-python-runtime=4.7.2=py39h6e9494a_1003 - appdirs=1.4.4=pyh9f0ad1d_0 - appnope=0.1.3=pyhd8ed1ab_0 - asciitree=0.3.3=py_2 - asttokens=2.0.5=pyhd8ed1ab_0 - async-timeout=4.0.2=pyhd8ed1ab_0 - attrs=21.4.0=pyhd8ed1ab_0 - backcall=0.2.0=pyh9f0ad1d_0 - backports=1.0=py_2 - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0 - blosc=1.21.1=h97e831e_3 - boost-cpp=1.74.0=hdbf7018_7 - boto3=1.21.21=pyhd8ed1ab_0 - botocore=1.24.21=pyhd8ed1ab_1 - bottleneck=1.3.4=py39h86b5767_1 - brotli=1.0.9=h5eb16cf_7 - brotli-bin=1.0.9=h5eb16cf_7 - brotlipy=0.7.0=py39h63b48b0_1004 - bzip2=1.0.8=h0d85af4_4 - c-ares=1.18.1=h0d85af4_0 - ca-certificates=2022.5.18.1=h033912b_0 - cached-property=1.5.2=hd8ed1ab_1 - cached_property=1.5.2=pyha770c72_1 - cairo=1.16.0=h9e0e54b_1010 - cartopy=0.20.2=py39hd74d15e_4 - cdat_info=8.2.1=pyhd8ed1ab_2 - cdms2=3.1.5=py39ha4b5fb4_14 - cdtime=3.1.4=py39hef3cfc1_7 - certifi=2022.5.18.1=py39h6e9494a_0 - cf-units=3.0.1=py39hc89836e_2 - cffi=1.15.0=py39he338e87_0 - cfgrib=0.9.10.1=pyhd8ed1ab_0 - cfgv=3.3.1=pyhd8ed1ab_0 - cfitsio=4.1.0=h2c97ad1_0 - cftime=1.6.0=py39h86b5767_1 - charset-normalizer=2.0.12=pyhd8ed1ab_0 - click=8.1.3=py39h6e9494a_0 - click-plugins=1.1.1=py_0 - cligj=0.7.2=pyhd8ed1ab_1 - cloudpickle=2.1.0=pyhd8ed1ab_0 - coverage=6.4=py39h701faf5_0 - cryptography=37.0.2=py39h9c2a9ce_0 - curl=7.83.1=h372c54d_0 - cycler=0.11.0=pyhd8ed1ab_0 - cytoolz=0.11.2=py39h63b48b0_2 - dask-core=2022.5.1=pyhd8ed1ab_0 - decorator=5.1.1=pyhd8ed1ab_0 - distarray=2.12.2=pyhd8ed1ab_2 - distlib=0.3.4=pyhd8ed1ab_0 - distributed=2022.5.1=pyhd8ed1ab_0 - eccodes=2.26.0=hf772d56_0 - esmf=8.2.0=nompi_h92c19ad_0 - esmpy=8.2.0=nompi_py39h8ffa011_1 - execnet=1.9.0=pyhd8ed1ab_0 - executing=0.8.3=pyhd8ed1ab_0 - expat=2.4.8=h96cf925_0 - fasteners=0.17.3=pyhd8ed1ab_0 - filelock=3.7.0=pyhd8ed1ab_0 - findlibs=0.0.2=pyhd8ed1ab_0 - flox=0.5.3=pyhd8ed1ab_0 - font-ttf-dejavu-sans-mono=2.37=hab24e00_0 - font-ttf-inconsolata=3.000=h77eed37_0 - font-ttf-source-code-pro=2.038=h77eed37_0 - font-ttf-ubuntu=0.83=hab24e00_0 - fontconfig=2.14.0=h676cef8_0 - fonts-conda-ecosystem=1=0 - fonts-conda-forge=1=0 - fonttools=4.33.3=py39h701faf5_0 - freetype=2.10.4=h4cff582_1 - freexl=1.0.6=h0d85af4_0 - frozenlist=1.3.0=py39h63b48b0_1 - fsspec=2022.5.0=pyhd8ed1ab_0 - future=0.18.2=py39h6e9494a_5 - g2clib=1.6.3=h7284f3b_1 - geos=3.10.2=he49afe7_0 - geotiff=1.7.1=had63758_1 - gettext=0.19.8.1=hd1a6beb_1008 - giflib=5.2.1=hbcb3906_2 - h5netcdf=1.0.0=pyhd8ed1ab_0 - h5py=3.6.0=nompi_py39hbc6cb89_100 - hdf4=4.2.15=hefd3b78_3 - hdf5=1.12.1=nompi_ha60fbc9_104 - heapdict=1.0.1=py_0 - hypothesis=6.46.8=pyhd8ed1ab_0 - icu=69.1=he49afe7_0 - identify=2.5.1=pyhd8ed1ab_0 - idna=3.3=pyhd8ed1ab_0 - importlib-metadata=4.11.4=py39h6e9494a_0 - importlib_metadata=4.11.4=hd8ed1ab_0 - importlib_resources=5.7.1=pyhd8ed1ab_1 - iniconfig=1.1.1=pyh9f0ad1d_0 - ipython=8.3.0=py39h6e9494a_0 - iris=3.2.1=pyhd8ed1ab_0 - jasper=2.0.33=h013e400_0 - jedi=0.18.1=py39h6e9494a_1 - jinja2=3.1.2=pyhd8ed1ab_0 - jmespath=1.0.0=pyhd8ed1ab_0 - jpeg=9e=h5eb16cf_1 - json-c=0.16=h01d06f9_0 - jsonschema=4.5.1=pyhd8ed1ab_0 - jupyter_core=4.10.0=py39h6e9494a_0 - kealib=1.4.14=ha22a8b1_3 - kiwisolver=1.4.2=py39h7248d28_1 - krb5=1.19.3=hb49756b_0 - lazy-object-proxy=1.7.1=py39h63b48b0_1 - lcms2=2.12=h577c468_0 - lerc=3.0=he49afe7_0 - libaec=1.0.6=he49afe7_0 - libblas=3.9.0=14_osx64_openblas - libbrotlicommon=1.0.9=h5eb16cf_7 - libbrotlidec=1.0.9=h5eb16cf_7 - libbrotlienc=1.0.9=h5eb16cf_7 - libcblas=3.9.0=14_osx64_openblas - libcdms=3.1.2=h7d8e03a_117 - libcf=1.0.3=py39h8d2e2c5_114 - libcurl=7.83.1=h372c54d_0 - libcxx=14.0.3=hc203e6f_0 - libdap4=3.20.6=h3e144a0_2 - libdeflate=1.10=h0d85af4_0 - libdrs=3.1.2=h102a9a4_118 - libdrs_f=3.1.2=h2534672_114 - libedit=3.1.20191231=h0678c8f_2 - libev=4.33=haf1e3a3_1 - libffi=3.4.2=h0d85af4_5 - libgdal=3.4.3=h6474519_0 - libgfortran=5.0.0=9_3_0_h6c81a4c_23 - libgfortran5=9.3.0=h6c81a4c_23 - libglib=2.70.2=hf1fb8c0_4 - libiconv=1.16=haf1e3a3_0 - libkml=1.3.0=h8fd9edb_1014 - liblapack=3.9.0=14_osx64_openblas - libllvm10=10.0.1=h009f743_3 - libnetcdf=4.8.1=nompi_h6609ca0_102 - libnghttp2=1.47.0=h942079c_0 - libopenblas=0.3.20=openmp_hb3cd9ec_0 - libpng=1.6.37=h7cec526_2 - libpq=14.2=hea3049e_0 - librttopo=1.1.0=hec60dd8_9 - libspatialite=5.0.1=hadde3e2_15 - libssh2=1.10.0=h52ee1ee_2 - libtiff=4.3.0=hfca7e8f_4 - libuuid=2.32.1=h35c211d_1000 - libwebp=1.2.2=h28dabe5_0 - libwebp-base=1.2.2=h0d85af4_1 - libxcb=1.13=h0d85af4_1004 - libxml2=2.9.12=h7e28ab6_1 - libxslt=1.1.33=h1acebb3_3 - libzip=1.8.0=h8b0c345_1 - libzlib=1.2.11=h6c3fc93_1014 - llvm-openmp=14.0.3=ha654fa7_0 - llvmlite=0.36.0=py39h798a4f4_0 - locket=1.0.0=pyhd8ed1ab_0 - lxml=4.8.0=py39h63b48b0_2 - lz4-c=1.9.3=he49afe7_1 - markupsafe=2.1.1=py39h63b48b0_1 - matplotlib-base=3.5.2=py39h64a0072_0 - matplotlib-inline=0.1.3=pyhd8ed1ab_0 - msgpack-python=1.0.3=py39h7248d28_1 - multidict=6.0.2=py39h63b48b0_1 - munkres=1.1.4=pyh9f0ad1d_0 - nbformat=5.4.0=pyhd8ed1ab_0 - nc-time-axis=1.4.1=pyhd8ed1ab_0 - ncurses=6.3=h96cf925_1 - netcdf-fortran=4.5.4=nompi_h9ed14b0_100 - netcdf4=1.5.8=nompi_py39he7d1c46_101 - nodeenv=1.6.0=pyhd8ed1ab_0 - nspr=4.32=hcd9eead_1 - nss=3.78=ha8197d3_0 - numba=0.53.1=py39h32e38f5_1 - numcodecs=0.9.1=py39h9fcab8e_2 - numexpr=2.8.0=py39hbd61c47_2 - numpy=1.22.4=py39h677350a_0 - numpy_groupies=0.9.16=pyhd8ed1ab_0 - openblas=0.3.20=openmp_h5ad848b_0 - openjpeg=2.4.0=h6e7aa92_1 - openssl=1.1.1o=hfe4f2af_0 - packaging=21.3=pyhd8ed1ab_0 - pandas=1.4.2=py39hbd61c47_1 - parso=0.8.3=pyhd8ed1ab_0 - partd=1.2.0=pyhd8ed1ab_0 - patsy=0.5.2=pyhd8ed1ab_0 - pcre=8.45=he49afe7_0 - pexpect=4.8.0=pyh9f0ad1d_2 - pickleshare=0.7.5=py_1003 - pillow=9.1.1=py39h579eac4_0 - pint=0.19.2=pyhd8ed1ab_0 - pip=22.1.1=pyhd8ed1ab_0 - pixman=0.40.0=hbcb3906_0 - platformdirs=2.5.1=pyhd8ed1ab_0 - pluggy=1.0.0=py39h6e9494a_3 - pooch=1.6.0=pyhd8ed1ab_0 - poppler=22.04.0=h39497b0_0 - poppler-data=0.4.11=hd8ed1ab_0 - postgresql=14.2=he8fe76e_0 - pre-commit=2.19.0=py39h6e9494a_0 - proj=9.0.0=h2364a93_1 - prompt-toolkit=3.0.29=pyha770c72_0 - pseudonetcdf=3.2.2=pyhd8ed1ab_0 - psutil=5.9.1=py39h701faf5_0 - pthread-stubs=0.4=hc929b4f_1001 - ptyprocess=0.7.0=pyhd3deb0d_0 - pure_eval=0.2.2=pyhd8ed1ab_0 - py=1.11.0=pyh6c4a22f_0 - pycparser=2.21=pyhd8ed1ab_0 - pygments=2.12.0=pyhd8ed1ab_0 - pyopenssl=22.0.0=pyhd8ed1ab_0 - pyparsing=3.0.9=pyhd8ed1ab_0 - pyproj=3.3.1=py39hfe197cf_0 - pyrsistent=0.18.1=py39h63b48b0_1 - pyshp=2.3.0=pyhd8ed1ab_0 - pysocks=1.7.1=py39h6e9494a_5 - pytest=7.1.2=py39h6e9494a_0 - pytest-cov=3.0.0=pyhd8ed1ab_0 - pytest-env=0.6.2=py_0 - pytest-forked=1.4.0=pyhd8ed1ab_0 - pytest-xdist=2.5.0=pyhd8ed1ab_0 - python=3.9.12=h8b4d769_1_cpython - python-dateutil=2.8.2=pyhd8ed1ab_0 - python-eccodes=1.4.0=py39h86b5767_1 - python-fastjsonschema=2.15.3=pyhd8ed1ab_0 - python-xxhash=3.0.0=py39h63b48b0_1 - python_abi=3.9=2_cp39 - pytz=2022.1=pyhd8ed1ab_0 - pyyaml=6.0=py39h63b48b0_4 - rasterio=1.2.10=py39hc9cfaa8_5 - readline=8.1=h05e3726_0 - requests=2.27.1=pyhd8ed1ab_0 - s3transfer=0.5.2=pyhd8ed1ab_0 - scipy=1.8.1=py39hfa1a3ab_0 - seaborn=0.11.2=hd8ed1ab_0 - seaborn-base=0.11.2=pyhd8ed1ab_0 - setuptools=62.3.2=py39h6e9494a_0 - shapely=1.8.2=py39hd471b09_1 - six=1.16.0=pyh6c4a22f_0 - snappy=1.1.9=h6e38e02_1 - snuggs=1.4.7=py_0 - sortedcontainers=2.4.0=pyhd8ed1ab_0 - sparse=0.13.0=pyhd8ed1ab_0 - sqlite=3.38.5=hd9f0692_0 - stack_data=0.2.0=pyhd8ed1ab_0 - statsmodels=0.13.2=py39hc89836e_0 - tblib=1.7.0=pyhd8ed1ab_0 - tiledb=2.8.3=hf17dc58_1 - tk=8.6.12=h5dbffcc_0 - toml=0.10.2=pyhd8ed1ab_0 - tomli=2.0.1=pyhd8ed1ab_0 - toolz=0.11.2=pyhd8ed1ab_0 - tornado=6.1=py39h63b48b0_3 - traitlets=5.2.1.post0=pyhd8ed1ab_0 - typing-extensions=4.2.0=hd8ed1ab_1 - typing_extensions=4.2.0=pyha770c72_1 - tzcode=2022a=h5eb16cf_0 - tzdata=2022a=h191b570_0 - udunits2=2.2.28=h06ef574_0 - ukkonen=1.0.1=py39h7248d28_2 - unicodedata2=14.0.0=py39h63b48b0_1 - urllib3=1.26.9=pyhd8ed1ab_0 - virtualenv=20.14.1=py39h6e9494a_0 - wcwidth=0.2.5=pyh9f0ad1d_2 - wheel=0.37.1=pyhd8ed1ab_0 - wrapt=1.14.1=py39h701faf5_0 - xerces-c=3.2.3=h6564042_4 - xorg-libxau=1.0.9=h35c211d_0 - xorg-libxdmcp=1.1.3=h35c211d_0 - xxhash=0.8.0=h35c211d_3 - xz=5.2.5=haf1e3a3_1 - yaml=0.2.5=h0d85af4_2 - yarl=1.7.2=py39h63b48b0_2 - zarr=2.11.3=pyhd8ed1ab_0 - zict=2.2.0=pyhd8ed1ab_0 - zipp=3.8.0=pyhd8ed1ab_0 - zlib=1.2.11=h6c3fc93_1014 - zstd=1.5.2=ha9df2e0_1 - pip: - numbagg==0.2.1 prefix: /Users/mattia/mambaforge/envs/xarray
joshmoore commented 2 years ago

see:

I think I'd tend towards "Only strings are supported" but another sub-option would be to print a warning and automatically stringify.

clbarnes commented 2 years ago

Agree on only supporting strings, much easier for everyone.

joshmoore commented 2 years ago

@clbarnes (or anyone else): and thoughts on completely disallowing versus stringifying arguments?

clbarnes commented 2 years ago

Strong preference for documenting/ annotating that keys must be strings (and that values must be json-serialisable), and not stringifying non-string keys (which would be unpredictable for different types).

JavaScript gets pretty confusing with its string coercion, no need to open ourselves up to that. Python's got a whole lot better to develop in now there's more focus on sane types. Explicitly checking for key string-ness won't check for the JSON-validity the value (which might be a dict with non-string keys), so it's probably best to just leave it all up to the serialiser.

joshmoore commented 2 years ago

@clbarnes, objection noted. I guess I'm most worried about the impact of the breaking change since at the moment a single update (z.attrs[1] = "foo") gets you:

/tmp $cat test.zarr/.zattrs
{
    "1": "foo"
}
clbarnes commented 2 years ago

Maybe a couple of versions of noisy deprecation warnings, then? Even if it never actually gets patched out until v3.

joshmoore commented 2 years ago

Thanks all. Rolling @malmans2's #1066 into 2.13 but leaving this open for the deprecation cycle.

joshmoore commented 2 years ago

Strike that see #1125