pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

netCDF4: support for structured arrays as attribute values; serialize as "compound types" #2868

Open naught101 opened 5 years ago

naught101 commented 5 years ago

Code Sample, a copy-pastable example if possible

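import xarray as xr

ds = xr.Dataset()
ds.attrs = dict(a=dict(b=2))  # nested dict as a global attribute
ds.to_netcdf("out.nc")  # raises the TypeError shown below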

...

~/miniconda3/envs/ana/lib/python3.6/site-packages/xarray/backends/api.py in check_attr(name, value)
    158                             'a string, an ndarray or a list/tuple of '
    159                             'numbers/strings for serialization to netCDF '
--> 160                             'files'.format(value))
    161 
    162     # Check attrs on the dataset itself

TypeError: Invalid value for attr: {'b': 2} must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files

Problem description

I'm not entirely sure if this should be possible, but it seems like it should be from this email: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg10502.html

Nested attributes would be nice as a way to namespace metadata.
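
Until nested attributes are supported, one possible workaround is to flatten the nested dict into a single level, keeping the namespace in the attribute names. The following is only a sketch: `flatten_attrs` is a hypothetical helper, not part of xarray or netCDF4, and the `.` separator is an arbitrary choice.

```python
def flatten_attrs(attrs, sep="."):
    """Flatten nested dicts into a single-level dict with joined keys.

    Hypothetical helper for illustration; not part of xarray.
    """
    flat = {}
    for key, value in attrs.items():
        if isinstance(value, dict):
            # Recurse, then prefix each inner key with the outer key.
            for sub_key, sub_value in flatten_attrs(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


attrs = flatten_attrs({"a": {"b": 2}, "title": "demo"})
print(attrs)
```

The flattened dict contains only plain numbers and strings as values, which is what the error message above asks for. The nesting would have to be rebuilt by splitting the keys on the separator after reading the file back.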

Expected Output

A netCDF file with nested global attributes.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-16-lowlatency
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.0
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.4.3.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 4.3.1
IPython: 7.3.0
sphinx: None

shoyer commented 5 years ago

netCDF4-Python doesn't seem to support this:

In [1]: import netCDF4

In [2]: f = netCDF4.Dataset('test.nc', 'w')

In [3]: f.foo = {'bar': 2}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-45a6eb241d31> in <module>()
----> 1 f.foo = {'bar': 2}

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__setattr__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.setncattr()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._set_att()

TypeError: illegal data type for attribute b'foo', must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got O

arkanoid87 commented 2 years ago

It is supported in netCDF4: compound types should be used. See: https://github.com/Unidata/netcdf4-python/issues/905

dcherian commented 2 years ago

From this comment it seems like the value should be a numpy structured array instead of a dict. Can you try that instead?
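
To make the suggestion concrete, here is a minimal sketch of how the nested attribute `{'b': 2}` from the original report could be expressed as a NumPy structured array. The `int32` field type is an assumption, and whether netCDF4-python or xarray then accepts this value as an attribute is not confirmed here.

```python
import numpy as np

# The nested attribute value from the original report.
nested = {"b": 2}

# One field per dict key; the int32 field type is an assumption.
dtype = np.dtype([(key, "i4") for key in nested])

# A one-element structured array holding the dict's values as a record.
value = np.array([tuple(nested.values())], dtype=dtype)

print(value.dtype.names)  # field names mirror the dict keys
print(value["b"][0])      # field access replaces the dict lookup
```

A structured array has a fixed set of typed fields, which is what a netCDF compound type describes, whereas a Python dict has no fixed dtype (it showed up as `O`, an object, in the netCDF4 error above).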