zarr-developers / numcodecs

A Python package providing buffer compression and transformation codecs for use in data storage and communication applications.
http://numcodecs.readthedocs.io
MIT License

Encoding and Decoding Compound Arrays Does not work with JSON Encoder/Decoder #518

Open mavaylon1 opened 6 months ago

mavaylon1 commented 6 months ago

I currently use the pickle encoder and decoder; however, I've been asked to make the encoding/decoding process work with the JSON encoder/decoder instead.

When dealing with compound structures (i.e., structured arrays), I run into a dimension error. The code below reproduces the issue. Is this kind of structure supported, and if so, is there something I need to change so that the input item is in a specific format?

Minimal, reproducible code sample, a copy-pastable example if possible

import numcodecs
import numpy as np

item = np.array([(1, 'dataset_1', {'source': '.', 'path': '/dataset_1', 'object_id': None, 'source_object_id': None}),
       (2, 'dataset_2', {'source': '.', 'path': '/dataset_2', 'object_id': None, 'source_object_id': None})],
      dtype=[('id', '<i4'), ('name', 'O'), ('reference', 'O')])
cs = numcodecs.JSON()

en = cs.encode(item)
out = cs.decode(en)  # raises the ValueError shown below

This raises:

 File "~./json.py", line 75, in decode
    dec[:] = items[:-2]
    ~~~^^^
ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1.
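The traceback points at dec[:] = items[:-2] in the decoder. The NumPy-only sketch below reproduces that step outside numcodecs. It assumes, based on the slicing in that line, that the encoded list holds the records followed by the dtype string and the shape; it is an illustration of the failure, not the codec's actual source.

```python
import json

import numpy as np

item = np.array(
    [(1, 'dataset_1', {'source': '.', 'path': '/dataset_1', 'object_id': None, 'source_object_id': None}),
     (2, 'dataset_2', {'source': '.', 'path': '/dataset_2', 'object_id': None, 'source_object_id': None})],
    dtype=[('id', '<i4'), ('name', 'O'), ('reference', 'O')],
)

# Assumed layout: records, then dtype.str, then shape (inferred from items[:-2]).
payload = item.tolist() + [item.dtype.str, list(item.shape)]
items = json.loads(json.dumps(payload))  # tuples come back as plain lists

# Rebuilding from dtype.str yields an unstructured void dtype with no fields.
dec = np.empty(items[-1], dtype=items[-2])
print(dec.dtype, dec.dtype.names)  # names is None: the field layout is gone

error = None
try:
    dec[:] = items[:-2]  # each record is a 3-element list, not a scalar
except ValueError as exc:
    error = exc
print(type(error).__name__, error)
```

Because the rebuilt array has no fields, NumPy treats each 3-element record list as an extra dimension, which a 1-D void array cannot hold.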

Version and installation information

mavaylon1 commented 5 months ago

Any ideas?

martindurant commented 5 months ago

The trouble is this line: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/json.py#L62

buf.dtype.str

is '|V20' for the dtype [('id', '<i4'), ('name', 'O'), ('reference', 'O')], an opaque 20-byte void type that drops the field names and types. We should be encoding the full compound type instead.
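A minimal sketch of that idea: a JSON round trip that stores the field list (dtype.descr) instead of dtype.str. This is not the numcodecs API. encode_structured and decode_structured are hypothetical helpers, and the sketch assumes a flat compound dtype without padding fields whose object fields hold JSON-serializable values.

```python
import json

import numpy as np


def encode_structured(arr: np.ndarray) -> str:
    """Hypothetical helper: serialize a structured array to JSON.

    Stores dtype.descr (the per-field names and formats) rather than
    dtype.str, which for a compound dtype is only an opaque void type.
    """
    payload = {
        "data": arr.ravel().tolist(),  # one tuple per record
        "descr": arr.dtype.descr,      # e.g. [('id', '<i4'), ('name', '|O'), ...]
        "shape": list(arr.shape),
    }
    return json.dumps(payload)


def decode_structured(text: str) -> np.ndarray:
    """Hypothetical inverse of encode_structured."""
    payload = json.loads(text)
    # JSON turns tuples into lists; np.dtype expects (name, format) tuples.
    dtype = np.dtype([tuple(field) for field in payload["descr"]])
    # Records must also be tuples so NumPy maps them onto the fields.
    records = [tuple(rec) for rec in payload["data"]]
    return np.array(records, dtype=dtype).reshape(payload["shape"])


# Usage with the array from the original report.
item = np.array(
    [(1, 'dataset_1', {'source': '.', 'path': '/dataset_1', 'object_id': None, 'source_object_id': None}),
     (2, 'dataset_2', {'source': '.', 'path': '/dataset_2', 'object_id': None, 'source_object_id': None})],
    dtype=[('id', '<i4'), ('name', 'O'), ('reference', 'O')],
)
out = decode_structured(encode_structured(item))
print(out.dtype)
print(out["reference"][1]["path"])
```

Storing descr keeps enough information for np.dtype to rebuild the field layout; nested or padded compound dtypes would need more careful handling of the descriptor.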