Open normanrz opened 8 months ago
@normanrz - I like this suggestion. Perhaps you could expand on what the serialization/deserialization flow might look like. In the proposal to basically unpack the JSON object by key in a sequential pattern like this:
zarr_json = json.loads(...)
if zarr_json["node_type"] == "array":
...
if zarr_json["chunk_grid"]["name"] == "regular":
chunk_grid = RegularChunkGrid(chunk_shape=zarr_json["chunk_grid"]["configuration"]["chunk_shape"])
array_meta = ArrayMetadata(..., chunk_grid=chunk_grid, ...)
I think it could work via delegation from the parent classes:
class ChunkGrid:
@classmethod
def from_dict(cls, val: Dict[str, Any]) -> ChunkGrid:
assert isinstance(val, dict)
name = val["name"]
if name == "regular":
return RegularChunkGrid.from_dict(val)
else:
raise NotImplementedError
class RegularChunkGrid(ChunkGrid):
@classmethod
def from_dict(cls, val: Dict[str, Any]) -> RegularChunkGrid:
assert isinstance(val, dict)
assert val["name"] == "regular"
assert _validate_int_list(val["configuration"]["chunk_shape"])
return cls(chunk_shape=tuple(val["configuration"]["chunk_shape"]))
...
zarr_json = json.loads(...)
if zarr_json["node_type"] == "array":
...
chunk_grid = ChunkGrid.from_dict(zarr_json["chunk_grid"])
array_meta = ArrayMetadata(..., chunk_grid=chunk_grid, ...)
With cattrs we had nice input validation including reasonably understandable error messages. We need to decide if we want to add that now or later.
In the future this could be extended so that ChunkGrid.from_dict
uses an extensible registry for different chunk grid variants.
@d-v-b you refactored this quite a bit. Would you agree to consider this done?
I think we still have work to do on this, but we can work on it after the alpha release
In the current V3 draft version, all metadata classes are closely modeled after the json representation. For example, the regular chunk grid is represented as this json object:
In Python, these objects are implemented as 2 classes
This is an artifact of using cattrs for (de)serialization. However, we were thinking of dropping the attrs dependency. That would open the way for a more ergonomic class design that serializes to the same json structure.