zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

Return TypedDict instances from Metadata.to_dict() #2099

Open ahmadjiha opened 3 weeks ago

ahmadjiha commented 3 weeks ago

This is a partial implementation for zarr-developers/zarr-python#1773.

So far, I have done the following:

  1. Defined TypedDict classes for the metadata models that have a well-typed dict representation
  2. Set the return type of each model's to_dict() method to the relevant TypedDict
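
As a rough illustration of the two steps above, a model's to_dict() can be annotated with a TypedDict rather than a loosely typed dict. The names ExampleMetadataDict and ExampleMetadata are hypothetical and do not come from the PR; this is a minimal sketch of the pattern, not the actual change.

```python
from dataclasses import dataclass
from typing import TypedDict


class ExampleMetadataDict(TypedDict):
    """Hypothetical typed dict form of ExampleMetadata (not from zarr-python)."""

    name: str
    order: int


@dataclass(frozen=True)
class ExampleMetadata:
    """Hypothetical metadata model used only for illustration."""

    name: str
    order: int

    def to_dict(self) -> ExampleMetadataDict:
        # The return annotation tells type checkers the exact keys and value types.
        return {"name": self.name, "order": self.order}


example_dict = ExampleMetadata(name="example", order=1).to_dict()
```

At runtime the returned value is still a plain dict; the TypedDict only affects what a static type checker sees.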

Closes zarr-developers/zarr-python#1773

TODO:

ahmadjiha commented 1 week ago

I experimented with making the Metadata base class generic and ran into issues with bounding the generic type with dict[str, JSON].

Making the Metadata class generic was easy enough:

# src/zarr/abc/metadata.py

...

T = TypeVar("T", bound=dict[str, JSON])

@dataclass(frozen=True)
class Metadata(Generic[T]):
    def to_dict(self) -> T:
        ...

However, I ran into issues with the TypedDict classes being outside of the bounds of the generic type. For example:

# src/zarr/core/chunk_key_encodings.py

...

class ChunkKeyEncodingDict(TypedDict):
    """A dictionary representing a chunk key encoding configuration."""

    name: str
    configuration: dict[Literal["separator"], SeparatorLiteral]

@dataclass(frozen=True)
class ChunkKeyEncoding(Metadata[ChunkKeyEncodingDict]):  # Type argument "ChunkKeyEncodingDict" of "Metadata" must be a subtype of "dict[str, JSON]"
    name: str
    separator: SeparatorLiteral = "."

It seems like we need a way to bound the TypedDict value types at the dictionary level. I've tried quite a few things but have yet to find a solution.

Is it possible that this is not supported? Would appreciate any guidance @jhamman @d-v-b

d-v-b commented 1 week ago

T = TypeVar("T", bound=dict[str, JSON])

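I think the problem is specifying the type bound as dict[str, JSON]. First, type checkers do not treat a TypedDict as a subtype of dict[str, ...], because dict permits operations such as clear() or setting arbitrary keys that would break the declared structure. Mapping should be used instead.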

Second, for some reason Mapping[str, JSON] won't work, but Mapping[str, object] does work as a type bound. I don't really know the answer to this one, hopefully someone can explain it.
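
One likely explanation, going by the type consistency rules in PEP 589: TypedDicts are structural, so a value typed as a given TypedDict may carry extra keys whose values have other types. A type checker therefore only treats a TypedDict as compatible with Mapping[str, object], never with a narrower Mapping[str, X]. The sketch below uses hypothetical Point and LabeledPoint types (not from zarr-python) to show the hazard at runtime.

```python
from typing import TypedDict


class Point(TypedDict):
    x: int
    y: int


class LabeledPoint(Point):
    # Extra key whose value type is not int.
    label: str


labeled: LabeledPoint = {"x": 1, "y": 2, "label": "origin"}

# Structural subtyping: a LabeledPoint is accepted wherever a Point is expected.
point: Point = labeled

# If Point were accepted as Mapping[str, int], every value would be assumed
# to be an int, yet this object also holds a str.
non_int_values = [v for v in point.values() if not isinstance(v, int)]
```

The same reasoning would apply to Mapping[str, JSON] as a bound: a value typed as some TypedDict could always hold an extra entry that is not JSON.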

Here's an example that I got working:

from abc import abstractmethod
from dataclasses import dataclass
from typing import Generic, Mapping, TypedDict, TypeVar

T = TypeVar('T', bound=Mapping[str, object])

class Meta(Generic[T]):

    @abstractmethod
    def to_dict(self) -> T:
        ...

class ExampleSpec(TypedDict):
    a: str
    b: str

@dataclass
class Example(Meta[ExampleSpec]):
    a: str
    b: str

    def to_dict(self) -> ExampleSpec:
        return {'a': self.a, 'b': self.b}

x = Example(a='10', b='10')
y = x.to_dict()
reveal_type(y)
"""
Revealed type is "TypedDict('tessel.ExampleSpec', {'a': builtins.str, 'b': builtins.str})"
"""
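
Applying the same bound to the ChunkKeyEncoding case from earlier in the thread might look like the sketch below. It is not the PR's code: the SeparatorLiteral alias, the use of ABC on Metadata, and the body of to_dict() are assumptions made for illustration, and the JSON alias is left out because the bound no longer refers to it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, Literal, Mapping, TypedDict, TypeVar

# Assumed definition; the thread does not show the real alias.
SeparatorLiteral = Literal[".", "/"]

# Bound by Mapping[str, object] instead of dict[str, JSON].
T = TypeVar("T", bound=Mapping[str, object])


@dataclass(frozen=True)
class Metadata(ABC, Generic[T]):
    @abstractmethod
    def to_dict(self) -> T:
        ...


class ChunkKeyEncodingDict(TypedDict):
    """A dictionary representing a chunk key encoding configuration."""

    name: str
    configuration: dict[Literal["separator"], SeparatorLiteral]


@dataclass(frozen=True)
class ChunkKeyEncoding(Metadata[ChunkKeyEncodingDict]):
    name: str
    separator: SeparatorLiteral = "."

    def to_dict(self) -> ChunkKeyEncodingDict:
        # Hypothetical body: nest the separator under "configuration".
        return {"name": self.name, "configuration": {"separator": self.separator}}


encoding_dict = ChunkKeyEncoding(name="default", separator="/").to_dict()
```

With this bound, the type argument on Metadata[ChunkKeyEncodingDict] should no longer be rejected, at the cost of the base class no longer promising JSON-compatible values.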