Open JukkaL opened 7 years ago
This is an interesting idea. Here is some early bikeshedding:
TaggedUnion
?Enum
-like) nested class syntax:class Shape(TaggedUnion):
class Circle:
radius: float
class Rectangle:
width: float
height: float
EDIT: Fixed typo pointed by @kirbyfan64
What about using a (more
Enum
-like) nested class syntax:
But isn't Shape
undefined for the duration of its body evaluation?
On topic: +10000 to this. ADTs are awesome. :D
But isn't Shape undefined for the duration of its body evaluation?
Ah, sorry, that is just a typo, it is not needed.
What about calling this a TaggedUnion?
Hmm.. it would be less clear that the subclasses are named tuples (we want the default __init__
, unpacking, etc.). Also, there wouldn't be any extra tags required for these unions, as all objects have a type 'tag' by default. But I agree that the name NamedTupleUnion
isn't exactly pretty.
What about using a (more Enum-like) nested class syntax:
This would require using qualified names such as Shape.Circle
in isinstance
tests, which would be somewhat clumsy.
This would require using qualified names such as Shape.Circle in isinstance tests, which would be somewhat clumsy.
This is true. On the other hand the definition is a bit more compact, and it is immediately visible later in code that Circle
is a member of Shape
union.
But I agree that the name NamedTupleUnion isn't exactly pretty
NamedUnion
is a bit shorter and reminds NamedTuple
. Also there will be a parallel: Union
and Tuple
vs. NamedUnion
and NamedTuple
.
I would have liked the enum-like syntax (I have suggested it some time ago) but it is not exactly a tagged union, since Shape.Circle
is not Shape
, and sadly the syntax does not allow subclassing.
Here's a wild idea:
with NamedUnion as Shape:
class Circle(Shape):
radius : float
class Rectangle(Shape):
width: float
height: float
This way we have a bounded scope for the definition of subelements of Shape, and we can also use its name in the subclasses.
Please excuse the noob question, but how would this be different than the classes extending the Shape
protocol or abstract class?
The point is the declaration "A Shape
must be one of these types and nothing else" in a way that can be used by mypy (using the declaration syntax) and checked at runtime (using isinstance()
to distinguish between items).
Just another idea for the possible syntax. If we allow only named tuples a s members, then one can write:
class Shape(NamedUnion):
Circle: {radius: int}
Rectangle: {width: int, height: int}
At runtime, this will be transformed so that Shape.Circle
will be a NamedTuple
. As well __bases__
could be patched so that issubclass(Shape.Circle, Shape)
will return True
. Also some other facilities could be added, making NamedUnion
a more advanced version of Enum
.
@elazarg I really like your syntax proposition. It can also be used to mimic Haskell's type constraints:
with NamedUnion['Shape', constraints] as Shape:
class Circle(Shape):
radius : float
class Rectangle(Shape):
width: float
height: float
NamedUnion[name, constraints]
resembles TypeVar
call signature. We can actually subclass from constraints to further mimic Haskell's deriving
:
class NamedUnion(...):
...
def __class_getitem__(cls, params):
name, constraints = params
class T(*constraints):
pass
T.__name__ = name
T.__qualname__ = name
return T
In my sum type library I went with syntax like this:
class Shape(NamedUnion):
def Circle(radius: float): ...
def Rectangle(width: float, height: float): ...
It's similar to a Haskell GADT declaration in syntax and semantics¹ – there's only one class/type, Shape
, and you specify the signatures of its constructors², Circle
and Rectangle
.
¹ as close as I could get in Python anyway 😄
² modulo a cls
parameter at the front – they get turned into classmethods
(I hope this doesn't sound too much like a plug, just wanted to contribute to the discussion!)
- If you forget to handle a case, mypy often won't complain about this. Catching these errors is often mentioned as one of the nicest things about algebraic data types.
Since the canonical answer to the "where is the switch/case statement in python?" seems to be "use a dict", I think TypedDict
would be perfect here as It would allow to catch missing cases.
Sadly mypy does not allow for creating TypedDict
(or NamedTuple
) types dynamically. That means that it is not possible to pull out the types from a regular Union
e.g. Union[str, int]
and and create a suitable TypedDict
with all the cases from it e.g. TypedDict['Cases', {str: Callable[[str],...], int: Callable[[int], ...]}]
.
See https://github.com/python/mypy/issues/4128 and https://github.com/python/mypy/issues/848 for reference
class Shape(TaggedUnion): class Circle: radius: float class Rectangle: width: float height: float
This syntax actually already kind of works today. The following type checks and runs:
from abc import ABC
from typing import Union, NamedTuple
class Shape(ABC):
class Circle(NamedTuple):
radius: float
class Rectangle(NamedTuple):
width: float
height: float
ShapeType = Union[Shape.Circle, Shape.Rectangle]
S: ShapeType = Shape.Circle(2.0)
S = Shape.Rectangle(2.0, 1.4)
print(S.width)
def print_width(shape: ShapeType) -> None:
if isinstance(shape, Shape.Rectangle):
print(shape.width)
print_width(S)
The only ugly thing is that you have to define the Union
manually and have to give it a different name.
@thomkeh Yes, the point of special support would be to minimize verbosity, maximize clarity and simplicity. Here you have to juggle with both a base type(useless in itself, except for namespacing the subclasses) and an additional type alias.
Personally, I've found myself naturally wanting and trying to emulate this nested class syntax. I think it just makes sense visually.
And I like having subclasses available in the base class' namespace.
I think it would be intuitive for anyone used to working with enum.Enum
.
It's good to know that sum types can be partly emulated today. But one of the key features of algebraic datatypes is that they can be recursive (e.g. data List a = Nil | Cons a List
) . I don't see how to apply the pattern above to recursive datatypes.
Adding support for recursive types is on the roadmap for mypy.
@dckc Using lazy annotations, I don't see any syntactic issue.
from __future__ import annotations
import typing
class List(TaggedUnion):
class Cons:
head: typing.Any
tail: List
class Empty: pass
One issue is the incompatibility between typing.NamedTuple
and typing.Generic
.
Will this issue be resolved using https://www.python.org/dev/peps/pep-0622/#sealed-classes-as-algebraic-data-types ?
Another question (maybe not directly related to mypy...) : what would be recommended to serialize in JSON, since there is no "tag"? Adding the class name? But the class name may be an implementation detail that may change.
@dfroger PEP 622 was withdrawn; the final version of pattern matching didn't have sealed. However, it's a natural future extension.
This isn't the right place to discuss JSON serialization.
Using the new PEP 695 syntax and pattern matching you can express algebraic data types in a very nice and concise way.
For product types you can use dataclasses.
For sum types, here's the way I propose.
Either
from dataclasses import dataclass
type Either[L, R] = Left[L] | Right[R]
@dataclass(frozen=True)
class Left[T]:
value: T
@dataclass(frozen=True)
class Right[T]:
value: T
def handle(either: Either[int, str]) -> Either[int, str]:
match either:
case Left(number):
return Left(number + 1)
case Right(string):
return Right(string.lower())
# Optionally, you can rename Either, Left and Right
# to better document your code
type Result[T, E] = Either[T, E]
Ok = Left
Error = Right
def str_to_int(string: str) -> Result[int, Exception]:
try:
return Ok(int(string))
except Exception as error:
return Error(error)
List
from dataclasses import dataclass
type List[T] = Empty | Constructor[T]
@dataclass(frozen=True)
class Empty:
pass
@dataclass(frozen=True)
class Constructor[T]:
head: T
tail: List[T]
def append[T](list: List[T], item: T) -> List[T]:
match list:
case Empty():
return Constructor(item, Empty())
case Constructor(head, tail):
return Constructor(head, append(tail, item))
Bool
from dataclasses import dataclass
type Bool = Yes | No
@dataclass(frozen=True)
class Yes:
pass
@dataclass(frozen=True)
class No:
pass
def both(first: Bool, second: Bool) -> Bool:
match first, second:
case Yes(), Yes():
return Yes()
case _:
return No()
Personally, I think it's not worth it to overcomplicate things by adding a new special syntax or extending the standard library, as it's already fine with the current version of Python.
There's also a Scott encoding way of representing algebraic data types. (If you feel real nerdy today)
Option
from collections.abc import Callable
from typing import Protocol
class Option[T](Protocol):
def __call__[R](self, case_nothing: R, case_some: Callable[[T], R], /) -> R:
...
def nothing[T, R](case_nothing: R, case_some: Callable[[T], R]) -> R:
return case_nothing
def some[T](value: T) -> Option[T]:
def option[R](case_nothing: R, case_some: Callable[[T], R]) -> R:
return case_some(value)
return option
def default_to_0(number: Option[int]) -> int:
return number(0, identity)
def identity[T](value: T) -> T:
return value
Number
from __future__ import annotations
from collections.abc import Callable
from typing import Protocol
class Number(Protocol):
def __call__[R](self, case_zero: R, case_successor: Callable[[Number], R], /) -> R:
...
def zero[R](case_zero: R, case_successor: Callable[[Number], R]) -> R:
return case_zero
def successor(number: Number) -> Number:
def new_number[R](case_zero: R, case_successor: Callable[[Number], R]) -> R:
return case_successor(number)
return new_number
def add(augend: Number, addend: Number) -> Number:
return augend(addend, lambda augend_predessor: successor(add(augend_predessor, addend)))
Many languages support algebraic data types, including Haskell, OCaml, Scala, Swift and Rust. Many programmers like them and they have some benefits over Python subclassing, which I won't discuss in detail here.
Currently we can kind of fake algebraic data types by using union types and named tuples:
There are a few problems with this currently:
For (1), mypy could detect at least some undefined variables and recognize (some) if statements used for "pattern matching". For example, assume that we added a new shape,
Triangle
. Now mypy should complain aboutarea()
, as it fails to handle triangles (if we'd have used multiple returns, mypy could already have caught the error):For (2), (3) and (4), we could have some new syntax:
NamedTupleUnion
would have a few special features. All subclasses are named tuples and must be defined in the same module as theNamedTupleUnion
type (i.e. it can't be extended outside the current module). This way we can check whether all item types are handled usingisinstance
checks. Finally, only one level of inheritance would be supported, for simplicity -- though I'm not sure if this restriction is necessary.[This is just a random idea I wanted to write down and it's not urgent in any way.]