Better alternative for algebraic data types

JukkaL commented 7 years ago

Many languages support algebraic data types, including Haskell, OCaml, Scala, Swift and Rust. Many programmers like them and they have some benefits over Python subclassing, which I won't discuss in detail here.

Currently we can kind of fake algebraic data types by using union types and named tuples:

class Circle:
    radius: float

class Rectangle:
    width: float
    height: float

Shape = Union[Circle, Rectangle]

def area(s: Shape) -> float:
    if isinstance(s, Circle):
        result = math.pi * s.radius**2
    elif isinstance(s, Rectangle):
        result = s.width * s.height
    return result

There are a few problems with this currently:

If you forget to handle a case, mypy often won't complain about this. Catching these errors is often mentioned as one of the nicest things about algebraic data types.
You need to write each item type name twice, once in the class definition and once in the union type definition. This is somewhat error-prone.
The union type definition needs to come after the item types, which feels backwards to me. (If we fixed forward references to types in type aliases, it might help a bit.)
Mypy doesn't keep track of the name of the union type alias internally, and in error messages it will just print out the entire union. This could be awkward since the union can have many items.

For (1), mypy could detect at least some undefined variables and recognize (some) if statements used for "pattern matching". For example, assume that we added a new shape, Triangle. Now mypy should complain about area(), as it fails to handle triangles (if we'd have used multiple returns, mypy could already have caught the error):

def area(s: Shape) -> float:
    if isinstance(s, Circle):
        result = math.pi * s.radius**2
    elif isinstance(s, Rectangle):
        result = s.width * s.height
    return result   # Error: "result" may be undefined (union item "Triangle" not handled)

For (2), (3) and (4), we could have some new syntax:

from mypy_extensions import NamedTupleUnion

class Shape(NamedTupleUnion):  # This would be roughly similar to original union type
    pass

class Circle(Shape):
    radius: float

class Rectangle(Shape):
    width: float
    height: float

NamedTupleUnion would have a few special features. All subclasses are named tuples and must be defined in the same module as the NamedTupleUnion type (i.e. it can't be extended outside the current module). This way we can check whether all item types are handled using isinstance checks. Finally, only one level of inheritance would be supported, for simplicity -- though I'm not sure if this restriction is necessary.

[This is just a random idea I wanted to write down and it's not urgent in any way.]

ilevkivskyi commented 7 years ago

This is an interesting idea. Here is some early bikeshedding:

What about calling this a TaggedUnion?
What about using a (more Enum-like) nested class syntax:

class Shape(TaggedUnion):
    class Circle:
        radius: float

    class Rectangle:
        width: float
        height: float

EDIT: Fixed typo pointed by @kirbyfan64

refi64 commented 7 years ago

What about using a (more Enum-like) nested class syntax:

But isn't Shape undefined for the duration of its body evaluation?

On topic: +10000 to this. ADTs are awesome. :D

ilevkivskyi commented 7 years ago

But isn't Shape undefined for the duration of its body evaluation?

Ah, sorry, that is just a typo, it is not needed.

JukkaL commented 7 years ago

What about calling this a TaggedUnion?

Hmm.. it would be less clear that the subclasses are named tuples (we want the default __init__, unpacking, etc.). Also, there wouldn't be any extra tags required for these unions, as all objects have a type 'tag' by default. But I agree that the name NamedTupleUnion isn't exactly pretty.

What about using a (more Enum-like) nested class syntax:

This would require using qualified names such as Shape.Circle in isinstance tests, which would be somewhat clumsy.

ilevkivskyi commented 7 years ago

This would require using qualified names such as Shape.Circle in isinstance tests, which would be somewhat clumsy.

This is true. On the other hand the definition is a bit more compact, and it is immediately visible later in code that Circle is a member of Shape union.

But I agree that the name NamedTupleUnion isn't exactly pretty

NamedUnion is a bit shorter and reminds NamedTuple. Also there will be a parallel: Union and Tuple vs. NamedUnion and NamedTuple.

elazarg commented 7 years ago

I would have liked the enum-like syntax (I have suggested it some time ago) but it is not exactly a tagged union, since Shape.Circle is not Shape, and sadly the syntax does not allow subclassing.

Here's a wild idea:

with NamedUnion as Shape:
    class Circle(Shape):
        radius : float

    class Rectangle(Shape):
        width: float
        height: float

This way we have a bounded scope for the definition of subelements of Shape, and we can also use its name in the subclasses.

datnamer commented 7 years ago

Please excuse the noob question, but how would this be different than the classes extending the Shape protocol or abstract class?

elazarg commented 7 years ago

The point is the declaration "A Shape must be one of these types and nothing else" in a way that can be used by mypy (using the declaration syntax) and checked at runtime (using isinstance() to distinguish between items).

ilevkivskyi commented 7 years ago

Just another idea for the possible syntax. If we allow only named tuples a s members, then one can write:

class Shape(NamedUnion):
    Circle: {radius: int}
    Rectangle: {width: int, height: int}

At runtime, this will be transformed so that Shape.Circle will be a NamedTuple. As well __bases__ could be patched so that issubclass(Shape.Circle, Shape) will return True. Also some other facilities could be added, making NamedUnion a more advanced version of Enum.

grayfall commented 6 years ago

@elazarg I really like your syntax proposition. It can also be used to mimic Haskell's type constraints:

with NamedUnion['Shape', constraints] as Shape:
    class Circle(Shape):
        radius : float

    class Rectangle(Shape):
        width: float
        height: float

NamedUnion[name, constraints] resembles TypeVar call signature. We can actually subclass from constraints to further mimic Haskell's deriving:

class NamedUnion(...):
    ...
    def __class_getitem__(cls, params):
        name, constraints = params
        class T(*constraints):
            pass

        T.__name__ = name
        T.__qualname__ = name
        return T

lubieowoce commented 6 years ago

In my sum type library I went with syntax like this:

class Shape(NamedUnion):
    def Circle(radius: float): ...
    def Rectangle(width: float, height: float): ...

It's similar to a Haskell GADT declaration in syntax and semantics¹ – there's only one class/type, Shape, and you specify the signatures of its constructors², Circle and Rectangle.

¹ as close as I could get in Python anyway 😄 ² modulo a cls parameter at the front – they get turned into classmethods

(I hope this doesn't sound too much like a plug, just wanted to contribute to the discussion!)

stereobutter commented 5 years ago

If you forget to handle a case, mypy often won't complain about this. Catching these errors is often mentioned as one of the nicest things about algebraic data types.

Since the canonical answer to the "where is the switch/case statement in python?" seems to be "use a dict", I think TypedDict would be perfect here as It would allow to catch missing cases.

Sadly mypy does not allow for creating TypedDict (or NamedTuple) types dynamically. That means that it is not possible to pull out the types from a regular Union e.g. Union[str, int] and and create a suitable TypedDict with all the cases from it e.g. TypedDict['Cases', {str: Callable[[str],...], int: Callable[[int], ...]}].

See https://github.com/python/mypy/issues/4128 and https://github.com/python/mypy/issues/848 for reference

tmke8 commented 5 years ago

class Shape(TaggedUnion):
    class Circle:
        radius: float

    class Rectangle:
        width: float
        height: float

This syntax actually already kind of works today. The following type checks and runs:

from abc import ABC
from typing import Union, NamedTuple

class Shape(ABC):
    class Circle(NamedTuple):
        radius: float

    class Rectangle(NamedTuple):
        width: float
        height: float

ShapeType = Union[Shape.Circle, Shape.Rectangle]

S: ShapeType = Shape.Circle(2.0)
S = Shape.Rectangle(2.0, 1.4)
print(S.width)

def print_width(shape: ShapeType) -> None:
    if isinstance(shape, Shape.Rectangle):
        print(shape.width)

print_width(S)

The only ugly thing is that you have to define the Union manually and have to give it a different name.

DrPyser commented 4 years ago

@thomkeh Yes, the point of special support would be to minimize verbosity, maximize clarity and simplicity. Here you have to juggle with both a base type(useless in itself, except for namespacing the subclasses) and an additional type alias.

Personally, I've found myself naturally wanting and trying to emulate this nested class syntax. I think it just makes sense visually. And I like having subclasses available in the base class' namespace. I think it would be intuitive for anyone used to working with enum.Enum.

dckc commented 4 years ago

It's good to know that sum types can be partly emulated today. But one of the key features of algebraic datatypes is that they can be recursive (e.g. data List a = Nil | Cons a List) . I don't see how to apply the pattern above to recursive datatypes.

gvanrossum commented 4 years ago

Adding support for recursive types is on the roadmap for mypy.

DrPyser commented 4 years ago

@dckc Using lazy annotations, I don't see any syntactic issue.

from __future__ import annotations
import typing

class List(TaggedUnion):
    class Cons:
        head: typing.Any
        tail: List

    class Empty: pass

One issue is the incompatibility between typing.NamedTuple and typing.Generic.

dfroger commented 3 years ago

Will this issue be resolved using https://www.python.org/dev/peps/pep-0622/#sealed-classes-as-algebraic-data-types ?

dfroger commented 3 years ago

Another question (maybe not directly related to mypy...) : what would be recommended to serialize in JSON, since there is no "tag"? Adding the class name? But the class name may be an implementation detail that may change.

JelleZijlstra commented 3 years ago

@dfroger PEP 622 was withdrawn; the final version of pattern matching didn't have sealed. However, it's a natural future extension.

This isn't the right place to discuss JSON serialization.

LeeeeT commented 6 months ago

Using the new PEP 695 syntax and pattern matching you can express algebraic data types in a very nice and concise way.

For product types you can use dataclasses.

For sum types, here's the way I propose.

Either

from dataclasses import dataclass

type Either[L, R] = Left[L] | Right[R]

@dataclass(frozen=True)
class Left[T]:
    value: T

@dataclass(frozen=True)
class Right[T]:
    value: T

def handle(either: Either[int, str]) -> Either[int, str]:
    match either:
        case Left(number):
            return Left(number + 1)
        case Right(string):
            return Right(string.lower())

# Optionally, you can rename Either, Left and Right
# to better document your code

type Result[T, E] = Either[T, E]

Ok = Left
Error = Right

def str_to_int(string: str) -> Result[int, Exception]:
    try:
        return Ok(int(string))
    except Exception as error:
        return Error(error)

List

from dataclasses import dataclass

type List[T] = Empty | Constructor[T]

@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class Constructor[T]:
    head: T
    tail: List[T]

def append[T](list: List[T], item: T) -> List[T]:
    match list:
        case Empty():
            return Constructor(item, Empty())
        case Constructor(head, tail):
            return Constructor(head, append(tail, item))

Bool

from dataclasses import dataclass

type Bool = Yes | No

@dataclass(frozen=True)
class Yes:
    pass

@dataclass(frozen=True)
class No:
    pass

def both(first: Bool, second: Bool) -> Bool:
    match first, second:
        case Yes(), Yes():
            return Yes()
        case _:
            return No()

Personally, I think it's not worth it to overcomplicate things by adding a new special syntax or extending the standard library, as it's already fine with the current version of Python.

There's also a Scott encoding way of representing algebraic data types. (If you feel real nerdy today)

Option

from collections.abc import Callable
from typing import Protocol

class Option[T](Protocol):
    def __call__[R](self, case_nothing: R, case_some: Callable[[T], R], /) -> R:
        ...

def nothing[T, R](case_nothing: R, case_some: Callable[[T], R]) -> R:
    return case_nothing

def some[T](value: T) -> Option[T]:
    def option[R](case_nothing: R, case_some: Callable[[T], R]) -> R:
        return case_some(value)

    return option

def default_to_0(number: Option[int]) -> int:
    return number(0, identity)

def identity[T](value: T) -> T:
    return value

Number

from __future__ import annotations

from collections.abc import Callable
from typing import Protocol

class Number(Protocol):
    def __call__[R](self, case_zero: R, case_successor: Callable[[Number], R], /) -> R:
        ...

def zero[R](case_zero: R, case_successor: Callable[[Number], R]) -> R:
    return case_zero

def successor(number: Number) -> Number:
    def new_number[R](case_zero: R, case_successor: Callable[[Number], R]) -> R:
        return case_successor(number)

    return new_number

def add(augend: Number, addend: Number) -> Number:
    return augend(addend, lambda augend_predessor: successor(add(augend_predessor, addend)))

python / mypy

Better alternative for algebraic data types #2464