Protocols (a.k.a. structural subtyping) #11

Closed gvanrossum closed 5 years ago

In mypy's typing,py there's a neat feature called Protocol. Perhaps we should add this to the PEP. Or perhaps it's similar to a Python ABC?

It's meant to be used for structural subtyping. It is quite similar to ABCs, but neither is a replacement for the other. The details of how to use Protocol are still poorly defined, as the mypy type system does not know about structural subtyping yet. However, adding support for structural subtyping shouldn't be too difficult, once we figure out the semantics.

Here is an example of how Protocol could be used:

class SupportsFileno(Protocol):
    @abstractmethod
    def fileno(self) -> int: pass

def f(file: SupportsFileno) -> None:
    id = file.fileno()   # Okay
    ...

f(open('foo'))  # Okay, since file objects have fileno()!

There is no need to explicitly declare that file objects implement SupportsFileno (the type checker would infer this automatically). This is unlike ABCs, where base classes have to be explicitly defined/registered.

The current implementation of Protocol is just a proof of concept, as is the implementation of overload. A production implementation would probably have to be optimized, and there are probably corner cases that aren't handled yet.

Protocols look to me like a pretty reasonable way of implementing structural typing, which is definitely something that we're hoping to see in the PEP. This design is necessarily verbose; an in-line, anonymous definition using dictionaries, like

def f(file: {'fileno': Callable[[], int]}) -> None: ...

is a bit more concise when the protocol/structural type is small. This would conflict with the current proposal to use dictionaries in annotations to represent current non-type uses of annotations (https://github.com/ambv/typehinting/blob/master/pep-NNNN.txt#L194), however see #26 for thoughts on alternatives.

I haven't seen any evidence for protocol types being used/useful all over the place, so my current working hypothesis is that having a heavy-weight syntax for protocols only adds a trivial amount of syntactic overhead for the vast majority of programs.

However, the absence of evidence doesn't prove anything -- if somebody finds a few counterexamples I'm happy to change my mind. :)

Maybe we could have a look at some Go code and see how many small, throwaway structural types are used there? Of course, having convenient syntax for a feature may nudge programmers into using the feature more often.

ABCs have runtime instance checks so it's hard for the type checker to use them for structural sub-typing. If it could, that would be the simplest and most elegant solution. Otherwise, could we leave this out for now and look how we could make it work with existing protocol solutions like Zope interfaces?

Jukka, your typing.py has a Protocol class that is more complex than most other infrastructure, and it is used for those generic ABCs (e.g. Sized, Hashable) whose collections.abc counterpart does a runtime structural check. Does this mean that you have now implemented this in mypy? I kind of like the resulting syntax for defining a generic ABC that is supposed to use structural testing:

class Sized(Protocol):
    @abstractmethod
    def __len__(self) -> int: pass

class Container(Protocol[T]):
    @abstractmethod
    def __contains__(self, x) -> bool: pass

(Note that Sized is not generic, but Container is.)

Łukasz: I have no experience with zope interfaces. But perhaps they are easy enough to parse for a hypothetical static type checker, and they feel close enough to types/classes that we should try to support them? Can you sketch a simple example?

Quoting Łukasz in https://github.com/JukkaL/mypy/issues/539#issuecomment-68983617: """ As for explicit protocols, there are some competing (as in: not easily composable) standards here:

ABCs with @abstractmethod
Zope interfaces
https://pypi.python.org/pypi/characteristic/

etc. etc.

As ABCs are built-in, it seems natural to suggest that they should be used for defining interfaces. However, I understand that static analysis (so, the type checker) might not be able to process every abstract class in general. """

Guido, the machinery in mypy's typing.py has been there for a long time, but it was never fully implemented, and in particular, the type checker doesn't know anything about this. It shouldn't really be there -- everything should just use ABCs, until there is type system support for protocols.

I added a new task for removing Protocol: https://github.com/JukkaL/mypy/issues/552

If protocols (or something similar) will be included in the PEP, I'll update the above issue.

So we have multiple competing ways to spell protocols, but no way to type-check them statically. This feels like something we'll have to leave out of the PEP and come back to later in a separate PEP.

OK, let's remove Protocol from typing.py for now. We should come back to it in a separate PEP, especially that PEP 245 has been rejected and there are ABCs now.

As for Zope interfaces, an interface definition is easy to parse, except for invariants (runnable checks, similar to instancechecks in ABCs). We have to bear in mind that existing interface definitions may sometimes be dynamic, it's Python. That's fine, I think. What the type checker doesn't know, it assumes it's correct.

I see two issues with Zope interfaces:

arbitrary classes can be externally registered as implementing interfaces, just like ABCs can have external classes registered to them (so we'd have the same problems with this)
adaptation (yes, Zope interfaces implement PEP 246) and runtime adapt hooks

Closing as this is being left out for now.

Reopening this, as duck typing is important according to the BDFL-Delegate (Mark Shannon).

I'm writing here a bunch of related issues that came to mind. These are intended to start a discussion -- I try not to propose any course of action yet (though I have some opinions about some of these already).

I'm sure there are other issues not covered by these.

1) What to call these types?

We could call them at least duck types, protocols, interfaces or structural types. I'll call them protocols below, but I'm not specifically advocating any particular term.

2) How to define an interface?

Here are a few possible ways:

@protocol
class Sized: ...

class Size(Protocol): ...

class ISize(Protocol): ...  # Zope interface naming convention

3) How to define a method in a protocol?

A few possible ways:

def __len__(self) -> int: pass

def __len__(self) -> int: ...   # Literal ellipsis; semantically same as above

@abstractmethod
def __len__(self) -> int: pass

def __len__(self) -> int:
    raise NotImplementedError   # I'm not actually advocating this, but it's an option

4) How to define an attribute in a protocol?

Some ideas:

class Foo(...):
    a = ???  # type: Foo    # Annotation optional, not sure what the initializer should be

    b = ... # type: Foo    # As in stubs; literal ellipsis

    @property
    def c(self) -> 'Foo': pass

    @abstractproperty
    def d(self) -> 'Foo': pass

    e = typing.attribute('''docstring''')   # type: Foo   # Similar to Zope interfaces

We could also disallow recursive types (see below).

5) How to explicitly declare that a class conforms to a protocol?

We may want the type checker to verify that a class actually conform to a protocol. Some ideas for this:

class A(Sized):
    ...

@implements(Sized)
class A:
    ...

class A:
    implements(Sized)   # Inspired by Zope interfaces

Alternatively, this can always be implicit. In that case we can do something like this to force a check:

if False:
    _dummy = A()  # type: Sized   # Complain if A doesn't implement Sized

We also need to decide whether the subclass will inherit the method implementations defined in the body of the protocol class in case we use regular inheritance (assuming they aren't abstract but regular methods). If we use a class decorator, we'd probably don't inherit anything.

6) Should we support protocols extending protocols?

We can probably use the same approach for defining a subprotocol as in (5).

7) Should we support optional methods and attributes?

I think TypeScript supports the concept of optional methods. If we declare a method as optional, subtypes don't have to implement it, but if they implement, the signature should be compatible. This could also work with attributes, but coming up with a syntax may be tricky.

8) Should some ABCs in typing be protocols instead?

For example, maybe typing.Sized should be a protocol.

9) Should be support recursive protocols?

For example, when Foo was used in the definition of Foo in (4), it defined a recursive structural type. If we think that they are too tricky we can disallow them.

10) Should we support generic protocols?

For example, if typing.Iterable is a protocol, it would have to be generic.

11) Should these be interoperable with other similar implementations?

See Guido's remark above. Maybe we should interoperate with Zope interfaces (or just borrow the implementation and include it in typing), for example.

I'm sorry I brought up Zope Interfaces. I'll respond to the rest when I have a real keyboard.

On Monday, May 18, 2015, Jukka Lehtosalo notifications@github.com wrote:

I'm writing here a bunch of related issues that came to mind. These are intended to start a discussion -- I try not to propose any course of action yet (though I have some opinions about some of these already).

I'm sure there are other issues not covered by these.

1) What to call these types?

We could call them at least duck types, protocols, interfaces or structural types. I'll call them protocols below, but I'm not specifically advocating any particular term.

2) How to define an interface?

Here are a few possible ways:

@protocol class Sized: ...

class Size(Protocol): ...

class ISize(Protocol): ... # Zope interface naming convention

3) How to define a method in a protocol?

A few possible ways:

def len(self) -> int: pass

def len(self) -> int: ... # Literal ellipsis; semantically same as above

@abstractmethod def len(self) -> int: pass

def len(self) -> int: raise NotImplementedError # I'm not actually advocating this, but it's an option

4) How to define an attribute in a protocol?

Some ideas:

class Foo(...): a = ??? # type: Foo # Annotation optional, not sure what the initializer should be
b = ... # type: Foo    # As in stubs; literal ellipsis

@property
def c(self) -> 'Foo': pass

@abstractproperty
def d(self) -> 'Foo': pass

e = typing.attribute('''docstring''')   # type: Foo   # Similar to Zope interfaces
We could also disallow recursive types (see below).

5) How to explicitly declare that a class conforms to a protocol?

We may want the type checker to verify that a class actually conform to a protocol. Some ideas for this:

class A(Sized): ...

@implements(Sized) class A: ...

class A: implements(Sized) # Inspired by Zope interfaces

Alternatively, this can always be implicit. In that case we can do something like this to force a check:

if False: _dummy = A() # type: Sized # Complain if A doesn't implement Sized

We also need to decide whether the subclass will inherit the method implementations defined in the body of the protocol class in case we use regular inheritance (assuming they aren't abstract but regular methods). If we use a class decorator, we'd probably don't inherit anything.

6) Should we support protocols extending protocols?

We can probably use the same approach for defining a subprotocol as in (5).

7) Should we support optional methods and attributes?

I think TypeScript supports the concept of optional methods. If we declare a method as optional, subtypes don't have to implement it, but if they implement, the signature should be compatible. This could also work with attributes, but coming up with a syntax may be tricky.

8) Should some ABCs in typing be protocols instead?

For example, maybe typing.Sized should be a protocol.

9) Should be support recursive protocols?

For example, when Foo was used in the definition of Foo in (4), it defined a recursive structural type. If we think that they are too tricky we can disallow them.

10) Should we support generic protocols?

For example, if typing.Iterable is a protocol, it would have to be generic.

11) Should these be interoperable with other similar implementations?

See Guido's remark above. Maybe we should interoperate with Zope interfaces (or just borrow the implementation and include it in typing), for example.

— Reply to this email directly or view it on GitHub https://github.com/ambv/typehinting/issues/11#issuecomment-103327277.

--Guido van Rossum (on iPad)

Another open issue:

12) How should isinstance work?

We could disallow isinstance or check for the presence of attributes in the protocol (similar to the now-removed mypy's Protocol implementation).

IMO, following the same conventions as stubs would be easier and less confusing.

I promised offline to write up a proposal for structural subtyping. This is the first iteration of one. If we reach a consensus, I can write this into a PEP.

Motivation

Currently, typing defines ABCs for several common Python protocols such as Iterable and Sized. The problem with them is that a class has to be explicitly marked to support them, which is arguably unpythonic and unlike what you'd normally do in your non statically typed code. For example, this conforms to the current PEP:

from typing import Sized, Iterable, Iterator

class Bucket(Sized, Iterable[int]):
    ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[int]: ...

My intention is that the above code could be written instead equivalently without explicit base classes in the class definition, and Bucket would still be implicitly considered a subtype of both Sized and Iterable[int] by using structural subtyping:

from typing import Iterator

class Bucket:
    ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[int]: ...

As I mentioned earlier, there are many individual design decisions that we need to agree on. I'm proposing answers to many of them here.

1) What to call these types?

Let's call them protocols. The reason is that the term iterator protocol, for example, is widely understood in the community, and coming up with a new term for this concept in a statically typed context would just create confusion.

This has the drawback that the term 'protocol' becomes overloaded with two subtly different meanings: the first is the traditional, well-known but slightly fuzzy concept of protocols such as iterable; the second is the more explicitly defined concept of protocols in statically typed code (or more generally in code that just uses the typing module). I argue that the distinction isn't importat most of the time, and in other cases people can just add a qualifier such as "protocol classes" (for the new-style protocols) or "traditional/non-class/implicit protocols".

2) How to define and use a protocol?

There would be a new class typing.Protocol. If this is explicitly included in the the base class list, the class is a protocol. Here is a simple example:

from typing import Protocol

class SupportsClose(Protocol):
    def close(self) -> None: ...   # See 3) for more about the '...'.

Now if we define a class Resource with a close method that has a suitable signature, it would implicitly be a subtype of SupportsClose, since we'd use structural subtyping for protocol types:

class Resource:
    ...

    def close(self) -> None:
        self.file.close()
        self.lock.release()

Protocol types can be used in annotations, of course, and for type checking:

def close_all(things: Iterable[SupportsClose]) -> None:
    for t in things:
        t.close()

f = open('foo.txt')
r = Resource(...)
close_all([f, r])  # OK!
close_all([1])  # Error: 'int' has no 'close' method

Note that both the user-defined class Resource and the IO type (the return type of open) would be considered subtypes of SupportsClose because they provide a suitable close method.

If using the current typing module, our only option to implement the above example would be to use an ABC (or type Any, but that would compromise type checking). If we'd use an ABC, we'd have to explicitly register the fact that these types are related, and this generally difficult to do with library types as the type objects may be hidden deep in the implementation of the library. Besides, this would be uglier than how you'd actually write the code in straight, idiomatic dynamically typed Python. The code with a protocol class matches common Python conventions much better. It's also automatically extensible and works with additional, unrelated classes that happen to implement the required interface.

3) How to define a method in a protocol?

I propose that most of the regular rules for classes still apply to protocol classes (modulo a few exceptions that only apply to protocol classes). I'd like protocols to also be ABCs, so all of these would be valid within a protocol class:

# Variant 1
def __len__(self) -> int: ...

# Variant 2
def __len__(self) -> int: pass

# Variant 3
@abstractmethod
def __len__(self) -> int: pass

# Variant 4
def __len__(self): pass

# Variant 5
def __len__(self) -> int:
    return 0

# Variant 6
def __len__(self) -> int:
    raise NotImplementedError

For variants 1, 2 and 3, a type checker should probably always require an explicit implementation to be defined in a subclass that explicitly subclasses the protocol (see below for more about this), because the implementations return None which is not a valid return type for the method. For variants 4, 5 and 6, we can use the provided implementation as a default implementation. The default implementations won't be used if the subtype relationship is implicit and only via structural subtyping -- the semantics of inheritance won't be changed.

I also propose that a ... as the method body in a protocol type makes the method implicitly abstract. This would only be checked statically, and there won't be any runtime checking. The idea here is that most methods in a protocol will be abstract, and having to always use @abstractmethod is a little verbose and ugly, and has the issue of implicit None return types confusing things. This would be the recommended way of defining methods in a protocol that don't have an implementation, but the other approaches can be used for legacy code or if people just feel like it. The recommended syntax would mirror how methods are defined in stub files.

4) How to define a data attribute in a protocol?

Similar to 3), there will be multiple valid ways of defining data attributes (or properties). All of these will be valid:

class Foo(Protocol):
    a = ...  # type: int  # Variant 1
    b = 0  # Variant 2

    # Variant 3
    @property
    def c(self) -> int: ...

    # Variant 4
    @property
    def c(self) -> int:
        return 0

    # Variant 5
    @property
    def d(self) -> int:
        raise NotImplementedError

    # Variant 6
    @abstractproperty
    def e(self) -> int: ...

    # Variant 7
    @abstractproperty
    def f(self) -> int: pass

Also, properties with setters can also be defined. The first three variants would be the recommended ways of defining attributes or properties, but similar to 3), the others are possible and may be useful for supporting legacy code.

When using an ... initializer, @abstractproperty or pass/... as property body (and when the type does not include None), the data attribute is considered abstract and must always be explicitly implemented in a compatible class.

Attributes should not be defined in the body of a method by assignment via self. This restriction is a little arbitrary, but my idea is that the protocol class implementation is often not shared by subtypes so the interface should not depend on the default implementation. This is more of a style than a technical issue, as a type checker could infer attributes from assignment statements within methods as well.

When using the ... initializer, the ... initializer might leak into subclasses at runtime, which is unfortunate:

class A(Protocol):
    x = ...  # type: int

class B(A):
    def __init__(self) -> None:
        self.x = 1

b = B()
print(b.x)  # 1
print(B.x)  # Ellipsis

If we'd use a None initializer things wouldn't be any better. Maybe we can modify the metaclass to recognize ... initializers and translate them to something else. This needs to be documented, however. Also, in this proposal there is no way to distinguish between class and instance data attributes.

I'm not sure sure what to do with __init__. I guess a protocol could provide a default implementation that could be used in explicit subclasses.

Overall, I'm probably the least happy about this part of the proposal.

5) How to explicitly declare that a class conforms to a protocol?

I propose that protocols can be used as regular base classes. I can see at least three things that support this decision. First, a protocol class could define default implementations for some methods (typing.Sequence would be an example if we decide to turn it into a protocol). Second, we want a way of statically enforcing that a class actually implements a protocol correctly (but there are other ways to achieve this effect -- see below for alternatives). Third, this makes it possible to turn an existing ABC into a protocol and just have things (mostly) work. This would be important for the existing ABCs in typing what we may want to change into protocols (see point 8 for more about this). The general philosophy would be that Protocols are mostly like regular ABCs, but a static type checker will handle them somewhat specially.

Note that subclassing a protocol class would not turn the subclass into a protocol unless it also has Protocol as an explicit base class. I assume that we can use metaclass trickery to get this to work correctly.

Some terminology could be useful here for clarity. If a class includes a protocol in its MRO, the class is an (explicit) subclass of the procotol. If a class ia a structural subtype of a protocol, it is said to implement the protocol and to be compatible with a protocol. If a class is compatible with a protocol but the protocol is not included in the MRO, the class is an implicit subclass of the protocol.

We could also explicitly add an assignment for checking that a class implements a protocol. I've seen a similar pattern in some Go code that I've reviewed. Example:

class A:
    def __len__(self) -> float:
        return ...

_ = A()  # type: Sized  # Error: A.__len__ doesn't conform to 'Sized'
                        # (Incompatible return type 'float')

I don't much care above the above example, as it moves the check away from the class definition and it almost requires a comment as otherwise the code probably wouldn't make any sense to an average reader -- it looks like dead code. Besides, in the simplest form it requires us to construct an instance of A which could problematic if this requires accessing or allocating some resources such as files or sockets. We could work around the latter by using a cast, for example, but then the code would be really ugly.

6) Should we support protocols extending protocols?

I think that we should support subprotocols. A subprotocol can be defined by having both one or more protocols as the explicit base classes and also having typing.Protocol as an immediate base class:

from typing import Sized, Protocol

class SizedAndCloseable(Sized, Protocol):
    def close(self) -> None: ...

Now the protocol SizedAndCloseable is a protocol with two methods, __len__ and close. Alternatively, we could have implemented it like this, assuming the existence of SupportsClose from an earlier example:

from typing import Sized

class SupportsClose(...): ...  # Like above

class SizedAndCloseable(Sized, SupportsClose, Protocol):
    pass

The two definitions of SizedAndClosable would be equivalent. Subclass relationships between protocols aren't meaningful when considering subtyping, as we only use structural compatibility as the criterion, not the MRO.

If we omit Protocol in the base class list, this would be regular (non-protocol) class that must implement Sized. If Protocol is included in the base class list, all the other base classes must be protocols. A protocol can't extend a regular class.

7) Should we support optional attributes?

We can come up with examples where it would be handy to be able to say that a method or data attribute does not need to be present in a class implementing a protocol, but if it's present, it must conform to a specific signature or type. One could use a hasattr check to determine whether they can use the attribute on a particular instance.

In the interest of simplicity, let's not support optional methods or attributes. We can always revisit this later if there is an actual need. The current realistic potential use cases for protocols that I've seen don't require these. However, other languages have similar features and apparently they are pretty commonly used. If I remember correctly, at least TypeScript and Objective-C support a similar concept.

8) Should some ABCs in typing be protocols instead?

I think that at least these classes in typing should be protocols:

Sized
Container
Iterable
Iterator
Reversible
SupportsAbs (and other Supports* classes)

These classes are small and conceptually simple. It's easy to see which of these protocols a class implements from the presence of a small number of magic methods, which immediately suggest a protocol.

I'm not sure about other classes such as Sequence, Set and IO. I believe that these are sufficiently complex that it makes sense to require code to be explicit about them, as there would not be any sufficiently obvious and small set of 'marker' methods that tell that a class implements this protocol. Also, it's too easy to leave some part of the protocol/interface unimplemented by accident, and explicitly marking the subclass relationship allows type checkers to pinpoint the missing implementations -- or they can be inherited from the ABC, in case that is has default implementations. So I'd currently vote against making these classes protocols.

9) Should we support recursive protocols?

Sure, why not. They might useful for representing self-referential data structures like trees in an abstract fashion, but I don't see them used commonly in production code.

10) Should we support generic protocols?

Generic protocol are important. For example, SupportsAbs, Iterable and Iterator would be generic. We could define them like this, similar to generic ABCs:

T = TypeVar('T', covariant=True)

class Iterable(Protocol[T]):
    def __iter__(self) -> Iterator[T]: ...

11) Should these be interoperable with other similar implementations?

The protocols as described here are basically a small extension to the existing concept of ABCs. I argue that this is the way they should be understood, instead of as something that replaces Zope interfaces, for example.

12) How should isinstance work?

We shouldn't implement any magic isinstance machinery, as performing a runtime compatibility check is generally difficult: we might want to verify argument counts to methods, names of arguments and even argument types, depending the kind of protocol we are talking about, but sometimes we wouldn't care about these, or we'd only care about some of these things.

My preferred semantics would be to make isinstance fail by default for protocol types. This would be in the spirit of duck typing -- protocols basically would be used to model duck typing statically, not explicitly at runtime.

However, it should be possible for protocol types to implement custom isinstance behavior when this would make sense, similar to how Iterable and other ABCs in collections.abc and typing already do it, but this should be specific to these particular classes. We need this fallback option anyway for backward compatibility.

13) Should every class be a protocol by default?

Some languages such as Go make structural subtyping the only or the primary form of subtyping. We could achieve a similar result by making all classes protocols by default (or even always). I argue that this would be a bad idea and classes should need to be explicitly marked as protocols, as shown in my proposal above.

Here's my rationale:

Protocols don't have some properties of regular classes. In particular, isinstance is not well-defined for protocols, whereas it's well-defined (and pretty commonly used) for regular classes.
Protocol classes should generally not have (many) method implementations, as they describe an interface, not an implementation. Most classes have many implementations, making them bad protocol classes.
Experience suggests that most classes aren't practical as protocols anyway, mainly because their interfaces are too large, complex or implementation-oriented (for example, they may include de facto private attributes and methods without a __ prefix). Most actually useful protocols in existing Python code seem to be implicit. The ABCs in typing and collections.abc are a kind-of exception, but even they are pretty recent additions to Python and most programmers do not use them yet.

14) Should protocols be introspectable?

The existing introspection machinery (dir, etc.) could be used with protocols, but typing would not include an implementation of additional introspection or runtime type checking capabilities for protocols.

As all attributes need to be defined in the class body based on this proposal, protocol classes would have better support for introspection than regular classes where attributes can be defined implicitly -- protocol attributes can't be initialized in ways that are not visible to introspection (using setattr, assignment via self, etc.). Still, some things likes types of attributes wouldn't be visible at runtime, so this would necessarily be somewhat limited.

15) How would Protocol be implemented?

We'd need to implement at least the following things:

Define class Protocol (this could be simple, and would be similar to Generic).
Implement metaclass functionality to detect whether a class is a protocol or not. Maybe add a class attribute such as __protocol__ = True if that's the case. Verify that a protocol class only has protocol base classes in the MRO (except for object).
Optionally, override isinstance.
Optionally, translate ... class attribute values to something else (properties?).

I like almost all of this. Let's take this to python-ideas now! I have a few nits and questions, but they're not important enough to wait, and they're not very deep. (There's something niggling about making e.g. Sized a Protocol and not implementing isinstance(), since collections.abc.Sized does implement it.)

FYI, I posted a link to the latest proposal to python-ideas.

Crosslink: https://mail.python.org/pipermail/python-ideas/2015-September/035859.html

Sorry, the description of isinstance was unclear in the proposal. My idea was to preserve the isinstance support for Sized and all the other existing ABCs in typing that would be turned into protocols, but new protocol types wouldn't automatically get any isinstance machinery. This way the ABCs would remain compatible with existing code that uses them and might use isinstance with them.

This looks really cool and seems similar to traits. What if I want to extend library code and tack a protocol onto a class without monkey patching or subclassing? Can we have generic functions/multiple dispatch dispatch on a protocol? That would make for much more modular and slimmer code bases.

That sounds like putting the cart before the horse -- we haven't even got multi-dispatch for regular types or classes working. And multi-dispatch on a protocol sounds pretty expensive to implement (though I suppose you can fix almost anything with caching :-).

But most of all I worry that the code, even though you may find it more modular or slimmer, might actually be harder to comprehend for the majority of Python programmers.

BTW there are many things that go under the name "traits". Which one were you comparing this to?

Scratch the traits comment.

It would be expensive, unless we would we be able to register protocols explicitly. What if I define a method outside of the class definition using an overloaded or multi dispatch function...would the protocol be ok with the needed method not residing as a class attribute?

Drive-by comment: If PEP 526 (variable annotations) is accepted, maybe structural subtyping could be implemented using the following syntax:

class Point(<some magical base class):
    x: float
    y: float
    z: float = 0

Without the magical base class this would just declare a class with instance variables. But with the proper base class it could mean to declare a type to be used with structural type checking, so that, after the above, the following would work:

class MyPoint:  # unrelated to Point
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y
p = MyPoint(1, 2)
assert isinstance(p, Point)  # true, because of structural typing

In the above example, for MyPoint to be a subtype of Point it should probably define z as well, even though Point has a default value for z. The way I see it, the default value would only make a difference if a class explicitly subclasses Point. Or maybe we'd even require MyPoint to give z a default value in the class body, so that MyPoint.z is defined. The latter might be a bit overcomplicated, though.

OK, but that's a problem with my example, not with the "point" I was making.

@gvanrossum Do I understand correctly that you want to say: Specifying types only for methods would be not enough to completely define a protocol, while with PEP 526 it will be possible to specify types for everything in a class and therefore to make a protocol well defined.

I think this is a good point. Moreover, it could be another argument why PEP 526 is important. I think the protocols are very in the spirit of Python duck typing, and everything that helps to introduce them is important.

Yes, that's my point. E.g. Jukka and Elazar recently had a discussion about whether NamedTuple checks should be structural or nominal, and that was entirely about duck-typing attributes, not methods.

Can a protocol be defined entirely though methods as well? (ie making it more abstract)

Yeah, it can be entirely methods, entirely attributes or a mix.

In case anyone cares, the HN post on Zulip's use of mypy led to a thread all about protocols and typing in Python.

Thanks! Interesting thread...

@JukkaL @ambv Here is the first draft of Protocols PEP that I promised to write https://github.com/ilevkivskyi/peps/blob/protocols/pep-0544.txt (this is mostly based on what Jukka wrote above)

I will give you the write rights, so that you could make changes directly there (or you can make PRs when you want). I also have a plan of implementing this in mpy/typing/typeshed we could discuss this later when we agree on the specification.

EDIT: I renamed the file just to run the build, the PEP number is not yet "official".

@ilevkivskyi would this be compatible with function overloading/multiple dispatch? There was some discussion about that and it may make sense to keep things open.

@datnamer I think protocols will be not different from normal classes w.r.t. @overload. There will be full static support, but limited runtime support (via @singledispatch or third-party libs).

Guido, I am aware about your situation, but just in case you have time/ability, I will be very glad if you also review/contribute to this PEP.

Sorry for bringing this up again, but regarding the terminology, I think there's a difference between a "protocol" which implies runtime behaviour and temporal properties (as in "Iterator protocol") , and "structural typing" which is purely syntactic. Mypy (and most typecheckers) do not attempt to check whether a class implements a protocol, but whether it has certain syntactic structure. So why not stick to this term. When you ask whether a class implements the Iterator protocol, all of a sudden you have two possible answers, which may cause some confusion. If we call this thing "structure" or "interface" we say that the protocol requires such and such structure, with such and such runtime behaviour.

class R(Interface): ...

Will be obvious to everyone, including newcomers from some other language.

@ilevkivskyi, this looks good. I reserved the number. I'll be at the Mercurial sprint for the rest of the week but I'll definitely get to this on Monday.

@elazarg,

Will be obvious to everyone, including newcomers from some other language.

This is not the case. Like many other terms in computing science, this one is heavily overloaded. You might understand protocols as stateful runtime behavior, but you can also use this term to define structural interfaces. So, while "interface" might sound friendlier to Java people, "protocol" sounds just as obvious to iOS programmers.

You chose the "Iterator protocol", mentioned in the Python documentation. The same documentation uses "protocol" interchangably with "interface", for instance:

the number protocol
the mapping protocol
the descriptor protocol ...and so on.

@ilevkivskyi Looks good. I'll write some comments next week.

Just finished reading the PEP by @ilevkivskyi ; LGTM and submitted a PR to tweak some grammar (and to let methods with just a docstring be acceptable as declaring a part of the protocol).

I did have some questions, though:

When extending a protocol, does the inheritance order matter? (E.g. https://github.com/ilevkivskyi/peps/blob/2d89ba950431889746810f7a39ac8117b40fff81/pep-0544.txt#L483 has Protocol come first and I wasn't sure if that was on purpose since I'm used to seeing the more generic base class come last)
Would a GenericProtocol class be better than the @runtime decorator? (otherwise I don't think the name communicates what the decorator does well; maybe @add_instancecheck?)

(and to let methods with just a docstring be acceptable as declaring a part of the protocol)

I like this addition, I merged your PR, thanks!

When extending a protocol, does the inheritance order matter?

The order of two protocols makes a difference in case they both define the same method, the one with a more narrow type for the method should come first (this just reflects Liskov principle w.r.t. multiple inheritance, i.e. what you normally see). But this rule does not need to be extended to the Protocol base itself (typecheckers and runtime implementation can easily sort this out). So that these two are equivalent:

class SubSized(Protocol, Sized):
    ...
class SubSized(Sized, Protocol):
    ...

and purely "aesthetically" I like the first version a bit more.

Would a GenericProtocol class be better than the @runtime decorator? (otherwise I don't think the name communicates what the decorator does well; maybe @add_instancecheck?)

This is a good question! I was struggling to choose a good name for this. I still like a decorator a bit more than a different base class. Actually your proposal @add_instancecheck looks good. I was also thinking about something like @auto_instancecheck or @auto_isinstance. The problem here is that either the name is long or it is not very informative.

Wow, a lot to read! I wish this was a PR for the upstream peps repo already, the review tools are a bit better I think.

Re: base class order, I prefer Protocol last, just as with Generic, and in some M.I. cases it will end up last anyway.

In general I like that at runtime this very close to ABCs, but with different behavior for static checking whether a type is compatible with another.

I wish this was a PR for the upstream peps repo already, the review tools are a bit better I think.

OK, I will fix the order of Protocol base (make it last) and will make a PR to upstream soon.

Actually I cannot find a way in GitHub to comment on the file in your repo.

Some random questions:

Can protocols be used in casts? Currently (L298) you enumerate where they can be used -- maybe instead just say they can be used wherever types are used? (Or are there restrictions? Apart from subclassing, which is a separate topic anyways.)
Methods with (nearly) empty bodies seem to have so many variations that I wonder if this convenience isn't too confusing. Why do we need multiple ways anyway?
Given that a valid implementation doesn't have to inherit from the protocol(s) it implements, what's the use case for allowing implementations and non-protocol attributes in protocol classes at all?
Related, it seems confusing that using @abstractmethod makes the implementation serve as a default but leaving it out requires implementation -- this is backwards from the regular meaning of "abstract" methods. (E.g. take your Point(RGB) example -- if RGB was an ABC, Point would be in error for omitting to_byte() but the lack of an intensity() implementation would not be considered an error.)
There is also the issue that sometimes the default implementation really should be just a "pass" statement. But your PEP currently gives that a special meaning. I guess I am really just pushing for either some other explicit marker for methods and attributes that are meant to be part of the protocol, or for making everything except __private names a part of the protocol. Or perhaps insist on ...? But I think we'll end up with an explicit marker. EIBTI.
The rules around defining subprotocols deviate from the rules for ABC -- with the latter, abstract-ness is simply defined by having at least one abstract method being unimplemented.
Recursive protocols deserve a subheading or at least a new paragraph. (L524)
~There ought to be a list of all the ABCs that we want to upgrade to being Protocols. (E.g. Iterable, Sized, etc.)~
L547 "or recursive protocols, structural subtyping is decided positively for all situations that are type safe" sounds very tautological. The example leads me to think that this is just about empty lists?
I know Intersection is a mouthful, but All as the dual of Union feels terminologically murky. Unless you expect this to be very common I hope we can just write Intersection.
Related, I could write an intersection of two protocols using M.I., right?
```
class HashableFloatSequence(Hashable, Sequence[float], Protocol):
pass
```
~Have you thought about implementation yet?~ For mypy, could we start by adding this to mypy_extensions? ~What about eventually backporting to 3.5 and 3.6?~

I'm stopping now, more maybe later.

Re @runtime -- why not make this behavior the default, with the implied [Any] parameter(s) for generic classes?

I suspect the example there is iffy:

def process(items: Iterable[int]):
    if isinstance(items, Iterator):
        # 'items' have type 'Iterator[int]' here

That's not how it currently works in mypy, reveal_type(items) shows Iterator[Any]

PS. typo: issublclass.

Re: ellipsis (L737): I don't think it's necessary to mess with this in the metaclass. Let's just warn people that that is how it is. (The more magic a metaclass has the less people understand the code that depends on that magic.)

Also I think that anything you support in the backport must be supported in the latest version, as people will need to write code that works in a range of Python versions, e.g. 3.5...3.7.

@gvanrossum Thank you for review! It is very helpful. I agree more or less with practically all points. I will implement your comment later (could not do this today).

@gvanrossum

It turns out that more than half of your points could be answered/fixed by just going with explicit marking of protocol members. I think this is a strong sign that we should really do it this way. It is a bit more verbose, but simplifies the logic a lot. I changed the PEP accordingly.

Given that a valid implementation doesn't have to inherit from the protocol(s) it implements, what's the use case for allowing implementations and non-protocol attributes in protocol classes at all?

We need to allow explicit implementation (subclassing) of protocols for two reasons. First, existing code should continue working. Second, protocols could provide useful "building blocks" that may be used to implement the abstract methods.

L547 "or recursive protocols, structural subtyping is decided positively for all situations that are type safe" sounds very tautological. The example leads me to think that this is just about empty lists?

Yes, this is something similar. This is a bit tricky point, following normal rules (protocol is implemented if all signatures match) here we find that Tree[float] is a subtype of Traversable if Tree[float] is a subtype of Traversable (because of signature of leafs). I wanted to say that such (seemingly ambiguous) cases are decided positively. I tried to improve wording.

I know Intersection is a mouthful, but All as the dual of Union feels terminologically murky. Unless you expect this to be very common I hope we can just write Intersection.

I dont't think it will be very common, but still quite common. We could hear from others, and then choose the name.

That's not how it currently works in mypy, reveal_type(items) shows Iterator[Any]

Yes, but I think mypy could find the more precise type Iterator[int], for protocols this would be relatively easy to improve.

@gvanrossum OK, I pushed the changes and made a PR to python/peps https://github.com/python/peps/pull/224 so that it will be convenient to review.

@gvanrossum

For mypy, could we start by adding this to mypy_extensions?

Why do you think it is better to add this to mypy_extensions instead of adding this directly to typing? I think think this will be a widely used feature. Also it is quite uncontroversial, we still need to agree on details, but the general idea of structural subtyping seems to be clear.

My current plan for implementation is (not necessarily in this order):

Add runtime implementation to typing
Update typeshed
Add protocol type to types.py. Here I think we could potentially reuse TypedDict and make protocols just Instances with a special field protocol_type that will point to a TypedDict mapping member names to member types.
Recognize protocol definitions in semanal.py
Implement actual changes to meet.py, join.py, subtypes.py, etc.
Add support for recursive protocols.

I think it is better to make this in several steps (PRs), rather than in single one. What do you think?

python / typing

Protocols (a.k.a. structural subtyping) #11