Allow subclassing without supertyping

dmoisset commented 8 years ago

This issue comes from python/mypy#1237. I'll try to make a summary of the discussion here

Some one reported an issue with an artificial example (https://github.com/python/mypy/issues/1237#issue-136162181):

class Foo:
    def factory(self) -> str:
        return 'Hello'

class Bar(Foo):
    def factory(self) -> int:
        return 10

and then with the following stub files (https://github.com/python/mypy/issues/1237#issuecomment-188710925):

class QPixmap(QPaintDevice):
    def swap(self, other: 'QPixmap') -> None: ...

class QBitmap(QPixmap):
    def swap(self, other: 'QBitmap') -> None: ...

Which mypy currently reports as erroneous: Argument 1 of "swap" incompatible with supertype "QPixmap"

These were initially was argued to be a correct error because they violate Liskov Substitution Principle (the same case by a change of return type, the second is a covariant argument change). The problem in this scenario is that the actual classes are not meant to be substituted (the first example is actually a wrapper for a non-virtual C++ function so they're not substitutable, and the second aren't supposed to be mixed together, just reuse part of the code). (see https://github.com/python/mypy/issues/1237#issuecomment-189490903)

There was also a suggestion that allow covariant args explicitly (in the same way that Eiffel allows it and adds a runtime check) with a decorator

class QBitmap(QPixmap):
    @covariant_args
    def swap(self, other: 'QBitmap') -> None: ...

My proposal instead was add what some dialects of Eiffel call "non-conforming inheritance" ("conforms" is the Eiffel word for what PEP-483 calls "is-consistent-with"). It is essentially a mechanism to use subclassing just as a way to get the implementation from the superclass without creating a subtyping relation between them. See https://github.com/python/mypy/issues/1237#issuecomment-231199419 for a detailed explanation.

The proposal is to haven Implementation in typing so you can write:

class QBitmap(Implementation[QPixmap]):
    def swap(self, other: 'QBitmap') -> None: ...

which just defines a QBitmap class with all the mothods copied from QPixmap, but without making one a subtype of the other. In runtime we can just make Implementation[QPixmap] == QPixmap

dmoisset commented 8 years ago

As a follow-up to the discussion, @gvanrossum suggested using a class decorator. The problem with this approach is that class-level is not the right granularity to decide this, but each individual inheritance relation is. Ina multiple-inheritance context, you could want to inherit the interface of a SuperClassA, and the implementation of SuperClassB. I think that might be quite common with mixins (actually I think most mixins could be used with this form of inheritance)

Looking deeper into what Eiffel does (but this is a bit of a side-comment, not part of the main proposal), there it was also possible to define a superclass in a way that only implementation inheritance from it was allowed, and that could work well with mixins (and implemented as a class decorator). However you still need to be able to say "I just want the implementation of this superclass" when inheriting, because the writer of the superclass may not have foreseen that you wanted that.

dmoisset commented 8 years ago

Also note that this idea makes self types more viable. Things like TotalOrderingMixin where you have

    def __lt__(self: T, other:T): -> bool ...

can be defined more accurately (currently this is in the stdlib), and with no errors (note that the < operator/ __lt__ method is actually covariant in its argument in Python 3). TotalOrderingMixin is something that would create an error in every case with the current mypy implementation if self types are added as is.

ilevkivskyi commented 8 years ago

I think this is a very nice feature. In general, it is good that we separate classes (code reuse) and types (debugging/documentation). If it is added, then I would also mention this feature in PEP483 (theory) as an example of differentiating classes and types.

Between a decorator and a special class I chose the second one, since as @dmoisset mentioned, it is more flexible and suitable fox mixin use cases. Also, it is very in the spirit of gradual typing, one can mark particular bases as Implementation while still typecheck with respect to other bases.

gvanrossum commented 8 years ago

I'd like to explore this further. I expect that specifying semantics that work for every use case will be tricky. E.g. when using implementation inheritance, how much control does the subclass have and how much control does the superclass have? Do you have to repeat the signature for every method? Only describe exceptions? Which use cases should be care about most?

I hope that one of you can come up with a concrete, detailed proposal.

ilevkivskyi commented 8 years ago

I think it is better to have a minimalistic solution, rather than try to make a complex proposal that will cover all use cases. Later, more features for most popular use cases could be added.

Here is a detailed proposal with examples:

1) At runtime Implementation[MyClass] is MyClass. So that the separation of control between subclass/superclass is as usual. Also, all normal mechanisms work at runtime:

class Base:
    def __init__(self, number: int) -> None:
        self.number = number

class AnotherBase:
    ...

class Derived(Implementation[Base], AnotherBase):
    def __init__(self) -> None:
        super().__init__(42) # This works since Implementation[Base] is just Base

2) Use cases for Implementation are situations where a user wants to reuse some code in another class without creating excessive "type contracts". Of course one can use Any or #type: ignore, but Implementation will allow more precise type checking.

3) The derived class is not considered a subtype of implementation base classes. An explicit cast might be used under user's responsibility. Examples:

from typing import cast

def func(x: Base) -> None:
    ...
def another_func(x: AnotherBase) -> None:
    ...

func(Base(42))              # this is OK
another_func(Derived())     # this is also OK
func(Derived())             # Marked as error by a typechecker
func(cast(Base, Derived())) # Passes typecheck

4) Methods of an implementation base class could be redefined in an incompatible way:

class Base:
    def method(self, x: int) -> None:
        ...

class AnotherBase:
    def another_method(self, y: int) -> None:
        ...

class Derived(Implementation[Base], AnotherBase):
    def another_method(self, y: str) -> None: ... # This is prohibited by a typechecker
    def method(self, x: str) -> None: ...         # Although, this is OK

5) As @JukkaL proposed, a typechecker checks that the redefined methods do not "spoil" the already checked code by re-type-checking the bodies of all base class methods as if they were written in the subclass (after translating type variables suitably if the base class is generic).

I think the re-check should be recursive, i.e. include bases of bases, etc. Of course, this could slow down the type-check, but not the runtime.

6) A typechecker uses the normal MRO to find the signature of a method. For example, in this code:

class Base:
    def method(self, x: int) -> None: ...

class AnotherBase:
    def another_method(self, y: int) -> None: ...

class Derived(Implementation[Base], AnotherBase):
    def method(self, x: str) -> None: ...

a typechecker infers that Derived().method has signature (str) -> None, and Derived().another_method has signature (int) -> None.

7) Clearly, Implementation could be used only with a proper class. Implementation[Union[t1, t2]], Implementation[Callable[[t1, t2, ...], tr]], etc. rise TypeError.

If we agree on this, then I will submit PRs for typing.py, PEP484, and PEP483.

JukkaL commented 8 years ago

This has the problem that a type checker doesn't see inside the implementations of stubbed classes, and you could break some of the base class methods without a type checker having any way of detecting it.

Maybe implementation inheritance would only be allowed from within your own code somehow. For example, behavior would be undefined if you inherit the implementation of a stdlib or 3rd party library class, unless you include the implementation of the library module in the set of checked files. object would be fine as a base class, of course, but you probably shouldn't override any methods defined in object in an incompatible manner, since this can break a lot of things. This would only be enforced by static checkers, not at runtime.

Not sure if the latter would be too restrictive. To generalize a bit, it would okay if the implementation base class has library classes in the MRO, but methods defined in those classes can't be overridden arbitrarily.

ilevkivskyi commented 8 years ago

@JukkaL This is a good point. We need to prohibit incompatibly overriding methods everywhere except for accessible in the set of checked files. I don't think it is too restrictive. Alternatively, we could allow this with a disclaimer that complete typecheking is not guaranteed in this case.

gvanrossum commented 8 years ago

Yeah, I'm afraid the use case from https://github.com/python/mypy/issues/1237 is not so easily addressed by things that preserve soundness. But I don't have an alternate proposal. :-(

ilevkivskyi commented 8 years ago

@gvanrossum Indeed, there is not much could be done. But I don't think this is bad. As with Any or # type: ignore, people who are going to use this feature know what they do. More important is that Implementation provides more precise typechecking, more "granularity", and probably a bit more safety than Any.

JukkaL commented 8 years ago

The QPixmap example shouldn't be a problem, since it seems to be for a library stub file. We don't verify that stubs are correct anyway, so a stub would be able to use implementation inheritance without restrictions but there are no guarantees that things are safe -- we expect the writer of the stub to reason about whether implementation inheritance is a reasonable thing to use.

For user code, I think that we can provide a reasonable level of safety if we enforce the restrictions that I mentioned.

JukkaL commented 8 years ago

I have a related proposal. What about having a magic class decorator (with no runtime or minimal effect) or a base class that declares a class as a "pure mixin"? A pure mixin could only be used for implementation inheritance, and it wouldn't be type checked by itself at all -- it would only be type checked as part of implementation inheritance. I think that this would allow type checkers to check some mixin use cases that are tricky right now with minimal code changes. The alternative would be to use ABCs to declare the things that the mixin expects to be defined elsewhere, but they are harder to use and arguably add boilerplate.

Example:

@puremixin
class MyMixin:
    def do_stuff(self) -> int:
        self.one_thing()  # ok, even though not defined in this class
        return self.second_thing(1)  # also ok

class Thing(Implementation[MyMixin]):
    def one_thing(self) -> None: pass  # ok
    def second_thing(self, x: int) -> str:    # error: return type not compatible with do_stuff
        return 'x'

gvanrossum commented 8 years ago

But is that the way mixins are typically used? It would be good to look carefully at some examples of mixins (e.g. one of our internal code bases uses them frequently).

ilevkivskyi commented 8 years ago

@JukkaL How about the following disclaimer to be added?

""" Note that implementation inheritance could be used with stubbed extension modules, but there are no guarantees of safety in this case (type checkers could not verify correctness of extension modules). The writer of the stub is expected to reason about whether implementation inheritance is appropriate.

In user code, implementation inheritance is only allowed with classes defined in the set of checked files. Built-in classes could be used as implementation base classes, but their methods could not be redefined in an incompatible way. """

I like you proposal of @puremixin decorator (maybe we could choose other name). It looks like it is independent of the initial proposal and could be easily added. I propose @gvanrossum to decide whether we need this.

JukkaL commented 8 years ago

@gvanrossum I've seen mixins like that (that use things that aren't defined anywhere in the mixin class). I'm not sure how common they are.

@ilevkivskyi Looks good. I'd do some small changes (feel free to edit at will):

Note that implementation inheritance could be used with stubbed extension modules, but there are no guarantees of safety in this case (type checkers could not verify correctness of extension modules). The writer of the stub is expected to reason about whether implementation inheritance is appropriate.

->

Note that implementation inheritance could be used with stubbed extension modules, but there are no guarantees of safety in this case (type checkers could not verify correctness of extension modules). As a special case, if a stub uses implementation inheritance within the stub, the writer of the stub is expected to reason about whether implementation inheritance is appropriate. Stubs don't contain implementations, so type checker can't verify the correctness of implementation inheritance in stubs.

Built-in classes could be used ...

->

Built-in and library classes could be used ...

JukkaL commented 8 years ago

Also, I'm not at all attached to the puremixin name.

gvanrossum commented 8 years ago

Maybe the puremixin proposal deserves a new issue?

JukkaL commented 8 years ago

Created #246 for pure mixins.

gvanrossum commented 8 years ago

@gvanrossum I've seen mixins like that (that use things that aren't defined anywhere in the mixin class). I'm not sure how common they are.

We're probably talking about the same mixin classes. :-) However, I'm not convinced that those would deserve being marked as Implementation[] -- the class that inherits from the mixin typically also inherits from some other class and the mixin should not conflict with any methods defined there (though it may override them). (OTOH maybe I didn't read the examples all the way through.)

JukkaL commented 8 years ago

Let's move the discussion of mixins to #246?

dmoisset commented 8 years ago

I was writing a proposal and it may be a bit messy to post it now because I duplicated some stuff already said, but wrote some other things differently. Let me summarize different or subtle points that I think should be discussed:

Using a generic-like syntax like Implementation[Foo] looks to similar to generics while what we have is somewhat different. I've been thinking on a functional syntax, i.e.:

class Base: ...
class Derived(Foo, implementation_only(Base), Bar): ...

so it doesn't make sense to pass it a Typevar or anything that has no supporting class (like unions or protocols). I think the different syntax helps.

other names I considered for implementation_only() are reuse() or include() or insert() (the latter ones makes sense if you think of the mechanism it implements). "implementation" sounds a bit vague (even if it was my first proposal)
probably the type checker should verify that implementation_only is used only in the context of a class definition in the base class list.
the typechecker should ignore the Base methods if overriden in Derived, or in other parent (Foo, but not Bar in the example above)
inherited bodes should be checked if available; if stubs only no checking its done (it's consistent with the rest of mypy that ignore bodies if stubs are present)
super() needs some special care. The simplest way to approach it is that a super() object does not have methods that have a signature coming from an implementation_only parent, otherwise you could get mistyped calls from the mro chain. If you're doing implementation only inheritance, following the mro makes no sense anyway. We should allow some way to get the parent methods, maybe through an explicit class reference (allowing some leeway on the self parameter when some method on Derived says Base.method(self, ...))
A class defined having all its parents as implementation_only, is interpreted by the typechecker as a direct descendent of object.

As a side note, using functional syntax would make it trivial to use @implementation_only as a class decorator too, which can be used for the mixin mechanism proposed by @JukkaL . However for that example I'd prefer the mixin to actually have the signatures of its "abstract methods"; if I want to implement a mixin one of the things I want from a static type system id to give me some hints of what these methods should take. For me, the decorator would only make subclasses of the mixin to behave like the rest of this proposal

dmoisset commented 8 years ago

Btw, for further reference there's a paper on the idea at http://smarteiffel.loria.fr/papers/insert2005.pdf

dmoisset commented 8 years ago

There are some things that can be added to this mechanism (I left these apart because they aren't part of the core proposal to keep the above KISS).

Allow classes to be always implementation_only (like @JukkaL said):

@implementation_only
class SomeMixin:
    def method(self, x: int) -> str: ...
    def other_method(self, x: int) -> str: ...

class C(SomeMixin):
    def method(self) -> None: ... # OK to change the signature

Allow to selectively include or exclude parts of the parent class:

class C(
        implementation_only(
            SomeMixin, include=('method', 'attribute')
        )
       ): ...

class D(
        implementation_only(
            SomeMixin, exclude=('other_method')
        )
       ): ...

Allow renames of methods


class E(
        implementation_only(
            SomeMixin, renames={'method': '_d_method'}
        )
       ): ...

The exclusion/renaming mechanism are present in Eiffel, and the renaming allows to unambiguously resolve issues related to super() by given explicit different names to the overriden methods, although the mechanism has some complexities that might be problematic in python.

I don't think the first proposal should include any of these, but I wanted to mention these to also clarify that the functional syntax gives more flexibility for future growth of the idea.

ilevkivskyi commented 8 years ago

@dmoisset Indeed, some things you propose have already been said. This is rather a good sign. Some comments:

I don't think that we need functional syntax, square brackets are used not only for generics they are used almost everywhere in the type checking context.
I like the name Implementation. It reminds me that I am inheriting only the "implementation" of the class, not the "interface" (i.e. type contracts).
I am not sure that the restriction about super() is really necessary.

dmoisset commented 8 years ago

The possible problem with super happens in the following snippet:

def implementation_only(C): return C

class Base:
    def f(self, a: int) -> None:
        print("Base")
        print(a)

class Left(Base):
    def f(self, a: int) -> None:
        print("Left.f")
        super().f(a)

class Right(implementation_only(Base)):
    def f(self, a: str) -> None:
        print("Right.f")
        super().f(len(a))

class C(Left, implementation_only(Right)):
    def f(self, a: int) -> None:
        print("C.f")
        super().f(a)

C().f(42)

This code fails with a TypeError if run with Python. but the rules defined above don't detect a problem:

Base and Left are correctly typed
Right redefines the signature of f (which is ok, given that it's using implementation inheritance).
Right.f calls super.f using an int argument (which looks ok, given that the inherited f accepts an int)
C redefines the signature of Right.f (which is ok, given that it is using implementation inheritance). It keeps the signature of Left.f so no problem there
C.f calls super.f with an int. The type checker is only looking at the Left.f signature, so this is considered valid.

But even if everything looks ok by itself, the program has a type error, so the rules are unsound. So some of the rules above should actually be rejected in this scenario. 1 seems to be an ok rule (valid with the current semantics), and 2 is our proposal so let's assume it as valid (if we get to a contradiction the idea itself is unsound). So 3, 4, or 5 should be wrong, and we should choose (at least) one, under some conditions (which should include this scenario). My proposal was "3 is wrong. You can't call super from an implementation_only redefinition of a method". This actually will make the call that produces the TypeError in runtime to be invalid.

Other options are making this scenario impossible forbidding the redefinition in some multiple inheritance cases (4 would be invalid), but it's probably harder to explain to users

ilevkivskyi commented 8 years ago

Thank you for elaborating, I understand your point. However, it looks like a more sophisticated typechecker could catch the TypeError. By re-type-checking bodies of methods in C parent classes, a typechecker could detect that the super().f(a) call in Left has incompatible signature, since super() for Left is now Right, and f in Right requires a str.

Prohibiting 3 is an absolutely valid option (although it feels a bit false positive to me), also prohibiting 4 is a valid option. What I propose is to leave this choice (sophisticated recursive re-checking vs prohibitng 3 vs prohibiting 4) up to the implementer of a particular typechecker.

Restrictions mentioned by Jukka are necessary, because it is extremely difficult (probably impossible in principle) to check types inside extension modules, so that we: a) "forgive" typecheckers all uncaught TypeErrors related to this; b) propose some restrictions for user code that nevertheless allow to maintain safety. At the same time, typecheckers should be able to catch the error like you show in either way they choose (i.e. if a typechecker is not able, then this could be considered a bug in that checker).

gvanrossum commented 8 years ago

I have to say that I am uncomfortable with solutions (to any problem, not just this one) that say "just re-type-check such-and-such code with this new information". This could potentially slow things down tremendously (depending on how often new information is discovered and acted upon).

It also makes things hard to explain to users -- e.g. how to explain that their code that was previously marked as correct is now deemed incorrect, due to something that someone else added somewhere else. Why not place the blame on the new subclass that apparently violates the assumptions of the base class, for example?

I also worry that such solutions violate some fundamental nature of type annotations. When mypy sees "def foo(arg: int) -> int: ..." it type-checks the body of foo() and then it can forget about it -- everything it needs to know about foo() is now summarized in the function signature. Likewise for classes. And for modules. A solution that requires mypy to go back essentially introduces a circular dependency between (in this case) base class and superclass, or maybe (in other examples) caller and callee, or importer and importee.

Those circular dependencies worry me especially in the context of the work we're doing on incremental checking in mypy -- it makes the assumption that if a module hasn't changed, and its dependencies (taken transitively) haven't changed either, then the module doesn't need to be re-checked on a future run, and the information collected in its symbol table (about global variables, functions, classes, methods and signatures) is sufficient to analyze any new code that happens to import it. I'd like not to see this assumption violated.

Of course it's common that circular dependencies occur at all levels -- import cycles, mutually recursive functions, what have you. And mypy deals with them (and we're improving it for those edge cases where it fails to deal with them). But I think I want to avoid introducing extra cycles in relationships that users see as one-directional.

JukkaL commented 8 years ago

The point about incremental checking is a good one. General performance could also be an important concern if people would start using this more widely. My expectation would be that this would mostly be used in fairly simple cases, but of course any feature given to users tends to be used in unforeseen ways, and once the genie is out of the bottle it's difficult to put it back there. There could be also other interactions with particular language features that we haven't though of yet.

Due to all the concerns and tricky interactions that we've uncovered, I'd like to see a prototype implementation of this before deciding whether to add this to the PEP -- I have a feeling that we don't yet understand this well enough.

ilevkivskyi commented 8 years ago

After thinking more it looks like there are three important "use cases" for this feature:

The original request originates from typing extension modules;
Another use case is inheriting from library classes, where a user has some already working code, that redefines methods in a "theoretically" incompatible way;
A use case proposed by @JukkaL: pure mixin as a class that is not typechecked at all when defined, but only when inheriting from it.

What I see as important now is that first two use cases actually do not require recheck of base classes (and it is even impossible for built-in types, extension modules and library stubs), only the third one requires it. Let us focus on first two cases here and discuss re-typechecking and related issues in #246.

A reworked proposal that covers two first cases is: Implementation is just a way to say "I know what I do" that is "softer" than Any. More details:

1) At runtime Implementation[MyClass] is MyClass. All normal mechanisms work at runtime:

class Base:
    def __init__(self, number: int) -> None:
        self.number = number

class AnotherBase:
    ...

class Derived(Implementation[Base], AnotherBase):
    def __init__(self) -> None:
        super().__init__(42) # This works since Implementation[Base] is just Base

2) The derived class is not considered a subtype of implementation base classes. An explicit cast might be used under user's responsibility. Examples:

from typing import cast

def func(x: Base) -> None:
    ...
def another_func(x: AnotherBase) -> None:
    ...

func(Base(42))              # this is OK
another_func(Derived())     # this is also OK
func(Derived())             # Marked as error by a typechecker
func(cast(Base, Derived())) # Passes typecheck

3) Methods of an implementation base class could be redefined in an incompatible way:

class Base:
    def method(self, x: int) -> None:
        ...

class AnotherBase:
    def another_method(self, y: int) -> None:
        ...

class Derived(Implementation[Base], AnotherBase):
    def another_method(self, y: str) -> None: ... # This is prohibited by a typechecker
    def method(self, x: str) -> None: ...         # Although, this is OK

No re-typecheck of base classes is performed, it is the responsibility of a user to ensure that the new signature is safe.

4) A typechecker uses the normal MRO to find the signatures of methods and uses them to typecheck their use in derived class and elsewhere as usual. For example:

class Base:
    def method(self, x: int) -> None: ...

class AnotherBase:
    def another_method(self, y: int) -> None: ...

class Derived(Implementation[Base], AnotherBase):
    def method(self, x: str) -> None: ...

Derived().method('abc')         # OK
Derived().another_method('abc') # Fails, incorrect type of argument
Derived().third_method('abc')   # Fails, no such method

5) Implementation could be used only with a proper class. Implementation[Union[t1, t2]], Implementation[Callable[[t1, t2, ...], tr]], etc. raise TypeError.

6) People who will use this feature probably want some speed, therefore implementation of Implementation should be fast. Probably, it should not be a class itself, just an object whose class defines __getitem__ that returns its argument.

@gvanrossum @dmoisset what do you think about this?

gvanrossum commented 8 years ago

It still sounds like too big a hammer. In https://github.com/python/mypy/issues/1237#issuecomment-188485241 @philthompson10 wrote:

Is there a case for treating stub files differently, maybe by providing something like @overload that tells mypy that Bar.factory() is not an override of Foo.factory()?

From this it seems pretty clear that he doesn't want to completely sever the ties between Foo and Bar, only between the two unrelated factory() methods. (Though perhaps this is a job for # type: overload?)

dmoisset commented 8 years ago

@gvanrossum I think it was more or less established that the initial example in #1237 (the Foo.factory return type definition) was unsound, and that case can already be solved with a #type: ignore at the redefinition point, without affecting the relationship between classes.

The other scenario presented later (The one about QBitmap) may be more interesting (I'm assuming those classes can be used differently.

To move away from examples that are not my own, I also found a similar one at https://github.com/django/django/blob/stable/1.10.x/django/utils/datastructures.py#L48 ; there you have a MultiValueDict class that inherits dict because the implementation of dict is good for it, but it has a slightly different API (actually, it has the API of a Dict[K, V] extended, but the implementation of a Dict[K, List[V]]). It's ok because it's not intended to replace a dict, it's just a new data structure to be used separately.

In this situation, I think a proposal like the last one from @ilevkivskyi 's would work. There is some potential for unsoundness (essentially every reference to "self" in the inherited class can be subject to a broken call), but the alternatives here are using Any or # type: ignore which can also lead to runtime errors. The point here is to be able to explicitly say "this inherits from dict, but it's not something you can use in place of a dict", and have that checked (and allow me to break the dict contract in a class that is not really a dict).

ilevkivskyi commented 8 years ago

@gvanrossum There are two questionable things in my proposal:

per-class vs per-method specification of implementation inheritance
whether to assume derived class a subtype or not

While I am not 100% sure that it should be per class, still I think that it is less verbose, since in such cases people would want to override more than one method, like in django example by @dmoisset .

Concerning the second question, I think it is better to not assume the derived class a subtype of base. First, subtype assumption could lead to many uncaught errors. Second, if someone needs to allow also subclass, then a Union could be easily use to type such situations:

class Base: ...
class Derived(Implementation[Base]): ...

def fun_1(x: Base): # this function uses methods that are overridden in Derived
    ...

def fun_2(x: Derived): # this one too
    ...

def fun_3(x: Union[Base, Derived]): # incompatibly overridden methods are not used here
    ...

If we want to keep the subtype relationship, then we need to do the re-typecheck of bases, which is quite complex and time consuming as we already agreed. I think this should be left to #246 where this is important. @JukkaL what do you think?

On the contrast, here I propose a simple scheme that covers extension modules, built-ins, library stubs, etc. Here I agree with @dmoisset

The point here is to be able to explicitly say "this inherits from dict, but it's not something you can use in place of a dict", and have that checked

gvanrossum commented 8 years ago

Hm... The QBitmap example only has one method, so it's still not really very enlightening. The MultiValueDict example seems more helpful, except it seems that it overrides all the methods -- so why not the underlying dict an instance attribute? That might even be quicker than all those super() calls.

I also still have some questions regarding the specification of Implementation[Base]. Suppose:

class Base:
    def foo(self, a: int) -> int: ...
class Derived(Implementation[Base]):
    def bar(self) -> int:
        return self.foo(12)

Is the call to self.foo(12) inside bar() valid? What about Derived().foo(12)? Python itself certainly has no problem with those calls. But type-checking them might be tricky if we say that Derived is not a subtype of Base -- by what other mechanism would a checker consider those calls valid?

A similar case would be if Derived did override foo(), and its foo() called super().foo(). Should that work?

I suppose the answer would be that those calls are valid, and that the checker should still consider Base a member of the MRO of Derived.

But that answer still leaves me unhappy for two reasons. First, the checking of Derived itself feels to lax: anything that happens to override a method of Base gets a free pass, whether intended or not. I would rather see a proposal that requires that conflicting method overrides are explicitly flagged (with something more specific than # type: ignore).

Second, despite Daniel's assurance that it's just a convenient implementation device, I'm pretty sure that the users of MultiValueDict would be unhappy if a MultiValueDict couldn't be passed into code expecting a dict (or perhaps a Mapping or MutableMapping), since it does in fact implement those interfaces (AFAICT). Also the name would suggest so. Maybe that could be solved by making it inherit from MutableMapping also, e.g.

K = TypeVar('K')
V = TypeVar('V')
class MultiValueDict(Implementation[Dict[K, List[V]]], MutableMapping[K, V]):
    ...

Does all if this work as expected?

ilevkivskyi commented 8 years ago

@gvanrossum

anything that happens to override a method of Base gets a free pass, whether intended or not. I would rather see a proposal that requires that conflicting method overrides are explicitly flagged

OK, I would propose a dummy decorator here, maybe named @override that allows to override incompatibly a method of an Implementation base:

class BaseObj:
    def foo(self) -> str: ...
    def bar(self) -> object: ...

class BaseInt:
    def bar(self) -> int: ...

class Derived(Implementation[BaseInt], BaseObj):
    @override
    def foo(self) -> int: ... # error, @override only allowed with Implementation
    @override
    def bar(self, x: int) -> object:
    # also error, new signature must be compatible with non-Implementation bases
    ...
    @override
    def bar(self) -> str: ... # this one is OK

I'm pretty sure that the users of MultiValueDict would be unhappy if a MultiValueDict couldn't be passed into code expecting a dict (or perhaps a Mapping or MutableMapping), since it does in fact implement those interfaces

I think it is better not to modify anything at runtime (your proposal might change MRO in some cases). I propose to list supertypes that one wants to keep as a keyword argument (that will accept a single class or a tuple of classes):

K = TypeVar('K')
V = TypeVar('V')
class MultiValueDict(Implementation(Dict[K, List[V]], keep=MutableMapping)):
    ...

I don't think that we need to repeat the type variables in supertype, they can be inferred automatically.

If we want to use the keyword, then we are forced to use (...) instead of [...] for Implementation. I don't know whether it is good or bad.

Alternatively, we can use syntax like this:

K = TypeVar('K')
V = TypeVar('V')
class MultiValueDict(Implementation[dict, MutableMapping[K, List[V]]]):
    ...

and say that Implementation takes runtime info from the first argument, and static type info from the optional second argument (that could be a class, or a list of classes). @gvanrossum what do you think?

gvanrossum commented 8 years ago

Honestly I think the proposal is fraught with complexities and I think it's time to put it to rest or at least wait for a long time until we find more evidence of solid real-world use cases.

dmoisset commented 8 years ago

@gvanrossum Regarding "Is the call to self.foo(12) inside bar() valid?" for me its obviously yes, given that the purpose is to actually get the methods in the subclass (if you don't get the methods and don't get the subtypes, the classes might as well be unrelated).

Also your statement of «I'm pretty sure that the users of MultiValueDict would be unhappy if a MultiValueDict couldn't be passed into code expecting a dict» looks perfectly valid, but to make it 100% accurate it should be expressed with the generic params: «I'm pretty sure that the users of MultiValueDict[str, Foo] would be unhappy if a MultiValueDict[str, Foo] couldn't be passed into code expecting a Dict[str, List[foo]]» and there's where the problem makes itself apparent (because that is something that this implementation doesn't allow and it would be nice if the checker detected it).

This is a somewhat contrived case because there is both interface and implementation inherited but from two different instantiations. However I don't think this is not a rare case; on a quick inspection I found many cases of this situation around dict on the python stdlib: (http.cookies.BaseCookie, http.cookies.Morsel, logging.config.ConvertingDict are dicts of one type by implementation, but its interface shows them as dicts of another); another situation is collections.Counter that redefines update() with an incompatible signature. also some cases around bulitin types: (xml.dom.minicompat which redefines tuple.add incompatibly).

On custom classes I didn't find many examples in public code that I know well; an additional one from the stdlib is http.client.HTTPMessage which inherits email.message.EmailMessage to reuse header parsing functionality (this is perhaps the most obvious and simple case of "inheritance of implementation but not subtyping").

(just read newest message on the issue while I was writing this comment) I understand that this might not be a priority, I might update this issue with other examples I find in the future in case it helps a future discussion.

For me the original (simpler) version was good enough. It is not a free pass in calls in Derived (they are perfectly checked), it has some weakness on inherited-but-not-overriden references to self (including uses of super), but the added weakness is better than a type:ignore, it prevents errors respecting developer intention (using a value as subtype when not intended/valid as such). I wouldn't add the complexities of override decorator, options to keep stuff, etc, in the sense that they do not prevent any problems that don't already exist with the current approach (i.e "type: ignore").

gvanrossum commented 8 years ago

OK, I think we actually agree on a kernel of behavior for Implementation[Base], and maybe we can agree that further complexities are just a distraction. But honestly I would rather hear from people who find they need this for their own code that they are annotating -- I know that if I had a class like MultiValueDict I would just rewrite it so that it doesn't inherit from dict at all.

PetterS commented 8 years ago

I need it for the following scenario. Consider an optimization solver written in Python:

optimization_problem = Problem()
x = optimization_problem.add_variable()
y = optimization_problem.add_variable()
optimization_problem.add_constraint(x + y == 2)

This requires that the type of x defines a custom eq returning something other that bool. This causes an error in mypy:

error: Return type of "__eq__" incompatible with supertype "object"

I think this use case is legitimate. There is a linear programming library doing this.

gvanrossum commented 8 years ago

I see the point, but what benefit do you get from type-checking such code? Wouldn't it be rowing upstream because everything is overloaded in a funny way?

In the meantime, if you put a #type:ignore on your definitions of eq, does the rest work?

PetterS commented 8 years ago

I guess the benefit would be that code like

optimization_problem.add_constraint("abc")

etc. would be marked as an error. I haven’t actually implemented this completely, so I don’t know whether it would be rowing upstream. I am still very new to this (installed 3.5 + mypy today).

Yeah, # type: ignore and @no_type_check both silences the error. I assume the type annotations are still used elsewhere.

gvanrossum commented 8 years ago

Using "# type: ignore" is more precise. I don't know for sure what that decorator does in mypy.

--Guido (mobile)

ilevkivskyi commented 8 years ago

@gvanrossum

But honestly I would rather hear from people who find they need this for their own code that they are annotating

The situation described by @PetterS is quite widespread in the scientific community. I did such thing several times. If one adds #type: ignore to

class Expr:
    def __eq__(self, other: 'Expr') -> 'Expr':
        ...

then this method will be typed as Callable[[Any], Any]. That will lead to two kinds of errors: 1) If Expr is used where object is expected

def eq_one(x: object) -> bool:
    return x == 1
eq_one(Expr()) # error here

2) If something other than Expr is passed to its __eq__ method:

Expr() == 'boom!' # error here

Both kinds of errors are not caught by mypy now, but will be caught with Implementation:

class Expr(Implementation[object]):
    def __eq__(self, other: 'Expr') -> 'Expr':
        ...

Of course I could not judge how widespread are such situations in general. Maybe we could ask others? @vlasovskikh Have you heard about such situations from users of PyCharm?

pkch commented 7 years ago

I wanted to add class constructors to the list of issues to clear up in this discussion. At the moment, they receive a confusing treatment:

# example 1
class A:
    def f(self, x: Sequence) -> None: ...

class B(A):
    # error: Argument 1 of "f" incompatible with supertype "A"
    def f(self, x: List) ->  None: ...

but

# example 2
class A:
    def __init__(self, x: Sequence) -> None: ...

class B(A):
    # passes type check
    def __init__(self, x: List) ->  None: ...

OTOH, class creation passes type check:

# example 3
class A: 
    def __init__(self, a: int) -> None:
        self.a = a
def f(a: Type[A]) -> A:
    return a(0)

The only difference between constructors and other methods is that they affect Liskov substitutability of different objects: constructors affect the classes, while other methods affect the instances of those classes.

My preference would be to do what this thread is proposing - let the programmer easily choose whether Liskov constraint (=subtyping) should apply, regardless of subclassing; so __init__ could be either constrained or not. If it's constrained, then example (2) should fail type check but (3) should pass; if it's not constrained, it (2) should pass but (3) should fail.

BTW, another argument in favor of separation of subytping from subclassing is that substitutability only makes sense if the semantics is compatible -- it's not enough that the methods have matching types; for example:

it's rarely ok pass iterators in place of iterables (despite the subclass relationship) because they might be exhausted (leading to incorrect behavior)
it's dangerous to pass a x: defaultdict instead of a x: dict because there might be try dostuff(x[k]) except KeyError ... in the code

pkch commented 7 years ago

Let's say Y inherits from X (through regular class Y(X) inheritance, not through ABCMeta.register).

Currently:

Automatically Y is subtype of X and type safety is ensured (i.e., subclass methods compatibility is checked).
Automatically Type[Y] is subtype of Type[X] but type safety is not ensured (i.e., subclass __init__ compatibility is not checked).

The proposal so far is:

Programmer can choose whether or not Y should be a subtype of X. If he chooses to create this subtyping relationship, then type safety is ensured (same as at present).

I think for consistency, the second part of the proposal should be:

Programmer can choose whether or not Type[Y] should be a subtype of Type[X]. If he chooses to create this subtyping relationship, then type safety is ensured (i.e., subclass __init__ compatibility is checked, unlike at present).

gvanrossum commented 7 years ago

Canyou please use words instead of symbols? I can never remember which way <: or :> goes. (It's fine to edit the comment in place if GitHub let you.)

pkch commented 7 years ago

Done. To summarize, I think X.__init__ is relevant for type checking of class objects (type Type[X]) and irrelevant for type checking of instance objects (type X). My rationale is that:

X.__init__ signature defines the details of the Callable type that is used to represent class X, and that Callable type is nearly equivalent to Type[X]
while technically a programmer can call __init__ on an instance: X().__init__(), there are no good use cases for that

gvanrossum commented 7 years ago

Ah, that summary is very helpful. I agree that one shouldn't call __init__ manually (and in fact Python always reserves the right to have "undefined" behavior when you define or use dunder names other than documented).

But of course Type[X] has a bunch of behaviors that Callable[..., X] does not (e.g. it has class methods).

mitar commented 6 years ago

I have tried to read through this whole thread and I am maybe missing something, but I think I have a slight variation/use case to share to add to this discussion.

If you are familiar with sklearn you know that it provides many various machine learning classes. It is great because many classes share the same interface, so it is easy to programmers to use them. The issue is that while method names are the same, arguments they receive vary greatly. If I would like to define types for them in a meaningful way, there is a problem. The base class might be something like:

class Estimator(Generic[Inputs, Outputs]):
    @abstractmethod
    def fit(self, *, inputs: Inputs, outputs: Outputs) -> None:
        pass

    @abstractmethod
    def predict(self, *, inputs: Inputs) -> Outputs:
        pass

class SupervisedEstimator(Estimator[Inputs, Outputs]):
    pass

class UnsupervisedEstimator(Estimator[Inputs, Outputs]):
    @abstractmethod
    def fit(self, *, inputs: Inputs) -> None:
        pass

class GeneratorEstimator(Estimator[List[None], Outputs]):
    @abstractmethod
    def fit(self, *, outputs: Outputs) -> None:
        pass

class LupiSupervisedEstimator(SupervisedEstimator[Inputs, Outputs]):
    @abstractmethod
    def fit(self, *, inputs: Inputs, privileged_inputs: Inputs, outputs: Outputs) -> None:
        pass

So, somehow the idea is that Estimator defines which methods are available, and general arguments, but that then subclasses can define their own combination of arguments for their methods. And that then subclasses of those have to uphold the Liskov Substitution Principle. But one between Estimator and direct subclasses should not matter, because it should be clear that if you are expecting base Estimator, you might know which methods you can use, and what is semantics of those, but you have to inspect required arguments at runtime for an instance you have.

It seems this is not possible to express with current Python typing so that mypy would not complain?

JukkaL commented 6 years ago

@mitar Can you give a more complete example which illustrates the errors generated by mypy?

gvanrossum commented 6 years ago

It seems this is not possible to express with current Python typing so that mypy would not complain?

I can up with an example that seems to work:

from typing import *
T = TypeVar('T')

class Base:
    fit: Callable

class Foo(Base):
    def fit(self, arg1: int) -> Optional[str]:
        pass

class Bar(Foo):
    def fit(self, arg1: float) -> str:
        pass

Maybe this pattern helps? This seems to be about the best you can do given that you don't want the base class to specify the method signatures.

mitar commented 6 years ago

Interesting. I never thought of just specifying Callable. This is very useful.

My ideal typing semantics would be: this is a method signature for fit with arguments inputs and outputs. If your subclass fit uses these arguments, they have to match base fit types, but they can also be specified as omitted (by not listing them in subclass fit arguments).

Sadly, Optional is not useful here because it is not that a subclass is saying you can pass an argument or not, but that you have to pass it always, or you cannot pass it ever. (So different subclasses would choose one or the other.)

Then consumers of those objects could check and say: does this method accepts inputs, if yes, then I know what is the type and semantics of this argument. And same for outputs.

python / typing

Allow subclassing without supertyping #241