python / typing

Python static typing home. Hosts the documentation and a user help forum.
https://typing.readthedocs.io/
Other
1.61k stars 242 forks source link

Kill __subclasscheck__ #136

Closed gvanrossum closed 8 years ago

gvanrossum commented 9 years ago

Mark Shannon wants me to drop __subclasscheck__ from all type objects (Any, Union etc.). This is somewhat major surgery since it is used by Union simplification and for various type checks. See also #133 and #135.

ilevkivskyi commented 9 years ago

This is a bit sad. People would want to do some interactive experimentation to see what is issubclass(int, Union[int, str]), but probably there are some good reasons behind this decision.

1st1 commented 9 years ago

Maybe it is a good thing, at least for 3.5. This will effectively force type objects to be used only in annotations, which gives us more room to tweak their behaviour before 3.6.

ilevkivskyi commented 9 years ago

Maybe one could introduce a helper function acceptable(obj, hint) in typing.py that tells whether obj is OK for type hint. Like acceptable(42, Union[int, str]) == True.

gvanrossum commented 9 years ago

Here's what Mark wrote about the subject:

 My remaining concern is the isinstance and issubclass support.
There is a real different difference between whether a value is a member of a type and whether it is an instance of a class.

Support for isinstance and issubclass applied to types just breeds confusion.

As you said, for most users, the distinction between types and classes is subtle. But I think it is important to make people aware that there is a distinction, even if they fully understand it.

For example,
List[int] and List[str] and mutually incompatible types, yet
isinstance([], List[int]) and isinstance([], List[str))
both return true.

Just delete all the __instancecheck__ and __subclasscheck__ methods and I'll be happy.
1st1 commented 9 years ago

Maybe one could introduce a helper function acceptable(obj, hint) in typing.py that tells whether obj is OK for type hint. Like acceptable(42, Union[int, str]) == True.

I think that we should leave the task of validating types to external tools (and compilers), not user code. So I'd be -1 on this.

gvanrossum commented 9 years ago

The equivalent of issubclass() should probably be called is_consistent_with() (or some variation of that) to match the terminology used by Jeremy Siek in his Gradual Typing blog post (http://wphomes.soic.indiana.edu/jsiek/what-is-gradual-typing/). I'm not sure what to call the isinstance() equivalent, perhaps is_acceptable_for(). But I plan to ask others to contribute such functions.

ilevkivskyi commented 9 years ago

The intended use would be not to validate types in code but just do some quick interactive tests in REPL. But OK, maybe you are right people anyway will start using it for validation.

ilevkivskyi commented 9 years ago

@gvanrossum, Indeed, to have two such functions would be very cool. Even their presence would underline the distinction between the classes and types.

NYKevin commented 9 years ago

If we're drawing this type/class distinction, PEP 483 should be updated:

We use the terms type and class interchangeably, and we assume type(x) is x.class.

Should I report that as a separate bug?

gvanrossum commented 9 years ago

Here is fine. I've made some updates to PEP 483; it really does use the two words interchangeably, and I don't want to have to update the text too much, but I elaborated that particular sentence (and removed the thing about class, which is irrelevant).

agronholm commented 9 years ago

I'm confused -- as Python 3.5 is now in RC, does this mean the typing types were really left with only __subclasscheck__ working and not __isinstance__ ? I had a working runtime type checker (https://gist.github.com/agronholm/00311c169527442ddd5b) which relied on isinstance() but it broke at some point when __instancecheck__ was disabled. Should I not rely on even __subclasscheck__ then?

gvanrossum commented 9 years ago

You should not rely on subclass checking either. I ran out of time, but PEP 484 is accepted provisionally, which means we can still change the API in later 3.5.x releases or in 3.6.

agronholm commented 9 years ago

Ok, so with both __instancecheck__ and __subclasscheck__ out of commission, how am I to implement a runtime type checker? I would need to be able to somehow detect that a type, say Dict[str, int] is derived from plain Dict, but I can't find any way to do that since issubclass(Dict[str, int], Dict) returns False. Likewise, I'd like to make use of the __extra__ argument to automatically check against the appropriate ABC when one is available, but I'm hesistant to just do getattr(obj, '__extra__', None) since __extra__ could be anything outside the typing module.

gvanrossum commented 9 years ago

The idea is that you would have to write your own functions similar to isinstance() and issubclass() to introspect the objects created at run-time by type annotations. You are right however that the introspection interface has not yet been specified -- I think it would have to be a separate PEP. I would not rely on the __extra__ attribute unless you're willing to see it disappear at any point in the future -- possibly as soon 3.5.1. (However, the reasoning about __extra__ possibly being something else is somewhat faulty -- non-stdlib code should never use dunder names for anything other than their documented meaning, as the stdlib can at any point define a new dunder name without any concern for existing occurrences of that dunder name in user code. So non-stdlib code that uses __extra__ is already broken.)

agronholm commented 9 years ago

Alright, so I guess this means runtime type checking has to wait until that introspection interface has been specified. FWIW, here's what I had before I ran into this problem: https://gist.github.com/agronholm/b06bdba54c73d9b2f1f1

gvanrossum commented 9 years ago

I do encourage you to continue along this path, assuming the introspection API won't be terribly different from what is currently implemented. (In fact I think extra is probably the only thing that might significantly change, as it isn't very principled.) Instead of issubclass() and isinstance() you'll have to code your own, but looking at parameters etc. sounds about right. Have you thought about what to do about requiring that type variables conform across parameters and return values yet? E.g. for def find(x: T, xs: Iterable[T]) -> int: ...; you should allow find(42, [1, 2, 3]) and even find(0, []) but you should reject find(42, ['a', 'b', 'c']) and find(0, [1, 2, 'x']).

agronholm commented 9 years ago

I had thought about type variable checking but I hadn't gotten that far yet. The problem I was stuck with is that if I encounter a type hint like List[int], how am I supposed to programmatically determine that it should be treated as a List? The original class (plain List) does not show up on the MRO so I can't use isinstance() or issubclass() to determine the necessary checking semantics. The only solution I could find was using issubclass() on the undocumented __origin__ attribute, but I am again hesitant to use something that is undocumented.

gvanrossum commented 9 years ago

The best I can recommend is using __origin__ -- if we were to change this attribute there would still have to be some other way to access the same information, and it would be easy to grep your code for occurrences of __origin__. (I'd be less worried about changes to __origin__ than to __extra__.) You may also look at the internal functions _gorg() and _geqv() (these names will not be part of any public API, obviously, but their implementations are very simple and conceptually useful).

agronholm commented 9 years ago

I'll look at them -- thanks!

agronholm commented 8 years ago

OK, delivered: https://pypi.python.org/pypi/typeguard

gvanrossum commented 8 years ago

See also http://bugs.python.org/issue26075 (Union[str, Iterable[int]] returns Iterable[int], which is wrong!)

prechelt commented 8 years ago

I have just uploaded the new version of my runtime type checker, typecheck-decorator 1.3, to PyPI, now with 'typing' support: https://pypi.python.org/pypi/typecheck-decorator/

Compared to typeguard 1.1.1, it has broader support (except for Callable, which is yet missing) and a more precise documentation of its semantics and limitations.

I struggled enormously with the semantics of TypeVars, because there are several strange issues in the current typing.py. For instance, tg.ItemsView has three generic parameters (tg.T_co, tg.KT, tg.VT_co), the first of which represents the tg.Tuple[tg.KT, tg.VT_co] returned by each call to the iterator -- but how, in general (that is, for user-defined types), is a poor type checker to know this? Should I report this? (I found other bugs/oddities, but those have to do with __subclasscheck__...)

gvanrossum commented 8 years ago

Hm, I think this is a bug in the definition of ItemsView in typing.py. I filed https://github.com/python/typing/issues/177 to track this.

bintoro commented 8 years ago

So, I've been looking at the papers on gradual typing, and it looks like the is-consistent-with relation is simply the gradual typing version of type equality. That is, X is consistent with Y iff X is Y or either is Any.

The gradual typing counterpart to issubclass would be something like is-consistent-subtype-of. The idea is formalized in Siek & Taha's Gradual Typing for Objects, pp. 6–8 (Section 4).

I'm not opposed to flexible use of terminology, but as it stands, the PEP 483 definition of is-consistent-with is highly confusing, since it actually describes is-consistent-subtype-of.

Anyway, drawing this distinction suggests a way to preserve issubclass: just take out Any. Then, the issubclass operation for types would simply mean is-subtype-of, which is hardly controversial usage.

The gradual typing companions to issubclass would look roughly like this:

def is_consistent_with(t1, t2):
    return (t1 is t2 or
            t1 is Any or
            t2 is Any)

def is_consistent_subtype(t1, t2):
    return is_consistent_with(t1, t2) or issubclass(t1, t2)

Now, none of this addresses the criticism that

List[int] and List[str] and mutually incompatible types, yet isinstance([], List[int]) and isinstance([], List[str)) both return true.

This feels more like a bug in the implementation of __subclasscheck__ than a fundamental problem with the concept of subtypes. Why should list be considered a subclass of a parameterized List when the same is not true in general?

>>> class MyGeneric(Iterable[T]): pass
... 
>>> m = MyGeneric()
>>> isinstance(m, MyGeneric)
True
>>> isinstance(m, MyGeneric[int])
False

Code that is interested in type-checking [1, 2, 3] against List[int] should pretty clearly be comparing type([1, 2, 3]) against the declared type's ultimate origin, not against a type-constrained version.

EDIT: This argument essentially suggests taking the patch proposed in #202 a step further and ignoring __extra__ for constrained generics. IOW, issubclass(list, typing.Sequence[Any]) would delegate to collections.abc.Sequence but issubclass(list, typing.Sequence[int]) would not.

gvanrossum commented 8 years ago

the gradual typing counterpart to issubclass would be something like is-consistent-subtype-of.

As you might have guessed I'm not really up to reading academic literature (neither are most Python devs). I got the definition of is-consistent-with from Siek's blog post (http://wphomes.soic.indiana.edu/jsiek/what-is-gradual-typing/) -- I guess the problem with the blog post is that it doesn't actually address subtyping. I'm happy to update PEP 483, nobody seems to read it anyway.

isinstance(m, MyGeneric)

You're reporting a bug in isinstance(), which is not supposed to be supported at all! The bug is that it doesn't consistently raise an exception.

but issubclass(list, typing.Sequence[int]) would not.

What should it return? Isn't this just another argument that issubclass() should also be disabled?

gvanrossum commented 8 years ago

Things would go quickest if you could suggest a patch for PEP 483. The repo is here: https://hg.python.org/peps/ (Don't use any of the github clones you may find, they are likely out of date.)

NYKevin commented 8 years ago

What should it return? Isn't this just another argument that issubclass() should also be disabled?

It's possible I'm badly misunderstanding bintoro, but it sounds like they're talking about tracing the variable (rather than the value) back to where it was declared/returned/otherwise-imbued-with-a-type and looking at its compile-time type to determine the result of issubclass(). But that sounds totally infeasible to me.

@bintoro Please correct me since I've probably misconstrued your words.

bintoro commented 8 years ago

You're reporting a bug in isinstance(), which is not supposed to be supported at all! The bug is that it doesn't consistently raise an exception.

I was under the impression that instance checks would be preserved for generics. If a custom generic is to be used like any other class, then presumably instance checks should work too. Otherwise applications would suddenly break when typing information is added to containers that are subject to an instance check somewhere.

IMO it would be best to delete __instancecheck__ and __subclasscheck__ from GenericMeta and let the standard implementation decide. Then everything would work normally: if it's in the MRO, it's a superclass.

I experimented with removing GenericMeta.__subclasscheck__ and inserting extras as actual base classes. I quite like the result. Take a container like this:

class MyMapping(collections.MutableMapping): ...

Add type information:

class MyMapping(typing.MutableMapping[K, V]): ...

Now, the following continues to work:

>>> m = MyMapping()
>>> isinstance(m, collections.MutableMapping)
True

This seems ideal. In the current scheme a typed MyMapping would have to inherit from both MutableMapping variants in order to be universally recognized as such.

An open question is whether it's desirable for issubclass(list, typing.Sequence) to continue to return true. Perhaps I haven't thought this through, but I don't readily see why this should be so. The typing collections are only intended as generic base classes, not as all-around replacements for their collections.abc counterparts.

That said, an easy way to support such subclass checks is to link the ABC registries. Then, whatever is registered on, say, collections.MutableSequence would be found through typing.MutableSequence as well.

EDIT: Code is here.

but issubclass(list, typing.Sequence[int]) would not.

What should it return? Isn't this just another argument that issubclass() should also be disabled?

The way I see it, it should return false. Sequence[int] is just a subclass of Sequence, one that has nothing to do with list.

@NYKevin That's not what I'm suggesting at all. I'm suggesting to remove complexity, not add to it. Whether C[Y] is a subclass of C[X] should only depend on whether the former is derived from the latter, not on the relationship between Y and X. That sort of analysis is useful, of course, but should probably live outside __subclasscheck__ because __subclasscheck__ is called at runtime and runtime types are not parameterized.

Things would go quickest if you could suggest a patch for PEP 483.

Will do, as soon as I figure out what to call the relation. I suppose that when subtyping is introduced, the subtype-augmented is-consistent-with actually replaces the original definition, so we can just call it that.

All that is needed, really, is to mention the subtyping issue, in order to explain why the blog post describes a very different set of rules than PEP 483. (That is the confusion that led me to the research papers.)

ilevkivskyi commented 8 years ago

In general, I like the direction where #207 goes but I think it is better to discuss the "specification", before one goes for an implementation. Here is my view of the situation:

1) In the context of this PEP we want to distinguish classes and types. Class of an object is simply obj.__class__, while the type of an object is a concept that is introduced by a typecheker, linter, IDE, etc. typing.py provides a standard API for such tools. Type t1 is called a subtype of t2 if t1 is acceptable where type t2 is expected. Since Python is very dynamic, we need a special type Any. It is acceptable everywhere, and a variable declared Any accepts all other types.

2) All special type constructs: TypeVar NewType Generic Union and Optional Intersection (I hope it will be added at some point) Type Any raise TypeError when appear in isinstance of issubclass. Only first two could be instantiated.

3) issubclass and isinstance are only allowed for generic types if they are used without parameters assert issubclass(list, Sequence), assert isinstance(lambda x: x, Callable). Since by agreement List without parameters means List[Any] and Any is a fallback to dynamic typing, this is consistent.

4) typing.py provides a helper function is_compatible(one_type, other_type), that returns True if one_type should be acceptable where other_type is expected. Currently this could be based on nominal subtyping, using declared variance of generic types, and semantics of special type constructs. Examples:

assert is_compatible(Any, Employee)
assert is_compatible(Manager, Employee)
assert is_compatible(Employee, Union[Employee, Stranger])
assert is_compatible(Tuple[Manager, Salary], Tuple[Employee, Salary])
assert is_compatible(Callable[[Employee], Salary], Callable[[Manager], Salary])

@gvanrossum , @markshannon do you agree with such formulations?

bintoro commented 8 years ago

issubclass and isinstance are only allowed for generic types if they are used without parameters

If you're describing #207, then that's not really accurate. issubclass and isinstance simply have their normal semantics when used with generics.

Consider:

class A(Generic[T]): ...

class B(A[int]): ...
>>> isinstance(B(), A[int])
True

This is possible because A[int] is B's superclass in a very real way.

If this is considered undesirable, then I would correct it by dropping the parameterized intermediate type from the MRO rather than messing with issubclass behavior. That is, instead of

>>> B.__mro__
(B[int], A[int], A[~T], typing.Generic[~T], typing.Generic, <class 'object'>)

we'd have

>>> B.__mro__
(B[int], A[~T], typing.Generic[~T], typing.Generic, <class 'object'>)
ilevkivskyi commented 8 years ago

There are two points why one might want to limit issubclass() only to generics without parameters.

  1. To avoid mixing types and classes, one should not use types (like List[int]) in runtime context, only classes (like B or A in your example) should be used in runtime context.
  2. It is probably easier for backward compatibility. One could simply add typing information in class definition and does not need to touch any already existing issubclass, etc. (as I understand, this will also work in your current implementation).

I think that your idea of removing A[int] from MRO is not OK. In such case issubclass will return False, but I would like it to raise TypeError. But I agree with you that we should reduce messing with __subclasschek__ to absolute minimum. Maybe we don't need this restriction.

In any case, do not hurry to make changes to your PR. We first need to get approval of the "specifications" from the guys I mentioned in my previous post.

By the way, I noticed a problem with the representation of B, it is not generic, so that B[int] has no sense. If such behavior is not caused by your changes, then I recommend you to open a separate issue on this.

bintoro commented 8 years ago

My thinking goes like this: if A[int] is considered enough of a class to appear in the MRO, then issubclass(B, A[int]) should be true; anything else would be just too weird. I mean, is there any precedent for issubclass(C, random.choice(C.__mro__)) occasionally raising a TypeError?

Another way to look at it would be to consider the A[int] specialization irrelevant for runtime behavior and insert only A in the MRO. The full parameterization information can still be preserved internally for type-checking tools to investigate.

It's either a bona fide class or a magic type — you can't have it both ways.

By the way, I noticed a problem with the representation of B, it is not generic, so that B[int] has no sense.

My bad, I was using the wrong version. This is how generics have been represented since 3.5.1: the parameters are propagated indefinitely.

ilevkivskyi commented 8 years ago

First, and very important note: Clearly you are working with outdated version of typing.py. Many crucial bugs have been fixed since 3.5.1 and the whole generics system has been thoroughly reworked by @gvanrossum . If I have

class A(Generic[T]):
    ...
class B(A[int]):
    ...

then with the last version at this github repo I get:

>>> str(B)
'__main__.B'

>>> A[int]().__class__ is A
True

>>> B.__mro__
(__main__.B, __main__.A<~T>[int], __main__.A<~T>, __main__.Generic<~T>, __main__.Generic, <class 'object'>) 

Now, concerning the question about issubclass, as you can see there is an additional reason (apart from two mentioned in my previous post) for limiting it only to generics without parameters: Type erasure:

>>> A[int]().__class__ is A
True

It is explained in the PEP lines 504-508 (again, the latest version of the PEP is always here, in this github repo. In case you didn't know, this PEP is still in the provisional status, i.e., it still could be edited).

Let me explain why I want TypeError instead of False. Normally, isinstance(1, Union) would just return False, but we want to clearly say "you are doing something completely wrong", therefore we raise TypeError. The same situation is here, by raising TypeError we tell the user: If you are asking questions like isinstance([], List[int]) you do not understand what is going on.

bintoro commented 8 years ago

Clearly you are working with outdated version of typing.py.

Looks like I was using 3.5.1 by mistake when I wrote the preceding comments. I'm perfectly aware of the recent changes, as #207 is based on them.

[...] as you can see there is an additional reason [...] for limiting it only to generics without parameters: Type erasure

Exactly. Since A[int] reduces to A at runtime, why should it appear as a distinct entity in the MRO?

The same situation is here, by raising TypeError we tell the user: If you are asking questions like isinstance([], List[int]) you do not understand what is going on.

Generics are supposed to be able to be used as normal classes. It seems rather precarious to introduce something in the MRO that cannot be subclass-checked against. Union is wholly different because it is not a concrete class at all and will never appear as a runtime type.

ilevkivskyi commented 8 years ago

Just to clarify, I am not against excluding A[int] form MRO, I am in favour of this idea. But, in addition to this (not in alternative to this) I would raise TypeError instead of returrning False.

bintoro commented 8 years ago

@ilevkivskyi It seems we've been having a slight miscommunication here. I am totally fine with raising TypeError in nonsensical cases as long as all(map(partial(issubclass, G), G.__mro__)) is true for every generic G.

ilevkivskyi commented 8 years ago

OK, it looks like in fact we have an agreement here. Now we need to wait for @gvanrossum and @markshannon , and their comments. At least one of them is quite busy now, because two releases of mypy are planned before May 26.

markshannon commented 8 years ago

I still want __subclasscheck__ dropped. To reiterate my position: Determining whether a class is a subclass of a type is meaningless as far I'm concerned. It is meaningful to check whether a type is a subtype of a type, but that is the job of a static checker and doesn't belong in the typing module.

ilevkivskyi commented 8 years ago

Determining whether a class is a subclass of a type is meaningless as far I'm concerned.

@markshannon , we both agree with this. The question is whether simply remove __subclasscheck__ method (in this case issubclass(B, A[int]) will return True following the normal mechanism), or have a __subclasscheck__ that raises TypeError in all meaningless cases.

bintoro commented 8 years ago

What does it even mean to "drop __subclasscheck__"? Not implement it or have it error out? The former is exactly how it's done in #207, while the latter would render generics unusable.

Just to be clear, nobody's suggesting to support __subclasscheck__ for Union or Any. The discussion is about generics.

gvanrossum commented 8 years ago

Why would making it error out render generics unusable?

ilevkivskyi commented 8 years ago

@gvanrossum , maybe "unusable" is to radical formulation from @bintoro . Consider backward compatibility, people often use issubclass. I could easily imagine something like assert issubclass(UserClass, FrameworkBase) all over the code. If someone now decides to add a generic information to FrameworkBase, like class FrameworkBase(Iterator[Task]): ..., then all these subclass checks will fail, so that all the code will be broken, until the logic is completely changed to avoid issubclass. This is against the idea of gradual typing: There should be a possibility to introduce it gradually :-)

We have Any for such purpose, so that I think that a reasonable compromise would be to allow issubclass for generics without parameters (remeber MyGeneric without parameters is MyGeneric[Any]) and raise TypeError in all other cases. In such way all existing subclass checks will work, and if people will try to change them to MyGeneric[int] they will get a note that they should not do this, because issubclass is not intended for work with types.

gvanrossum commented 8 years ago

Yes, that middle ground seems reasonable.

bintoro commented 8 years ago

Yeah, that's my concern.

Essentially, adding typing information to a container class would require the author to be sure that the class will never take part in an instance check anywhere. This seems like a pretty serious (and pointless!) restriction.

I don't see the theoretical grounds, either. The rationale for disallowing __subclasscheck__ for the magic types Union & co. is that they are not meaningful as runtime classes. As long as Generic is intended as a means of adding typing information into real, instantiable classes (not only stubs), it's clear that the same reasoning does not apply. Custom generics are fundamentally just regular classes with typing metadata tacked on.

There's also the practical issue of supporting isinstance(x, typing.Iterable) as a substitute for isinstance(x, collections.abc.Iterable) etc.

As discussed earlier today, raising TypeError from isinstance(x, MyGeneric[int]) makes sense, but in that case I feel that such types shouldn't be left in the MRO, either. If something's too magical to support instance checks, they should be eliminated from the MRO when the concrete class is constructed.

gvanrossum commented 8 years ago

They're in the MRO so they can be introspected by other tools than isinstance/issubclass.

ilevkivskyi commented 8 years ago

This is a good point, typecheckers could use MRO for their purpose. Also probably it is easier to implement, no need to manually construct MRO, just modify __subclasscheck__ and __instancecheck__ to raise TypeError in the cases discussed. If @gvanrossum has no other objections, I think it would be great if @bintoro will make a clean PR implementing the changes discussed above. This would close many issues, but also it is quite extensive surgery, so be careful :-)

bintoro commented 8 years ago

They're in the MRO so they can be introspected by other tools than isinstance/issubclass.

Yes, but that information can just as well be preserved internally like __origin__ and friends.

I used this example earlier:

class A(Generic[T]):
    ...

class B(A[int]):
    ...

and suggested that A[int] should collapse to just A in B.__mro__. The argument is twofold:

From a practical POV, it's going to be very unexpected to have something in the MRO that blows up when it appears in a subclass check.

From a purity POV, this whole debate is rooted in the distinction between types and classes, and this is exactly it. A[int] is a type, not a class, so what's it doing in the MRO of a concrete class? The runtime base is just A.

(None of this is an important issue to me, but it does seem like a logical conclusion of what's happening with __subclasscheck__.)

gvanrossum commented 8 years ago

I think I'll wait until 3.5.2 is released to get back to this.

bintoro commented 8 years ago

@ilevkivskyi It would be pretty easy to incorporate this in #207, I think.

I'll add one more note here. Assuming parameterized generics now become types (non-classes), then something that needs discussion is what to do about instantiating them.

There are examples of this in the PEP:

IntNode = Node[int]
StrNode = Node[str]
p = IntNode()
q = StrNode()

If Node[int] isn't a class, and instance checking against it is deemed nonsensical, then instantiating it should presumably be an error as well. Subclassing becomes a required step:

class IntNode(Node[int]):
    pass

p = IntNode()
gvanrossum commented 8 years ago

Those examples should work.

However, even though this would be equivalent at run time:

p = Node[int]()

that's disallowed by the PEP and indeed mypy reports it as an error (but allows the examples from the PEP).

That's all intentional, and the implementation somehow has to deal with it.