python / mypy

Optional static typing for Python
https://www.mypy-lang.org/

TypeForm[T]: Spelling for regular types (int, str) & special forms (Union[int, str], Literal['foo'], etc) #9773

Open davidfstr opened 3 years ago

davidfstr commented 3 years ago

(An earlier version of this post used TypeAnnotation rather than TypeForm as the initially proposed spelling for the concept described here)

Feature

A new special form TypeForm[T] which is conceptually similar to Type[T] but is inhabited by not only regular types like int and str, but also by anything "typelike" that can be used in the position of a type annotation at runtime, including special forms like Union[int, str], Literal['foo'], List[int], MyTypedDict, etc.

Pitch

Being able to represent something like TypeForm[T] enables writing type signatures for new kinds of functions that can operate on arbitrary type annotation objects at runtime. For example:

# Returns `value` if it conforms to the specified type annotation using typechecker subtyping rules.
def trycast(typelike: TypeForm[T], value: object) -> Optional[T]: ...

# Returns whether the specified value can be assigned to a variable with the specified type annotation using typechecker subtyping rules.
def isassignable(value: object, typelike: TypeForm[T]) -> bool: ...
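
To make the two signatures above concrete, here is a minimal runtime sketch. It is an illustration only, not the trycast library's implementation: it handles plain classes, `Union`/`Optional`, and `list[...]`, and raises on everything else. The function bodies and the unannotated `typelike` parameter are assumptions, since `TypeForm` does not exist yet.

```python
from typing import Any, List, Optional, Union, get_args, get_origin


def isassignable(value: object, typelike: Any) -> bool:
    """Illustrative check for a small subset of type forms."""
    origin = get_origin(typelike)
    if origin is Union:
        # Union[X, Y] and Optional[X]: any one member may match.
        return any(isassignable(value, arg) for arg in get_args(typelike))
    if origin is list:
        # List[X] / list[X]: check the container, then every element.
        args = get_args(typelike)
        item_form = args[0] if args else object
        return isinstance(value, list) and all(
            isassignable(item, item_form) for item in value
        )
    if typelike is None or typelike is type(None):
        return value is None
    if origin is None and isinstance(typelike, type):
        # An ordinary class: isinstance() is enough.
        return isinstance(value, typelike)
    raise TypeError(f"unsupported type form in this sketch: {typelike!r}")


def trycast(typelike: Any, value: object) -> Optional[object]:
    """Return `value` if it matches `typelike`, otherwise None."""
    return value if isassignable(value, typelike) else None
```

With a real `TypeForm[T]`, the first parameter of `trycast` could be annotated so that a checker infers `Optional[T]` for the result; without it, the sketch falls back to `Any`.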

Several people have indicated interest in a way to spell this concept:

For a more in-depth motivational example showing how I can use something like TypeForm[T] to greatly simplify parsing JSON objects received by Python web applications, see my recent thread on typing-sig:

If there is interest from the core mypy developers, I'm willing to do the related specification and implementation work in mypy.

hauntsaninja commented 3 years ago

Why not do something like:

from __future__ import annotations
from typing import *

T = TypeVar("T")

class TypeAnnotation(Generic[T]):
    @classmethod
    def trycast(cls, value: object) -> T:
        ...

reveal_type(TypeAnnotation[Optional[int]].trycast(object()))

JelleZijlstra commented 3 years ago

Why can't we make Type[T] also work for other special forms?

@hauntsaninja's workaround is useful, but it would be better to have a feature in the core type system for this.

davidfstr commented 3 years ago

@hauntsaninja, using your workaround I am unable to access the passed parameter at runtime from inside the wrapper class.

The following program:

from __future__ import annotations
from typing import *

T = TypeVar('T')

class TypeAnnotation(Generic[T]):
    @classmethod
    def trycast(cls, value: object) -> Optional[T]:
        Ts = get_args(cls)
        print(f'get_args(cls): {Ts!r}')
        return None

ta = TypeAnnotation[Union[int, str]]
print(f'get_args(ta): {get_args(ta)!r}')
result = ta.trycast('a_str')

prints:

get_args(ta): (typing.Union[int, str],)
get_args(cls): ()
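
A likely explanation, as far as I can tell: attribute access on the subscripted alias is forwarded to the original class, so the classmethod receives plain `TypeAnnotation` as `cls` and the parameter is lost. One runtime-only way around this is sketched below. The name `BoundForm` is hypothetical, and static type checkers will not understand this pattern, which is the gap a built-in `TypeForm[T]` would fill.

```python
from typing import Any, Union


class BoundForm:
    """Hypothetical wrapper that keeps the subscripted form at runtime."""

    def __init__(self, form: Any) -> None:
        self.form = form

    def __class_getitem__(cls, form: Any) -> "BoundForm":
        # BoundForm[X] builds an instance that remembers X.
        return cls(form)

    def describe(self) -> str:
        return f"bound to {self.form!r}"


bound = BoundForm[Union[int, str]]
print(bound.describe())
```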

davidfstr commented 3 years ago

@JelleZijlstra commented:

Why can't we make Type[T] also work for other special forms?

@hauntsaninja's workaround is useful, but it would be better to have a feature in the core type system for this.

I agree that having Type[T] be widened to mean "anything typelike, including typing special forms" would be an alternate solution. A very attractive one IMHO.

However it appears that there was a deliberate attempt in mypy 0.780 to narrow Type[T] to only refer to objects that satisfy isinstance(x, type) at runtime. I don't understand the context for that decision. If however that decision was reversed and Type[T] made more general then there would be no need for the additional TypeAnnotation[T] syntax I'm describing in this issue.

gvanrossum commented 3 years ago

I believe the issue is that Type is used for things that can be used as the second argument of isinstance(). And those things must be actual class objects (or tuples of such) -- they cannot be things like Any, Optional[int] or List[str].
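
A small sketch of that distinction, assuming Python 3.9 or later for the `list[int]` syntax. The helper name `usable_with_isinstance` is made up for illustration; it simply probes whether `isinstance()` accepts the object as its second argument.

```python
from typing import List


def usable_with_isinstance(candidate: object) -> bool:
    """Return True if `candidate` is accepted as isinstance()'s second argument."""
    try:
        isinstance(None, candidate)  # type: ignore[arg-type]
    except TypeError:
        return False
    return True


print(usable_with_isinstance(int))         # a class
print(usable_with_isinstance((int, str)))  # a tuple of classes
print(usable_with_isinstance(List[str]))   # a subscripted generic
print(usable_with_isinstance(list[int]))   # a parameterized builtin generic
```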

So if this feature is going to happen I think it should be a separate thing -- and for the static type system it probably shouldn't have any behavior, since such objects are only going to be useful for introspection at runtime. (And even then, how are you going to do the introspection? they all have types that are private objects in the typing module.)

ltworf commented 3 years ago

Well I wrote this module with various checks and add every new major py version to the tests to see that it keeps working: https://github.com/ltworf/typedload/blob/master/typedload/typechecks.py

Luckily since py3.6 it has not happened that the typing objects change between minor upgrades of python.

davidfstr commented 3 years ago

I believe the issue is that Type is used for things that can be used as the second argument of isinstance(). And those things must be actual class objects (or tuples of such) -- they cannot be things like Any, Optional[int] or List[str].

Makes sense.

for the static type system it probably shouldn't have any behavior, since such objects are only going to be useful for introspection at runtime.

Agreed.

how are you going to do the introspection? they all have types that are private objects in the typing module.

The typing module itself provides a few methods that can be used for introspection. typing.get_args and typing.get_origin come to mind.
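
For example, a short sketch of what those two functions return for a few type forms (Python 3.8 or later; the helper name `take_apart` is only for illustration):

```python
from typing import Any, List, Literal, Tuple, Union, get_args, get_origin


def take_apart(form: Any) -> Tuple[Any, Tuple[Any, ...]]:
    """Return (origin, args) for a type form using only public typing APIs."""
    return get_origin(form), get_args(form)


print(take_apart(List[int]))        # origin is the runtime class `list`
print(take_apart(Union[int, str]))  # origin is typing.Union
print(take_apart(Literal["foo"]))   # args are the literal values
print(take_apart(int))              # plain classes have no origin or args
```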

davidfstr commented 3 years ago

I could see a couple of possible spellings for the new concept:

Personally I'm now leaning toward TypeForm (over TypeAnnotation) because it is consistent with prior documentation and is more succinct to type. It does sound a bit abstract but I expect only relatively advanced developers will be using this concept anyway.

(Let the bikeshedding begin. :)

gvanrossum commented 3 years ago

I like TypeForm.

davidfstr commented 3 years ago

Okay I'll go with TypeForm then, for lack of other input.

Next steps I expect are for me to familiarize myself with the mypy codebase again since I'm a bit rusty. Hard to believe it's been since 2016 that I put in the first version of TypedDict. Rumor has it that the semantic analyzer has undergone some extensive changes since then.

gvanrossum commented 3 years ago

Good. And yeah, a lot has changed. Once you have this working we should make a PEP out of it.

davidfstr commented 3 years ago

Once you have this working we should make a PEP out of it.

Yep, will do this time around. :)

davidfstr commented 3 years ago

Update: I redownloaded the high-level mypy codebase structure this afternoon to my brain. It appears there are now only 4 major passes of interest:

Next steps I expect are to trace everywhere that mypy is processing occurrences of Type[T] and T = TypeVar('T'), which I expect to be most-similar in implementation to the new TypeForm[T] support.

davidfstr commented 3 years ago

Update: I have found/examined all mypy code related to processing the statement T = TypeVar('T'). Briefly:

Next steps I expect are to trace everywhere that mypy is processing occurrences of Type[T] and other references to a T (which a TypeVar assignment statement defines).

davidfstr commented 3 years ago

Update: I did trace everywhere that mypy is processing occurrences of Type[T], and more specifically uses of TypeType. There are a ton!

In examining those uses it looks like the behavior of TypeForm when interacting with other type system features is not completely straightforward, and therefore not amenable to direct implementation. So I've decided to take a step back and start drafting the design for TypeForm in an actual PEP so that any design issues can be ironed out and commented on in advance.

Once it's ready, I'll post a link for the new TypeForm PEP draft to here and probably also to typing-sig.

gvanrossum commented 3 years ago

Yeah, alas Type[] was not implemented very cleanly (it was one of the things I tried to do and I missed a lot of places). We do have to consider whether this is going to be worth it -- there are much more important things that require our attention like variadic generics and type guards.

davidfstr commented 3 years ago

We do have to consider whether this is going to be worth it -- there are much more important things that require our attention like variadic generics and type guards.

Aye. Variadic generics and type guards both have much wider applicability than TypeForm in my estimation. Python 3.10's upcoming alpha window from Feb 2021 thru April is coming up fast, and only so many PEPs can be focused on.

Nevertheless I'll get the initial TypeForm PEP draft in place, even if it needs to be paused ("deferred"?) for a bit.

davidfstr commented 3 years ago

Update: I've drafted an initial PEP for TypeForm.

However I was thinking of waiting to post the TypeForm PEP for review (on typing-sig) until the commenting on PEP 646 (Variadic Generics) slows down and it becomes soft-approved, since that PEP is consuming a lot of reviewer time right now and is arguably higher priority.

In the meantime I'm building out an example module (trycast) that plans to use TypeForm.

davidfstr commented 3 years ago

Update: I'm still waiting on Variadic Generics (PEP 646) on typing-sig to be soft-approved before posting the TypeForm PEP draft, to conserve reviewer time.

(In the meantime I'm continuing to work on trycast, a new library for recognizing JSON-like values that will benefit from TypeForm. Trycast is about 1-2 weeks away from a beta release.)

gvanrossum commented 3 years ago

I expect PEP 646 to be a complex topic to soft-approve, given the complexity of implementation, so I recommend not blocking on that for too long (though waiting a little while longer is fine).

davidfstr commented 3 years ago

I've posted the draft of the TypeForm PEP to typing-sig for review and discussion at: https://mail.python.org/archives/list/typing-sig@python.org/thread/7TDCBWT4RAYDJUQJ3B5NKXTQDUO5SIW2/

davidfstr commented 3 years ago

Update: Did give a presentation about TypeForm at PyCon US 2021 with representatives from most type checkers attending. The folks in the room (including Jukka from mypy) were leaning toward broadening Type[T] to also match non-class types (and type annotation objects in general) in preference to introducing an entirely new form like TypeForm[T]. This would avoid the need to introduce a new spelling that must be remembered (TypeForm) and likely be easier to implement. It sounded like some type checkers other than mypy already have this behavior.

So the next step I foresee is drafting a proposal to broaden Type[T] as defined by PEP 484 to accept any type annotation object and not just class objects.

On my own plate I'm still pushing the implementation of PEP 655, and I'm planning to switch back to TypeForm discussions/work after that is complete.

wyfo commented 3 years ago

The folks in the room (including Jukka from mypy) were leaning toward broadening Type[T] to also match non-class types (and type annotation objects in general) in preference to introducing an entirely new form like TypeForm[T].

This change would "break" some of my code. typing.Type was designed as a wrapper of builtin type, but type has some characteristics (__name__, __mro__, etc.) that are not present in all possible types, e.g. typing.TypeVar, typing.NewType or generic aliases. So if I have variable annotated with Type on which I access __mro__, this modification of PEP 484 would make this access unsafe, and I would have to add # type: ignore everywhere I use my variable as a type. (I know I could use type instead of Type, especially since Python 3.9, but my code must be Python 3.6-compatible, and having a different meaning between type and Type is kind of awkward to me)

I don't really like this kind of modification with breaking impact. By the way, typing.Type is supposed to be deprecated since Python 3.9 because of PEP 585. This would mean removing the deprecation.

Was this issue addressed during the PyCon?

(By the way, I've always wanted to say that I find AnyType more explicit than TypeForm)

ltworf commented 3 years ago

@wyfo do not rely on dunder methods, as they can (and do) change between versions; in some cases even between minor releases.

See the links. The internal APIs of the typing stuff have had a lot of changes in the various releases. If the specific stuff you used happened to not be changed I'd attribute it more to luck than anything else.

https://github.com/ltworf/typedload/blob/1.19/typedload/typechecks.py#L105-L110

https://github.com/ltworf/typedload/blob/master/typedload/typechecks.py#L63-L67

https://github.com/ltworf/typedload/blob/master/typedload/typechecks.py#L79-L86

https://github.com/ltworf/typedload/blob/master/typedload/typechecks.py#L99-L108

wyfo commented 3 years ago

I know that the typing API is very unstable (I've written adaptors myself for cross-version typing.get_type_hints, typing.get_origin, typing.get_args, etc.). However, my point is not about the typing API; it's about the builtin object type and its instances.

And I do think all dunder attributes (__mro__, __init__, etc.) described in the data model documentation are pretty stable. So when I have a Type object instance, a.k.a. a class, I can expect to have __mro__ available; it would not be the case with an instance of NewType. There is no luck here.

gvanrossum commented 3 years ago

Yup. And this is why I still prefer keeping Type[] for things that are actual class objects and introducing TypeForm[] for other things that have concrete classes defined in typing.py.

On Thu, May 27, 2021 at 2:20 AM wyfo @.***> wrote:

I know that the typing API is very unstable (I've written adaptors myself for cross-version typing.get_type_hints, typing.get_origin, typing.get_args, etc.). However, my point is not about the typing API; it's about the builtin object type and its instances.

And I do think all dunder attributes (__mro__, __init__, etc.) described in the data model documentation https://docs.python.org/3/reference/datamodel.html are pretty stable. So when I have a Type object instance, a.k.a. a class, I can expect to have __mro__ available; it would not be the case with an instance of NewType. There is no luck here.

davidfstr commented 3 years ago

By the way, typing.Type is supposed to be deprecated since Python 3.9 because of PEP 585. This would mean removing the deprecation.

Was this issue addressed during the PyCon?

If the meaning of Type[T] was changed I would presume that type[T] would also change since the latter is an alternative spelling for the former.

gvanrossum commented 3 years ago

On Thu, May 27, 2021 at 7:52 PM David Foster @.***> wrote:

By the way, typing.Type is supposed to be deprecated since Python 3.9 because of PEP 585. This would mean removing the deprecation.

Was this issue addressed during the PyCon?

If the meaning of Type[T] was changed I would presume that type[T] would also change since the latter is an alternative spelling for the former.

Indeed.

wyfo commented 3 years ago

If the meaning of Type[T] was changed I would presume that type[T] would also change since the latter is an alternative spelling for the former.

But I don't think we can change the meaning of type[T], because type is a builtin object; changing it would be like saying that list[T] could be something else than a list.

gvanrossum commented 3 years ago

But I don't think we can change the meaning of type[T], because type is a builtin object; changing it would be like saying that list[T] could be something else than a list.

Actually it works just fine. You can define a function with an argument whose type is said to be type:

def f(t: type): ...

and you can call it with e.g. a union type:

f(int | str)

Inside f we could introspect the argument and pick it apart, e.g.:

import types

def f(t: type):
    match t:
        case types.Union(__args__=a):
            print(a)

And the above call f(int | str) would print this:

(<class 'int'>, <class 'str'>)

Adding a typevar to the annotation makes no difference.

wyfo commented 3 years ago

However:

def print_mro(tp: type):
    print(tp.__mro__)  # currently, it's valid because `tp` must be a `type` instance, so it has `__mro__` attribute

print_mro(int)
#> (<class 'int'>, <class 'object'>)
print_mro(int | str)
#> AttributeError: 'types.Union' object has no attribute '__mro__'

That's my point. type is a class, just as much as list is; this class defines some attributes/methods that I would expect to be able to access if I have an object annotated with type.

On the contrary, if I have an object annotated with TypeForm, I know it will be either a class (a type instance), a NewType, a types.Union, a types.GenericAlias/typing.GenericAlias, or None, and I will be able to use pattern matching to process it.

gvanrossum commented 3 years ago

However:

def print_mro(tp: type):
    print(tp.__mro__)  # currently, it's valid because `tp` must be a `type` instance, so it has `__mro__` attribute

print_mro(int)
#> (<class 'int'>, <class 'object'>)
print_mro(int | str)
#> AttributeError: 'types.Union' object has no attribute '__mro__'

That's my point. type is a class, just as much as list is; this class defines some attributes/methods that I would expect to be able to access if I have an object annotated with type.

On the contrary, if I have an object annotated with TypeForm, I know it will be either a class (a type instance), a NewType, a types.Union, a types.GenericAlias/typing.GenericAlias, or None, and I will be able to use pattern matching to process it.

Yes, this is why I still prefer the separate TypeForm[] proposal.

wrobell commented 2 years ago

I believe the issue is that Type is used for things that can be used as the second argument of isinstance(). And those things must be actual class objects (or tuples of such) -- they cannot be things like Any, Optional[int] or List[str].

IMHO, the problem is the terminology itself.

Python documentation does not seem to make the distinction. The definition seems to be written in "Built-in Types"

The principal built-in types are numerics, sequences, mappings, classes, instances and exceptions.

The typing module documentation says

The most fundamental support consists of the types Any, Union, Callable, TypeVar, and Generic

(emphasis mine)

But Generic or TypeVar entities are not really types. For example, I cannot use them with singledispatch anymore, so I call them pseudo-types.

IMHO, it is very important to correct the documentation when TypeForm is introduced. isinstance() works as a test, but better terminology is needed. The distinction makes the Python type system hard to reason about, and much harder when it is left implicit.

glyph commented 2 years ago

Thanks for articulating a concise summary of the semantic problem here, @wrobell! I agree that the terms "class" and "type" are used very muddily throughout the python & mypy documentation, and it would be great to have some more precise jargon. Perhaps adjectives (AbstractType and ConcreteType, with ConcreteType being an alias for Type? Or for more literal naming, MypyType and Class?)

gvanrossum commented 2 years ago

@glyph:

Thanks for articulating a concise summary of the semantic problem here, @wrobell! I agree that the terms "class" and "type" are used very muddily throughout the python & mypy documentation, and it would be great to have some more precise jargon. Perhaps adjectives (AbstractType and ConcreteType, with ConcreteType being an alias for Type? Or for more literal naming, MypyType and Class?)

Yeah, the terminology is terrible. :-( In Python 0 and 1, 'class' was for things defined with the 'class' statement, while 'type' referred to built-in types (aka extension types) like 'int'. At some point in Python 2 (starting 2.1 IIRC) we unified these concepts but kept both names as synonyms -- so we have type(x) and x.__class__ returning the same thing, which is sometimes called class, and sometimes type. (It's possible to override __class__ so it returns something different, but that is not related to the confusion we have here.) When PEP 484 was drafted, Mark Shannon (its BDFL-Delegate) rightly identified that we were being confusing, and we agreed to use the following distinction in the PEP: "class" is what you see at runtime, whereas "type" refers to what the static type checker reasons about. This worked for PEP 484, because it mostly concerns itself with static types. (The "Type" special form is an exception -- it intentionally only refers to what the PEP calls "class", but it's called "Type" since it parallels the type() builtin.)

Clearly these two conventions clash, and when the documentation for the typing module was (eventually) written, the terminology was neither carefully explained nor consistently applied. (I don't know about the mypy docs -- I would assume it's basically always talking about static types whenever it says "type", but this might be too subtle if you're used to the runtime meaning of that word in the Python docs outside the typing module.)

We now have two terms ("type" and "class") meaning the same thing throughout most of the Python docs, and no good word left to call the runtime representation of (all) static types. I don't really like concrete vs. abstract type, since the latter is too close to abstract base class, where the term "abstract" has a totally different meaning. And I don't like directly referring to mypy, since there are now many static type checkers. I also don't think we can fix this by introducing a new distinction between class and type everywhere. And using "static type" also doesn't sound right, especially since the key use would be to refer to the runtime representation of static types...

So what can we do? Could we start using "type form" in cases where the distinction is important? Linguistically it's hard to see how we could say "the full term is 'type form' but it can be abbreviated to 'type' when the meaning is clear from context" -- there are many situations where one can leave out a preceding adjective based on context (e.g. mobile phone -> phone), but a trailing noun...?

The best I can think of right now is not very good -- we could call things that static type checkers consider static types but aren't properly types at runtime "extended types"; when the distinction must be made, the regular kind of types (a.k.a. classes) could be called "runtime types". So we could write for example:

Any, list[int], and T (where T is defined using TypeVar()) are examples of extended types, but they aren't considered runtime types; for example, they cannot be used as the second argument to isinstance().

PS. Possibly related: https://bugs.python.org/issue45665, where @serhiy-storchaka argues that we made a mistake by making isinstance(list[int], type) return True. (Not to be confused with using list[int] as the second argument, as above.)

wrobell commented 2 years ago

I would simply go back to "annotation type"

gvanrossum commented 2 years ago

Oh, that’s much better! I’d spell it in full, AnnotationType even though it’s longer. (We are culturally trying to reduce our use of abbreviations.)

gvanrossum commented 2 years ago

And the regular types could be called “classic types” (pun intended).

It would be nice to also have a good term for those annotation types that aren’t classic types, but I can’t think of anything good that works for Any, T, and list[int].

wrobell commented 2 years ago

The term "classic classes" was used here, so I find "classic types" confusing. :)

I would really keep it as simple as possible

issubclass raises error for Any, T, and list[int], so they are annotation types (with the caveat that type erasure happens for list[int] annotation type, so it becomes list type at runtime).

BTW. It would be great if Python had a rule - an annotation type can become a type, stays this way, and never goes back. We might have another unification of types and annotation types in the future, just step by step. ;)

gvanrossum commented 2 years ago

I predict we'll need terms for runtime types for clarification in some places, maybe one of the adjectives "regular" or "runtime" or "plain".

Do you have an example of a situation where we broke that rule about annotation types becoming type and then going back again? Do you anticipate a specific unification in the future?

wrobell commented 2 years ago

The example, when Python went the other way, IMHO: https://bugs.python.org/issue34498

Would Python 3.11 speed-up improvements trigger some unification? For example, could having list[int] as a type help with specialization? But I am going way out of my expertise here...

My primary use case would be single dispatch on generic types.

wrobell commented 2 years ago

Also, what would be benefit of using type over AnnotationType/TypeForm?

My understanding is, both definitions are allowed

def f1(t: type[int]) -> int: ...
def f1(t: AnnotationType[int]) -> int: ...

This is not allowed

def f2(t: type[Optional[int]]) -> int: ...

and needs to be

def f2(t: AnnotationType[Optional[int]]) -> int: ...

If there is no benefit of using type, then the following is ok?

Type = AnnotationType

def f1(t: Type[int]) -> int: ...
def f2(t: Type[Optional[int]]) -> int: ...

wyfo commented 2 years ago

Also, what would be benefit of using type over AnnotationType/TypeForm?

type implies __mro__, __subclasses__, etc. Also, type[int] means int or a subclass of int, but I'm not sure what AnnotationType[int] would mean (would only int be allowed?).
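
To illustrate the `type[int]` half of that statement, a tiny sketch assuming Python 3.9 or later (the function name `zero_of` is made up). Any class accepted here is guaranteed to have `__mro__` and to be callable like `int`:

```python
def zero_of(cls: type[int]) -> int:
    """Accepts int or any subclass of int, such as bool."""
    # Safe for every argument a type checker accepts here,
    # because the argument is always a real class.
    assert int in cls.__mro__
    return cls()


print(zero_of(int))
print(zero_of(bool))
```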

And I'm not sure that someone will ever write AnnotationType[int]. As @davidfstr wrote in this "pitch", it would mainly be used in its generic form AnnotationType[T] (and sometimes without type vars). By the way, maybe AnnotationType.__class_getitem__ could be constrained to only accept type vars, to be less confusing.

gvanrossum commented 2 years ago

The example, when Python went the other way, IMHO: https://bugs.python.org/issue34498

That was long ago, when all typing stuff was provisional.

Would Python 3.11 speed-up improvements trigger some unification? For example, could having list[int] as a type help with specialization? But I am going way out of my expertise here...

No, that's not where we're going at all with the speedup project -- we don't look at annotations (looking at the types you actually get is much more effective). Even if we were (and tools like mypyc certainly are) I'm not sure what you mean with "unification" in this case?

What would it even mean that list[int] "is a type"? Surely you don't want to propose creating a new type object for each distinct list[X]? Type objects are behemoths.

My primary use case would be single dispatch on generic types.

If the topic is singledispatch, I don't have an opinion (at least not one that matters) since I've never used it (even though I recall once implementing a toy version).

wrobell commented 2 years ago

"unification" - an annotation type could become a Python runtime type in the future. I am not proposing any solution. I do understand that problem is complex and it is hard to solve in a comprehensive and performant way.

I mentioned "unification" to keep "annotation type -> type" transition an open possibility. It might help others to accept the need for these two concepts in Python type system, at the moment.

IMHO, Python needs AnnotationType to make the whole type system more consistent and easier to reason about. singledispatch is a prime example of why. The bug I linked has been known for over three years (four Python releases) and is still open. To this day, mypy does not report a problem if the first argument has the annotation type list[int].

gvanrossum commented 2 years ago

I suppose you're talking about https://bugs.python.org/issue34498? What do you want to happen? I suppose you'd like to be able to register things like list[int] or NewType? That's not likely going to happen -- this unification of which you speak would be very difficult to pull off, regardless of how many Python versions lack support for it. You may be better off trying to file a bug with typeshed -- maybe changing the annotation of the register() argument from Type[Any] to Type[object] would make mypy diagnose this? (But give it a try first.)

Tinche commented 2 years ago

FWIW cattrs was originally built on just singledispatch, but since it needs to also support a ton of what we're now calling AnnotationTypes it's moved to a system of tiered matching, first checking a singledispatch and then checking a list of predicates.

Even if singledispatch somehow grew support for AnnotationTypes, I'd still need the predicates since they are an order of magnitude more powerful (how do you detect a dataclass or attrs class with singledispatch, or a union but only of primitive types and not user classes?). And if these cases strike you as far fetched, I can vouch that you'd run into them within the first day of writing a serialization library ;)
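
A very small sketch of the tiered idea described above. This is not cattrs code: the names `class_handlers`, `predicate_handlers`, `find_handler`, `dump_int` and `dump_union` are all invented for illustration, and only the `functools.singledispatch` and `typing` calls are real APIs.

```python
from functools import singledispatch
from typing import Any, Callable, List, Tuple, Union, get_origin

Handler = Callable[[Any], str]


def _no_handler(value: Any) -> str:
    raise LookupError("no handler registered")


# Tier 1: ordinary classes, looked up through singledispatch (MRO-aware).
class_handlers = singledispatch(_no_handler)
# Tier 2: (predicate, handler) pairs for forms that are not plain classes.
predicate_handlers: List[Tuple[Callable[[Any], bool], Handler]] = []


def find_handler(form: Any) -> Handler:
    """Try the class registry first, then fall back to the predicates."""
    if isinstance(form, type) and get_origin(form) is None:
        impl = class_handlers.dispatch(form)
        if impl is not _no_handler:
            return impl
    for predicate, handler in predicate_handlers:
        if predicate(form):
            return handler
    raise LookupError(f"no handler for {form!r}")


def dump_int(value: Any) -> str:
    return f"int:{value}"


def dump_union(value: Any) -> str:
    return f"union:{value}"


class_handlers.register(int, dump_int)
predicate_handlers.append((lambda form: get_origin(form) is Union, dump_union))
```

Here `find_handler(bool)` resolves through the class tier because `bool` subclasses `int`, while `find_handler(Union[int, str])` can only be resolved by a predicate.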

wrobell commented 2 years ago

I suppose you're talking about https://bugs.python.org/issue34498? What do you want to happen? I suppose you'd like to be able to register things like list[int] or NewType?

IMHO, the bug should be closed as won't fix as the reality is that singledispatch can support Python runtime types only, at the moment. I would like Python type system to be explicit, whatever its limitations. [...]

gvanrossum commented 2 years ago

I don't disagree. Can you put that in the bug, with some explanation?

wrobell commented 2 years ago

I don't disagree. Can you put that in the bug, with some explanation?

I see it is closed already, but I have added a comment hoping someone will find more context useful. Also created #11875.