python / mypy

Optional static typing for Python
https://www.mypy-lang.org/

Improving support for Annotated[Type, ...] annotations #12619

Open notallwrong opened 2 years ago

notallwrong commented 2 years ago

When writing plugins for code which uses Annotated[Type, an_obj, ..., another_obj] types dynamically, it would be nice to be able to access these annotations in order to either (a) perform type-checking on them, and/or (b) use those dynamic annotations to correctly infer the existence or types of other objects. I'm a total newbie to the mypy internals, but I think this is not currently possible. (See also issue #10872, and PRs #9625, #10777.) It would be fantastic if mypy could support the proper treatment of later parameters of Annotated somehow.

If I understand the current situation correctly, by the time the plugin gets a hold of the type annotation of something which was Annotated, even the unanalyzed_type, mypy will have run the TypeConverter on all of the parameters of the Annotated[...] annotation. This unfortunately erases most of the information in them, rendering them useless to the plugin. Note that PEP 593 states

Annotated is parameterized with a type and an arbitrary list of Python values that represent the annotations.

so it is in principle wrong to treat all of the parameters of Annotated[...] as types anyway.
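The runtime side of PEP 593 makes this concrete: the trailing parameters are kept as plain Python objects, not converted to types. A minimal illustration (the Marker class is a made-up example, not part of typing):

```python
from typing import Annotated, get_args, get_type_hints

class Marker:
    """An arbitrary runtime object used as metadata, per PEP 593."""
    def __init__(self, name: str) -> None:
        self.name = name

def f(x: Annotated[int, Marker("positive"), "free-form note"]) -> None:
    ...

# get_type_hints() strips the metadata by default...
print(get_type_hints(f)["x"])  # <class 'int'>

# ...but include_extras=True preserves it, and get_args() exposes
# the underlying type followed by the raw metadata objects.
tp = get_type_hints(f, include_extras=True)["x"]
base, *metadata = get_args(tp)
print(base)              # <class 'int'>
print(metadata[0].name)  # positive
print(metadata[1])       # free-form note
```

Only the first argument is a type; the rest are ordinary values the plugin would want to see unconverted.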

However, I presume that in the relevant part of fastparse.py where this type conversion happens, the parser cannot easily tell whether a given Subscript node in the AST is an Annotated[...] form. That is, it cannot really distinguish between dict[str, "Something"] and Annotated[str, "Something"] in order to treat these cases differently (the first refers to a type Something which happens to be spelled as a string, while the second simply contains the string "Something").
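The ambiguity is easy to demonstrate with the stdlib ast module: both annotations parse to the same node shape, so nothing available at the parse stage distinguishes a forward-reference string from a metadata string:

```python
import ast

# Both annotations parse to structurally identical Subscript nodes;
# nothing at the AST level says whether "Something" is a forward
# reference to a type or an arbitrary metadata value.
a = ast.parse('x: dict[str, "Something"]').body[0].annotation
b = ast.parse('y: Annotated[str, "Something"]').body[0].annotation

print(type(a).__name__, type(b).__name__)  # Subscript Subscript
print(ast.dump(a))
print(ast.dump(b))
```

Only the base name (dict vs. Annotated) differs, and at parse time that name has not yet been resolved to anything.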

I'm not familiar enough with mypy internals to know what the best solution is. The fundamental tension is that mypy wants to do type conversion directly on the AST, but it doesn't know which nodes are Annotated until semantic analysis starts. I can see that during semantic analysis, expr_to_unanalyzed_type is used. Given that this is possible, it seems to me the most natural approach would be for the AST parsing stage not to do type conversion on Subscript parameters, but instead to convert them to Expressions which are later converted to types if appropriate. Presumably this comes at some performance cost, though, in the more common case where the parameters are types (dict[str, int] or list[str], etc.).

Any thoughts would be fantastic.

zzzeek commented 2 years ago

Not totally related, but I would at least like typing tools to indicate the presence of an annotated type when using functions like reveal_type().

    from typing import Annotated

    MyInt = Annotated[int, "myint"]

    x: int = 5
    y: MyInt = 10

    reveal_type(x)
    reveal_type(y)

    $ mypy test3.py
    test3.py:10: note: Revealed type is "builtins.int"
    test3.py:11: note: Revealed type is "builtins.int"
    Success: no issues found in 1 source file
erictraut commented 2 years ago

@zzzeek, what do you mean by "the presence of an annotated type"? Are you talking about the declared type of a symbol?

Keep in mind that reveal_type reveals the evaluated type of an expression. Expressions don't have declared types. Symbols may or may not have declared types. In the specific case where an expression consists of a simple identifier that refers to a symbol (bound to the local scope) or a member access form that resolves to a symbol (in another scope), I suppose it would be possible to reveal the declared type of that symbol, if such a declaration exists. But it would be a bit odd to special-case those particular expression forms.

zzzeek commented 2 years ago

> @zzzeek, what do you mean by "the presence of an annotated type"? Are you talking about the declared type of a symbol?
>
> Keep in mind that reveal_type reveals the evaluated type of an expression. Expressions don't have declared types. Symbols may or may not have declared types. In the specific case where an expression consists of a simple identifier that refers to a symbol (bound to the local scope) or a member access form that resolves to a symbol (in another scope), I suppose it would be possible to reveal the declared type of that symbol, if such a declaration exists. But it would be a bit odd to special-case those particular expression forms.

I would like a function that returns a particular type, such as:

    def my_function() -> MyType:
        ...

to use that return type when revealed:

    x = my_function()
    reveal_type(x)   # would show MyType

The above behavior is obviously the usual case if MyType is an ordinary class defined with class MyType:. However, if MyType is not a class but was instead declared using Annotated, this detail is lost completely; the Annotated aspect seems to be thrown away up front by the type checking tools, and it's as though it was never there.

The docs for Annotated state that this is basically the correct default behavior, which is fine:

This metadata can be used for either static analysis or at runtime. If a library (or tool) encounters a typehint Annotated[T, x] and has no special logic for metadata x, it should ignore it and simply treat the type as T.

However I am proposing that there be "special logic" that affects how the typing tools display the type in their UX, such that it can be differentiated from the non-Annotated version of the type.
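For what it's worth, the metadata does survive at runtime even though static checkers erase it, so the information a tool would need is in principle recoverable. A small sketch using the MyInt alias from above:

```python
from typing import Annotated, get_args

MyInt = Annotated[int, "myint"]

# reveal_type() shows builtins.int, but at runtime the alias still
# carries its metadata, which tools could in principle surface.
print(get_args(MyInt))     # (<class 'int'>, 'myint')
print(MyInt.__metadata__)  # ('myint',)
```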

erictraut commented 2 years ago

Ah, I misunderstood what you meant by "annotated type". I now understand. You're talking specifically about the use of typing.Annotated. Thanks for the clarification. Please ignore my previous response then.

The challenge with typing.Annotated is that all of the arguments beyond the first one are meant to be ignored by a static type checker. They are not limited to type expressions and can be arbitrary value expressions. Type checkers are not designed to evaluate value expressions.

Another challenge is that types can be combined (in unions) and transformed (narrowed, specialized, etc.). There are well-defined rules in type theory for performing these transforms. It's not clear how additional metadata specified by typing.Annotated should be combined or transformed because it is not standardized.

zzzeek commented 2 years ago

Right, in my case just showing the name of the type, rather than the type it represents, would be enough. But I understand that's a bit arbitrary.

orenbenkiki commented 2 years ago

Perhaps asking for the same thing: what I'd like to see is a plug-in interface explicitly targeting typing.Annotated, specifically the ability to say "here is some type of an annotation field, and here is the code implementing it". For example, implementing something like Len or MaxLen for collections. Right now it isn't even clear to me whether this is possible using the current mypy plug-in mechanism; reading this issue, it seems it is not?
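A mypy plugin hook for this doesn't exist, but the runtime half of such a check can be sketched; MaxLen and check_lengths below are made-up names, not part of typing or any library:

```python
from typing import Annotated, get_args, get_type_hints

class MaxLen:
    """Hypothetical metadata marker; not part of typing itself."""
    def __init__(self, limit: int) -> None:
        self.limit = limit

def check_lengths(func, **kwargs):
    """Runtime enforcement of MaxLen metadata on a function's arguments."""
    hints = get_type_hints(func, include_extras=True)
    for name, value in kwargs.items():
        # get_args() yields the base type first, then the metadata objects.
        for meta in get_args(hints.get(name, ()))[1:]:
            if isinstance(meta, MaxLen) and len(value) > meta.limit:
                raise ValueError(f"{name!r} exceeds MaxLen({meta.limit})")

def greet(name: Annotated[str, MaxLen(5)]) -> str:
    return f"hello {name}"

check_lengths(greet, name="Ann")          # passes silently
# check_lengths(greet, name="Alexandra")  # would raise ValueError
```

What this issue asks for is the static analogue of that loop, running inside the type checker instead of at call time.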

Searching for how to deal with typing.Annotated is very difficult, since the same word "annotated" is used both for type annotations in general and for the specific use of typing.Annotated (as witnessed by the thread above :-). If there is documentation on how to extend mypy to handle typing.Annotated metadata and I missed it, my apologies; a link would be appreciated.

ethanhs commented 2 years ago

I think part of the problem with defining a plugin interface for this is that Annotated[T, CustomT] could really do anything (make checking a particular type more strict, make it less strict, do something completely different than what we do by default for a custom type etc). And we can't really insert plugin checks everywhere.

orenbenkiki commented 2 years ago

Hmmm... my naive understanding is that the only point of type annotations is to implement a single function is_a(sub_type: TypeDescription, super_type: TypeDescription) -> bool.

In this view, if we'd like to allow for "any" plug-in, then "all" we need to do is add a list of plugin_for_is_a(sub_type: TypeDescription, super_type: TypeDescription) -> Optional[bool] functions. Each plug-in would be free to look inside the TypeDescription for typing.Annotated metadata (or whatever else), and could return None when it has no opinion (i.e., it does not apply to this specific type combination).

The implementation of is_a would then invoke the plug-ins in a known order and, if they all returned None, fall back to the result of the basic is_a algorithm.

Assuming the TypeDescription provided a way to look at the typing.Annotated metadata, a plug-in could then implement any arbitrary algorithm using it. The plug-ins would need to be able to call the overall is_a and/or default is_a functions if needed, construct TypeDescription objects, etc.
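As a sketch only (every name here is hypothetical, and mypy's real internals work quite differently), the proposed dispatch chain might look like:

```python
from typing import Callable, Optional

# Hypothetical stand-ins for mypy's internal type representation.
TypeDescription = object
IsAPlugin = Callable[[TypeDescription, TypeDescription], Optional[bool]]

plugins: list[IsAPlugin] = []

def basic_is_a(sub: TypeDescription, sup: TypeDescription) -> bool:
    # Placeholder for the default subtyping algorithm.
    return isinstance(sub, type) and isinstance(sup, type) and issubclass(sub, sup)

def is_a(sub: TypeDescription, sup: TypeDescription) -> bool:
    # Ask each plug-in in registration order; first non-None answer wins.
    for plugin in plugins:
        verdict = plugin(sub, sup)
        if verdict is not None:
            return verdict
    return basic_is_a(sub, sup)

# Example plug-in: declare everything compatible with a marker class.
class Anything: ...
plugins.append(lambda sub, sup: True if sup is Anything else None)

print(is_a(bool, int))      # True (default algorithm)
print(is_a(str, Anything))  # True (plug-in override)
```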

That said, I have no idea on how mypy internals work, so the above might not make sense.

PIG208 commented 9 months ago

We encountered the same issue in #16094. Maybe a good first step would be to start type checking the metadata expressions.

alexatothermo commented 6 days ago

Also encountered this and just checked against pyright: pyright detects typing issues in annotated value expressions, as expected (at least by me).

IMO, since all arguments after the first one to typing.Annotated are arbitrary value expressions, they can contain type errors like any other expressions and therefore should also be statically type checked. Whether the design of typing.Annotated and its interfacing API functions thought this through, I do not know, as I'm not that deeply versed in static typing.
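To illustrate the kind of error involved (MaxLen here is a made-up metadata class, not from any library): pyright reports the bad argument in the second alias, while mypy currently does not look inside the metadata slots at all:

```python
from dataclasses import dataclass
from typing import Annotated, get_args

@dataclass
class MaxLen:
    limit: int

# Well-typed metadata: every checker accepts this.
ShortName = Annotated[str, MaxLen(10)]

# Mis-typed metadata: pyright flags the str-for-int argument error,
# while mypy currently skips type checking inside the metadata slots.
SloppyName = Annotated[str, MaxLen("ten")]

print(get_args(ShortName))  # (<class 'str'>, MaxLen(limit=10))
```

Both aliases are accepted at runtime, which is exactly why only static analysis can surface the mistake early.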