python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.52k stars 2.83k forks source link

Improve flow sensitivity, using SSA form. #16594

Open markshannon opened 11 months ago

markshannon commented 11 months ago

Feature

Improve local flow sensitivity of the analysis to reduce the number of false positives, using SSA form.

While MyPy has some flow sensitivity, it is can be tripped up by relatively simple flow control. Using a better internal representation of variables and dataflow would allow better type checking without adding complexity to the rest of MyPy.

Pitch

For example:

def foo(a: list[int], b: list[str]) -> None:
      x = 0
      for i in a:
           x += i
           print(x)
      x = ""
      for i in b:
           x += i
           print(x)

The above function is clearly safe, but MyPy complains. With SSA form, MyPy could trivially determine that the function is corrrect by treating x and i as two distinct SSA variables

def foo(a: list[int], b: list[str]) -> None:
      x_0 = 0
      for i_0 in a:
           x_0 += i_0
           print(x_0)
      x_1 = ""
      for i_1 in b:
           x_1 += i_1
           print(x_1)

MyPy already supports union types, so phi-nodes can be straightforwardly typed.

AlexWaygood commented 11 months ago

The above function is clearly safe

I think there may be a typo in your snippet :-) it looks to me like you're adding an int to a str in both loops. Maybe you had the type annotations for the a and b parameters the wrong way round?

but MyPy complains.

try using --allow-redefinition

markshannon commented 11 months ago

I think there may be a typo in your snippet :-)

Fixed

markshannon commented 11 months ago

SSA can handle more complex control flow. Even with --allow-redefinition, MyPy complains about this:

def bar(x: bool) -> int | str:
    if x:
        t = ""
    else:
        t = 1
    return t

Which type checks straightforwardly with SSA:

def bar(x: bool) -> int | str:
    if x:
        t_0 = ""
    else:
        t_1 = 1
    t_2 = ϕ(t_0, t_1)  # type(t_2) = type(t_0) | type(t_1)
    return t_2