google / pytype

A static type analyzer for Python code
https://google.github.io/pytype
Other
4.71k stars 273 forks source link

RFC: Be more explicit on unreachable code #1649

Open h-joo opened 2 months ago

h-joo commented 2 months ago

Summary

This design proposes the following two

  1. a new opt-in compiler flag to warn on unreachable code (--no-unreachable-code)
  2. a way of type-checking unreachable code in pytype, making use of the bottom type Never type such that unsatisfiable condition of a value will result in the Never type, and proceed afterwards. This feature is intended to be enabled by default.

to avoid having code chunks that are silently failing due to totally unchecked code (#2), and also having explicit failure mode (#1) on writing unreachable code on the belief that either the user is doing something unintentionally wrong in those cases, or is a pytype bug that needs an explicit fix.

Background / Motivation

Pytype checks types by a symbolic execution of the python bytecode. During the execution, it collects type information and attaches it to the CFG node. When there’s a code path of which it sees that certain code cannot be reached during any control flow, it does not execute that branch. This leaves that chunk of code totally unchecked. See the following example.

def foo(a: list[int]):
  if isinstance(a, list):
    return
                  # After the if statement nothing gets type checked
  print(a + "42") # Doesn't print an error for (+ operator) for list and int
  reveal_type(a)  # Does not reveal type for 'a'

Above is an easy example of which is trivial to understand the control flow, but even so it can be confusing because this hides things rather than reveals things, thus it becomes less obvious what’s actually happening.

pytype detected unreachable code can be one of the following.

  1. It’s indeed an unreachable code
  2. (pytype-bug) false-positive of which pytype misidentified a type compatibility (or incompatibility)
  3. (most likely a user bug) Annotated a type that cannot be fully reflected with annotation or is difficult to represent due to the limitation in the type system.

    In the first two cases, it’s bad to let this silently be ignored. In #1, the user is most likely doing something wrong (maybe intentional in some cases). In #2, it’s as if implicitly adding # pytype: disable= comment on every existing line & every existing error combinations. There were some cases like #2 that pytype had misidentified a branch to be not-taken, and was totally un-checked. This was only noticed by the user by accident, when they were clearly thinking that something in the block should have been a type error.

    For #1, it seems better that if users want to have unreachable code for some reason, making the intention explicit by adding pytype suppression comment is the way to go. For #2, it’d also be better to make it explicit and shout out the error, because users will have the chance to report this bug, rather than silently ignoring something and everything would go wrong afterwards. After all, if something is totally unchecked silently, this is totally against the reason why we want to use a static type checker.

    This design was highly motivated by a bug report for a false positive of type incompatibility (#2).

Proposal

The proposal is two fold. First, introduce a new compiler flag (--detect_unreachable_code) to enable error on unreachable code. This new flag will allow detecting unreachable code such as the following example :

def sum_all(a: list[int]):
  if isinstance(a, list):
     return sum(a)
  return math.nan # Error : Unreachable code detected

The second part of the proposal is to proceed type checking even in unreachable code by inferring the type of the value which led to unsatisfiable condition to Never, and proceed.

Also see the following example.

def return_first_or_add_42(a: list[int], b: str) {
  if isinstance(a, list):
     return a[0]
  if b:
     return b + 42 # Error: Inexistent operator + for str and int
  return a + 42    # No Error: Never + int is not an error in pytype

Currently, this code will not trigger any error in pytype, because anything after the if statement of checking whether a is a list will not be seen. But when we start to infer a as Never and proceed with type checking, the type checker will see errors like errors which aren't related to the Never type itself, like incompatibility of the operator + between str and int types.

Benefits

--no-unreachable-code

Under this flag, users will not be able to write unreachable code by accident anymore. For the three cases(see background section) where pytype can say some code is unreachable, neither of them is good to pass silently. When we have an explicit error around this code, it’ll give the chance to users either to bug-report when it’s a bug in the type checker, or add an specific reason for the suppression while knowing exactly why they were adding some code that is known to be “unreachable”. When this happens silently, none of this will happen.

infer Never in unsatisfiable conditions and proceed with type checking

When we proceed after inferring a value to a Never type, it will allow users to write code that still type checks on types that does not contribute to the unsatisfiable condition. If the unreachable code was somehow intentional, this will be useful to help write sane code even after that unsatisfiable condition. Even if it were not intentional, users will have the power to use reveal_type in those code blocks and can still understand what’s going on there.

Out of scope

This proposal does not intend to change any behavior of how the Never is type checked. The type checking would remain equivalent to having an explicit None or NoReturn type annotated on that value.

CC

I've heard that @martindemello would be interested,

also CCing @rchen152 @Solumin in case they would have an opinion.

rchen152 commented 2 months ago

This seems like a good change. Have you thought about how/where you'll attach the Never type? In the above example (an isinstance(a, list) check), it's obvious that the value referred to by variable a should have type Never, but checks can get much more complicated. They could involve multiple variables, expressions like attribute accesses, etc.

h-joo commented 2 months ago

This seems like a good change. Have you thought about how/where you'll attach the Never type? In the above example (an isinstance(a, list) check), it's obvious that the value referred to by variable a should have type Never, but checks can get much more complicated. They could involve multiple variables, expressions like attribute accesses, etc.

That's a good point. I was assuming this was somehow mechanically possible, but I guess from your response it isn't without making further changes.

I was still reading up the code to understand, but I thought that variable and bindings in pytype could represent expressions beyond a single variable. Would that be incorrect?

rchen152 commented 2 months ago

I thought that variable and bindings in pytype could represent expressions beyond a single variable.

That's correct, but figuring out which pytype variable(s) represent an expression you see in the code is not always easy. It might be useful to take a look at the implementation and tests for typing.TypeGuard to see what kinds of things we could and couldn't narrow there.

martindemello commented 2 months ago

it's also worth seeing if you can do something simpler and add a warning/error when skipping a block that nothing connects to (within run_frame in vm.py)

Solumin commented 2 months ago

I do have a very small concern about the user experience. We're only addressing a subset of unreachable code: that which is only unreachable due to typing. There is also code that the Python compiler knows is unreachable --- trivially, things like if False: or if True: return --- which pytype will never see because the compiler never generates the unreachable bytecode.

Say there's some function like:

def trivially_unreachable():
  if True:
    return
  "str" + 1

If the user knows that pytype looks for type errors in unreachable code, but doesn't know the compiler elides unreachable code, they'll be confused why pytype isn't raising a type error.
If they don't know that pytype looks at unreachable code, they'll be confused when they change the if statement and pytype suddenly starts complaining about the type error.

I do think this is a very minor concern, but it might be worth considering how this will be communicated to users.

h-joo commented 2 months ago

I do have a very small concern about the user experience. We're only addressing a subset of unreachable code: that which is only unreachable due to typing. There is also code that the Python compiler knows is unreachable --- trivially, things like if False: or if True: return --- which pytype will never see because the compiler never generates the unreachable bytecode.

Say there's some function like:

def trivially_unreachable():
  if True:
    return
  "str" + 1

If the user knows that pytype looks for type errors in unreachable code, but doesn't know the compiler elides unreachable code, they'll be confused why pytype isn't raising a type error. If they don't know that pytype looks at unreachable code, they'll be confused when they change the if statement and pytype suddenly starts complaining about the type error.

I do think this is a very minor concern, but it might be worth considering how this will be communicated to users.

I actually didn't know that. My quickest solution may be looking at the source code of which the bytecode on certain lines were not generated to detect these cases, but I don't think I like it. And I've heard from somebody (maybe from Thomas or Sergei) that pytype does also look at the AST to fix up the misalignment in the source location of bytecode and its corresponding source code sometimes. Maybe the fact that pytype does it makes this quick solution not viable?