Test stub files against real code

gaborbernat commented 6 years ago

There are a few use cases when stubs may be compelling over type annotations or type comments:

annotate code that needs to be Python 2 compatible, ideally by using type annotations (so no type comments),
often times users prefer to have type information outside of the source code (this is to avoid modifying legacy code change - and in practice lowers resistance from maintainers),
the maintainer of the library does not want to take on the in-line type annotations, someone else trying to provide it as a separate package.

However at the moment this has some limitations: stub files are used only to check clients of the stub, not the code itself.

# a.py
def add_int(a, b):
    return a + b

add_int("a", "b") # mypy does not complain, actually ignores the entire file

# a.pyi
def add_int(a:int, b:int) -> int: ...

# client.py
from a import add_int
add_int(1, 2)
add_int(1, "2") # here mypy is going to complain

In case of the above files mypy will warn about bad usage of inside client.py, but not in a.py (that is, mypy does not validate against the source files being annotated by the stub). This has serious drawbacks:

maintaining the stub files is hard, now users need to manually make sure whatever is in the stub file is an accurate representation of the matching source files (both interface - aka number of arguments, names - and type wise),
the stub creator does not know if the source code itself is type correct or not.

Here I propose to implement a way to support testing stub files against the real code:

build the syntax tree for the source file,
build the syntax tree for the stub file,
merge stub tree into the source tree
- source file only tree available: dynamically typed - noop
- abstract syntax tree only: raise error - missing annotated sources
- both stub and source AST exists:
- for all source elements for what there is a matching stub element
  - copy inject over type annotations
  - complain if function interface missmatch
now just run the existing type checking mechanism on the enriched source code ast

Merging the AST definitely is not trivial. The plan is to start with a POC implementation, that works for the most cases, and then see the viability of this approach. Note this would help a lot both stub maintainers I think (typeshed) and projects that need Python 2 support.

gvanrossum commented 6 years ago

This is not a new idea. Do you also have a plan for implementing this, or are you just requesting that the mypy core team takes this on?

gaborbernat commented 6 years ago

@gvanrossum I'm aware of this. I've chatted earlier today with some mypy core team members. Actually the plan described here is something we mostly came up together with @JukkaL. I do plan to implement this, or at least help with this (hopefully get the ball rolling during the sprints next week). That being said created the issue to discuss the general approach before anyone starts coding. I did not found a ticket dedicated existing, so created a new one.

gaborbernat commented 6 years ago

I've got an initial proof of concept version of this working (https://github.com/gaborbernat/mypy/commit/a56f007386daab9229af957ac252b3397146b8dc). Will give it a test run trying to use it to type hint the https://github.com/tox-dev/tox code base, which contains a lot more edge cases I'm sure it's not handling yet. And fix the issues coming up. Once manage to do that I'll create the PR for us to review/discuss it (probably will take a week or so).

mchlnix commented 5 years ago

Are there any updates on this? As someone using stub files exclusively, this is very unintuitive behavior (and hard to realize/easy to forget). It also makes type checking if __name__ == "__main__":" sections complicated (maybe impossible?).

pcorpet commented 5 years ago

@gaborbernat are you taking your proof of concept further? It would be extremely useful.

gaborbernat commented 5 years ago

Yeah it's a work in progress. Sadly it's quite complicated to get it right and touched a few code places that were modified in the meantime ( so needed to rebase and resolve merge conflicts like three times by now), and my availability was also fluctuating. Long story short we agreed on design and implementation. Will probably be ready for merge by middle of April. Definitely commited to get it stable by PyCon Us in early May.

gaborbernat commented 5 years ago

So just an update on this. We have the ability to merge the stub into the source code via https://github.com/ambv/retype. The retype tool will warn if the stub does not match the source file. Then on the merged source code, we can run our type checker to detect issues (see example https://github.com/ambv/retype/blob/master/tox.ini#L53-L59). Merges https://github.com/ambv/retype/tree/master/types into https://github.com/ambv/retype.

Issues with this approach are that you lose context of what content comes from the source and what content comes from the stub. mypy errors are hard to fix as errors are reported on the merged source information; mapping back to the original files can be troublesome. retype also fails at the first conflict so requires a lot of run-fix-run cycles.

The plan going ahead is to make retype also report some line mappings (allowing at this point to generate more localized errors). Also, retype should run and report all errors rather than stop at first. The later issue is of a mediocre hard. The first is hard though. The problem is that at the moment retype parses stubs via the typed_ast, the source as `lib2to3``. And performs the merge via dumping the typed AST into the lib2to3. Parsing the stub as a source is a no go though, as we lose line information needed to generate the map. In the first instance, we would need to parse the stub at least also as a lib2to3. This is a major rewrite of retype sadly. Will continue iterating on this.

graingert commented 4 years ago

imho with python 2.7 and 3.5 EOL around the corner the use-case for external stubs seems to diminish.

Already people are choosing to put their stubs separate from their code because they don't like the "aesthetic" of type annotations. In my opinion that makes this ticket describe an anti-feature, and just provides these people with ammunition to avoid applying types directly to their code.

sobolevn commented 4 years ago

Well, I use stubs with python3.7 because sometimes stubs are really ugly. And I am happy that they live outside of the source code.

Example:

mchlnix commented 4 years ago

@graingert I don't quite understand your comment. At first it sounds like, with the introductions of annotations we don't need stubs anymore (I disagree), then it sounds like you actively want to prevent people from using type stubs by making it harder, thus this would be an anti-feature.

Could you clarify?

Type stubs have fallen out of favor with me, but especially for legacy code and where I currently work there is a lot of push back for in-code annotations and a solid type stub option would surely help getting their feet wet.

graingert commented 4 years ago

It is my opinion that mypy should focus on fostering community support for inline type annotations as the default and preferred option

mchlnix commented 4 years ago

Fair enough, but I don't think treating a feature, which has non-equivalent uses as a red headed step child is a good way to achieve that. It might just split the adoption of type hints in the community in general.

Unless type stubs were always just meant as a holdover. But they seem way too useful and in some cases unavoidable for that.

ruler501 commented 4 years ago

Stub files are necessary in cases where the libraries you are using do not have annotations and the stub library for them makes a type generic. Since the subscript operator isn't defined normally you can't inline the annotations all the time. Encountered this with the django stubs.

JukkaL commented 4 years ago

@ruler501 In annotations you can string literal escaping to write generic types even when indexing doesn't work at runtime. Example: def f() -> 'LibraryType[int]': ... You can also use from __future__ import annotationsand skip the string literal escapes in Python 3.7+.

graingert commented 4 years ago

Stub files are necessary in cases where the libraries you are using do not have annotations and the stub library for them makes a type generic. Since the subscript operator isn't defined normally you can't inline the annotations all the time. Encountered this with the django stubs.

To me this seems like a problem caused by stubs rather than solved by them

ruler501 commented 4 years ago

@ruler501 In annotations you can string literal escaping to write generic types even when indexing doesn't work at runtime. Example: def f() -> 'LibraryType[int]': ... You can also use from __future__ import annotationsand skip the string literal escapes in Python 3.7+.

@JukkaL that doesn't work for inheriting though. The specific case here is roughly you have a Field class that holds a value of some type so the stubs make it explicit as Field[T], but now if you want to make a custom Field[str] I was under the impression you can't inherit from the stub version to supply the type argument to the super class so lose the ability to refer to it safely by the generic supertype without losing information.

JukkaL commented 4 years ago

@ruler501 Yeah, inheritance is tricky. You can use this ugly workaround, however:

from typing import TYPE_CHECKING

from foo import Field

if TYPE_CHECKING:
    Base = Field[str]
else:
    Base = Field

class Derived(Base):
    ...

kamahen commented 4 years ago

we would need to parse the stub at least also as a lib2to3 (re: https://github.com/python/mypy/issues/5028#issuecomment-495942769)

I'm thinking about augmenting stdlib ast module to have some lib2to3 features (because llibib2to3 is going away). If this is of interest to you: https://mail.python.org/archives/list/python-ideas@python.org/thread/X2HJ6I6XLIGRZDB27HRHIVQC3RXNZAY4/ (No promises that I'll do the work of course, but I have high hopes)

hauntsaninja commented 3 years ago

Note that mypy now ships with a tool called stubtest, which will compare stubs to what it gets from importing and inspecting the module at runtime.

astrojuanlu commented 3 years ago

Is it documented? I tried it but I get Nothing to do?! // error: xxx failed to find stubs

hauntsaninja commented 3 years ago

It's not currently documented. You'll need to a) make sure stubtest can import your module (i.e. install it into the same environment), b) make sure stubtest can find the stubs the same you would for mypy, e.g. by setting MYPYPATH

(update: stubtest is documented here https://mypy.readthedocs.io/en/latest/stubtest.html)

davidism commented 3 years ago

I just added type annotations to Werkzeug. Because inheriting from typing.Generic makes instantiation about twice as slow until Python 3.9, I had to use a stub file for the datastructures.py file instead of placing the annotations in the code. I would still like to type check the code itself, but currently Mypy only considers the stub file. stubtest only appears to compare the definitions, it doesn't solve the issue of type checking the code itself.

Also, I was tempted to use stub files everywhere, since inline annotations can get quite messy for complex code bases, but not being able to check the code itself made that not possible.

hauntsaninja commented 3 years ago

For avoiding subclassing Generics specifically, you could try using an if TYPE_CHECKING: ... block to change the base class without mypy noticing, and rely on from __future__ import annotations (or in your case, string quotes) to make indexing the class work in annotations (https://mypy.readthedocs.io/en/stable/runtime_troubles.html#using-classes-that-are-generic-in-stubs-but-not-at-runtime).

Yeah, stubtest is meant as a test for stubs, not for code :-)

ncoghlan commented 3 years ago

@graingert just pointed me at this ticket as part of adding type hints support to contextlib2. In my case, the use case is to implement CI for the bundled typehints (since I'm pulling the source code from the standard library and the type hints from typeshed and want to be sure they're still consistent with each other across all supported versions after being patched to run on 3.6+). It looks like this will do the trick quite nicely:

python3 -m mypy.stubtest contextlib2

hauntsaninja commented 3 years ago

Yup! To summarise the state of this issue:

Checking stub files based on real code: stubtest does a decent job of this (it imports the module and uses runtime introspection). It's worked pretty well for typeshed. Checking code based on stub files: no current solution, unless you get something using retype to work

graingert commented 3 years ago

@hauntsaninja out of interest why didn't it catch the incorrect type of contextlib.nullcontext in typeshed?

hauntsaninja commented 3 years ago

It did, there's just a backlog of allowlisted errors: https://github.com/python/typeshed/blob/50f5858a5628529b7a453c675e0f4b1eeccc704d/tests/stubtest_allowlists/py37.txt#L29

graingert commented 3 years ago

Ah awesome, thanks

ariba-05 commented 1 year ago

very new to typing and Python both but was just wondering if it is possible to integrate directly or as a plugin which transfers the type annotations from pyi files to source code using code transformations from something like LibCST and mypy checks for type checking in source in the next step? Of course, the solution in the description seems much better (for correctness reasons) but could this be used as a workaround in the meantime?

hauntsaninja commented 1 year ago

Yup, libcst.codemod.visitors. ApplyTypeAnnotationsVisitor is pretty useful (retype that I mentioned above is mostly dead). You can see an example of how to use it for this purpose in https://github.com/google/pytype/blob/main/pytype/tools/merge_pyi/merge_pyi.py

python / mypy

Test stub files against real code #5028