python / typeshed

Collection of library stubs for Python, with static types
Other
4.3k stars 1.73k forks source link

Ideas for improving typeshed testing and tooling #8774

Open AlexWaygood opened 1 year ago

AlexWaygood commented 1 year ago

A few days ago, I had a great conversation with @kkirsche offline about some ways we can improve our communications with contributors. One of the things that came up is that it would be good to have a clear outline of specific areas to do with tooling and testing where PRs would be welcomed. With that in mind, here's some areas where I think our testing and tooling could be improved, and where I would welcome contributions.

For any of these tools/ideas: if a PR would be enough work that it'd feel wasteful if it was rejected, please make sure to open an issue discussing the specific change first. If an issue gets a few thumbs-ups, it's a pretty good sign we like the idea, and you should feel free to proceed :)

(Other maintainers: please feel free to edit this!)

Existing tools

Improving existing tools is the "low-hanging fruit": it's always going to be easier to build on existing infrastructure than building something new entirely from scratch.

For each of these tools, I've tried to list a "primary maintainer": this is the person who's most likely to make decisions on a certain tool. That doesn't mean that this person is the only person who's ever going to chime in on a certain tool. For example, @hauntsaninja is the person with the most expertise on stubtest, and people are probably always going to defer to him with regards to that script. But that doesn't mean that other people won't also be interested in reviewing -- @sobolevn and I also have experience with writing stubtest improvements, and may well chime in on a stubtest PR.

Stubtest

Primary maintainer: @hauntsaninja

Stubtest compares what exists at runtime to what we've got in the stub, and checks for inconsistencies between the two. It's one of the most useful tools in our CI, and also really easy to contribute to. If you make a PR for a stubtest improvement, it'll probably be reviewed by some combination of @hauntsaninja, @JelleZijlstra, @sobolevn and/or me within a few days :)

Stubtest improvements generally fall into two categories:

What to improve:

Stubgen

Primary maintainer: ?

Stubgen is a tool for autogenerating stub files from source code. It's not a code base I'm particularly familiar with, and I'm not sure it's quite so actively maintained as some of the other tools on this list -- but here's a few ways I think it could be improved:

flake8-pyi

Primary maintainer: @AlexWaygood

flake8-pyi is a flake8 plugin to check stub files that works through analysing the Abstract Syntax Tree of the stub. It's a tool that's a lot more "dumb" than stubtest, since it doesn't have the full power of a type checker behind it, and doesn't know anything about the runtime either. There's still a lot you can do just with simple AST analysis, however, and it's another code base that's easy to contribute to if you're familiar with the basics of ASTs and the visitor pattern.

Lots of recent checks we've introduced have been fairly style-focused, but we've also added some bugbear-esque checks in the past that are aimed at catching common bugs:

We should continue to think creatively about whether there are other common bugs that we can catch using a simple AST analysis.

flake8-pyi is maintained by the typeshed team, so if you have an idea for a new check, just file an issue! If it gets a few thumbs-up, it's probably a good idea, and you should feel free to make a PR. If we're sceptical, we'll probably say so :)

There's also a bunch of open issues already with ideas for new checks. If it has a few thumbs-up (or even if it's just older than a week or two and hasn't been closed yet), then it's probably an idea that we're receptive to!

mypy_primer

Primary maintainer: @hauntsaninja

Mypy_primer checks out a copy of typeshed prior to a PR and after a PR. It uses "old typeshed" and "new typeshed" to run mypy against millions of lines of open-source code, and reports the difference between the two.

Here's some ways I think we could make mypy_primer even better:

Test cases

Primary maintainer: @AlexWaygood

Our test_cases directory (aimed at providing regression tests for places where stubs have caused problems in the past) is new, and still somewhat experimental. The biggest outstanding issue I see was pointed out by @Akuli in https://github.com/python/typeshed/pull/8066#discussion_r896969019, which is that the test cases are liable to slip out of date with the upstream source code in much the same way that stubs are. How can we ensure that that doesn't happen? Can we add some kind of verification that ensures that the places where we're asserting type checkers shouldn't emit errors do actually pass at runtime?

Stubsabot

Primary maintainer: @hauntsaninja

Stubsabot -- a bot which automatically suggests bumping pinned versions for third-party stubs -- is another new addition to our tooling. Here's some ideas of how we could make it even better:

The "stubtest failed" bot

Primary maintainer: @hauntsaninja

Note: future status of this tool is TBD. https://github.com/typeshed-internal/stub_uploader/pull/60 means that we can now pin stubs to more specific versions, which should hopefully reduce the number of daily stubtest failures we get. Maybe we'll be able to do away with the check altogether...

Stubtest is run on the full suite of third-party stubs every night, and a bot automatically opens an issue if stubtest failed on any of them (e.g. #8745). Things that would be great to see improved about this workflow are:

Miscellany

These are both pretty minor nits, but it would be great to:

Ideas for new tools

These ideas are necessarily going to be more work than the ideas above, as most of them involve writing new scripts from scratch, which would create its own maintenance burden. But here's some pie-in-the-sky ideas for brand new tests/tools we could add.

Note that new tools don't need to be part of typeshed or to run in CI to make a difference. For instance, for a long time, we didn't run stubtest in CI, but during that time we used stubtest to fix many, many errors in typeshed. Keeping a tool separate is a good way to avoid disagreement over things like the balance of false positives vs false negatives, to iterate quickly on the tool, and to prove the tool's value.

sobolevn commented 1 year ago

Mypy improvements are also welcome! Anyone can ping me to get help: suggestions / reviews / etc.

srittau commented 1 year ago

Unpinned temporarily to make space for #8956.

jenstroeger commented 1 year ago

Thank you @AlexWaygood for the list of tools, I’ll have to take a closer look at some of them. I was actually about to open an new issue, but after reading through your writeup I decided to comment on this one instead.

Idea for a new tool

My suggestion/question is related to the Allow package references as version specifiers over at Python’s discussion board.

I noticed that a types package here doesn’t necessarily have any version-enforced relationship with its code package. For example, it is perfectly fine to have this installed

protobuf                      3.20.3
types-protobuf                3.19.6

This looks problematic and a checker that warns against such version mismatch would be useful.

A proper solution, methinks, would be to create a dependency between the types-foo package and its related foo package, which is discussed here but might be better suited as an issue on this Typeshed repo?

hauntsaninja commented 1 year ago

Thanks for the suggestion! Agreed that this has been a bit murky (although the murkiness has often worked out favourably for users). To that end, we've recently done a couple things to make versioning between types-* and upstream work better: 1) the new versioning scheme in stub_uploader that should result in it being a little clearer what upstream a given version corresponds to, 2) stubsabot, which helps ensure we keep our stubs up to date.

I would be okay adding upstreams as an extra to types-* packages, but would be strongly opposed to adding a default dependency, especially one that had constraints.

jenstroeger commented 1 year ago

Thanks @hauntsaninja, glad to know that this issue is being at least thought about. Perhaps I should link this comment to the Python discussion or would you like to leave a note there? Just to make sure we cross-pollinate on this issue.

I would be okay adding upstreams as an extra to types-* packages,

Using what name? Then users would have to specify, e.g.

dependencies = [
    "protobuf ==3.20.3",
    "types-protobuf[protobuf] ==3.20.3",
]

or some such?

but would be strongly opposed to adding a default dependency, especially one that had constraints.

Mind to elaborate on the reason?

AlexWaygood commented 1 year ago

Mind to elaborate on the reason?

Wherever possible, stubs packages should be extremely minimal packages that are easy to install, have minimal dependencies, contain no executable code, contain minimal security risks, and are easy to bootstrap.

We've started to relax this policy recently with recent moves working towards allowing non-types dependencies for stubs in some situations (this work is not yet complete). But it's still a principle we should stick to wherever possible.

Avasam commented 1 year ago

Here's an idea, because it happened to me again: If stubtest doesn't know the name of a parameter (kwarg, c-module,...), and it starts with a single _ and it would not shadow a keyword/import if stripped. Then it is likely a mistake and the dev meant to use a positional-only argument.

Or I can wait for mypy to support the / syntax for positional-only arguments. Which I assume may happen with Python 3.7 EOL next year.

AlexWaygood commented 1 year ago

Or I can wait for mypy to support the / syntax for positional-only arguments. Which I assume may happen with Python 3.7 EOL next year.

Yeah, I'm tempted to say it may not be worth it to write a check for this, given that 3.7 EOL is on the horizon :) But if the implementation isn't too difficult, it might be interesting -- writing the check might even reveal a few existing bugs in typeshed!

(Note: I'm not sure exactly when we'll be dropping support for Python 3.7. We haven't really discussed it yet. We waited a few months after Python 3.6 had reached EOL status before dropping support for 3.6, due to the fact that it was still quite widely used when it reached EOL status, and we wanted to give people more time to move over.)

Akuli commented 1 year ago

There aren't that many parameter annotations that start with a single underscore. We could check the existing ones manually.

$ git grep '\(, \|(\)_[^_][A-Za-z0-9_]*:' | wc -l
212
danieleades commented 1 year ago

any appetite for looking at using Ruff for linting? I'm not actually sure what ruff's support for .pyi files is like

sobolevn commented 1 year ago

I like ruff (you can find my name among contributors), but I think that flake8 serves us quite well:

As far as I know, there are no unique checks in ruff that can help us in better typing annotations.

AlexWaygood commented 1 year ago

Ruff seems cool in many ways, but we definitely can't switch to ruff at typeshed.

Flake8-pyi is by far the most important flake8 plugin for typeshed, and it's an important part of our CI. Ruff has only implemented a few of the rules from flake8-pyi, and the rules it has implemented from flake8-pyi have several not-so-great behaviour differences from the way flake8-pyi has implemented them.

Also, flake8-pyi is currently maintained by the same team that maintains typeshed, which makes it very easy for us to make tweaks to flake8-pyi as and when typeshed requires it. Ruff seems very open to PRs, which is great, but there's also the fact that I don't know rust, so I wouldn't be able to make PRs fixing bugs in their logic for the flake8-pyi rules they've implemented.

srittau commented 1 year ago

Since the discussion has mostly died down, I've unpinned this.