hauntsaninja / mypy_primer

Run mypy and pyright over millions of lines of code
MIT License

Reuse `mypy_primer` to check issue reproductions? #23

Closed A5rocks closed 1 week ago

A5rocks commented 2 years ago

As the title says, mypy_primer could totally detect diffs in mypy output for repros for issues on mypy -- this could help find issues that a PR closes. However, this would require having an up-to-date listing of them, which is/will probably be a challenge.

Perhaps as an MVP, just get every code block from every open issue tagged "bug" and run mypy over them...?
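A minimal sketch of the "get every code block" step, assuming the issue body is already available as a Markdown string. The name `extract_python_blocks` and the choice to keep only blocks explicitly tagged as Python are assumptions for illustration; untagged blocks in issues are often mypy output rather than code.

```python
import re

# Build the fence marker programmatically so this sketch contains no literal fence.
FENCE = "`" * 3

# An opening fence with an optional language tag, the body, then a closing fence.
BLOCK_RE = re.compile(
    rf"^{FENCE}[ \t]*(?P<lang>[\w+-]*)[ \t]*\r?\n(?P<code>.*?)^{FENCE}[ \t]*\r?$",
    re.DOTALL | re.MULTILINE,
)

# Assumption: only blocks tagged with one of these are treated as Python.
PYTHON_TAGS = {"python", "py", "python3"}


def extract_python_blocks(issue_body: str) -> list[str]:
    """Return the contents of every Python-tagged fenced block in issue_body."""
    return [
        match.group("code")
        for match in BLOCK_RE.finditer(issue_body)
        if match.group("lang").lower() in PYTHON_TAGS
    ]
```

Each returned string could then be written to a file and handed to mypy; fetching the issues themselves is left out here.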

JelleZijlstra commented 2 years ago

@Akuli at some point ran a script that tried to repro open issues by running mypy on code samples. Not sure if it would be feasible to generalize that into something we run in mypy-primer.

A5rocks commented 2 years ago

Yeah, I was originally thinking "this should be a script" but realized it would probably be more useful in mypy-primer (no need to tell false-positive vs false-negative for an issue, and also it adds more (weirder too!) lines of code to check regressions).

hauntsaninja commented 2 years ago

This sounds useful, although people's code blocks often aren't self-contained, so this might need some more heuristics in practice.

I think the best way to implement this is to create a separate repo in which we have a GitHub Action that scrapes mypy issues and commits them back to the repo in files named by issue number. We can then add this repo to mypy_primer. This would keep mypy_primer fast and relatively self-contained, and avoid issues arising from GitHub rate limits.
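A rough sketch of the "files named by issue number" step, with one simple heuristic for blocks that aren't self-contained: skip anything that does not parse as Python. The `issue_<number>_<index>.py` naming scheme and the helper names `parses` and `write_snippets` are hypothetical; the scraping and committing that the GitHub Action would do are not shown.

```python
import ast
from pathlib import Path


def parses(code: str) -> bool:
    """Heuristic: keep only snippets that are syntactically valid Python."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return False
    return True


def write_snippets(issue_number: int, blocks: list[str], out_dir: Path) -> list[Path]:
    """Write each parseable block to out_dir/issue_<number>_<index>.py."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, code in enumerate(blocks):
        if not parses(code):
            continue  # Likely a fragment, a traceback, or mypy output.
        # Hypothetical naming scheme: the index keeps multiple blocks per issue apart.
        path = out_dir / f"issue_{issue_number}_{index}.py"
        path.write_text(code, encoding="utf-8")
        written.append(path)
    return written
```

A syntax check only filters out obvious non-code; snippets that rely on missing imports or third-party stubs would still need separate handling.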

A5rocks commented 1 week ago

I've done this (through a local program) with https://github.com/A5rocks/mypy-issues

This doesn't belong in mypy_primer because it takes quite a while even when running on 16 cores. This isn't due to line count (I think) but more so just mypy startup costs (I think).