hauntsaninja / mypy_primer

Run mypy and pyright over millions of lines of code
MIT License

Significant recent performance regression? #62

Closed AlexWaygood closed 1 year ago

AlexWaygood commented 1 year ago

mypy_primer jobs seem to be taking 30+ minutes on typeshed PRs today. The norm previously was around 12-13 minutes.

https://github.com/python/typeshed/actions/workflows/mypy_primer.yml

AlexWaygood commented 1 year ago

It also looks like some changes are now being reported twice in the diff, for some reason: https://github.com/python/typeshed/pull/9308#issuecomment-1336169558

JelleZijlstra commented 1 year ago

I think 592a1f7972989e967da17bbbfb851e27aea9202a may have made things worse. Shard 0 is the slowest by far and includes a lot of projects (https://github.com/python/typeshed/actions/runs/3608748067/jobs/6081526467). Shard 0 gets the slowest project (pandas), with cost 120, as well as a lot of small projects with the default cost of 3. Apparently that works out to an unbalanced distribution.

I do think the approach in the commit is correct if the input data is correct and precise, but unfortunately it doesn't seem to be. Perhaps we could set something up where mypy-primer runs record their performance somewhere, and later runs use that data to figure out the sharding strategy. That seems quite tricky though. Failing that, it may be better to return to random sharding.
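To make the trade-off concrete, here is a minimal sketch of the two strategies being compared. It is not mypy_primer's actual code: the function names, the `small_N` project names, and the shard count of 4 are all made up for illustration. Only the costs (120 for pandas, a default of 3) come from the numbers above. The cost-based version is a plain greedy "largest first, into the least-loaded shard" assignment, which balances well only if the costs are accurate.

```python
import heapq
import random


def shard_by_cost(costs: dict[str, int], num_shards: int) -> list[list[str]]:
    """Greedy assignment: most expensive project first, into the lightest shard."""
    # Heap of (current load, shard index); the lightest shard is always on top.
    heap = [(0, i) for i in range(num_shards)]
    shards: list[list[str]] = [[] for _ in range(num_shards)]
    # Sort by descending cost, then by name so the result is deterministic.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + cost, idx))
    return shards


def shard_randomly(names, num_shards: int, seed: int = 0) -> list[list[str]]:
    """Random sharding: shuffle, then deal projects out round-robin."""
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_shards] for i in range(num_shards)]


# Hypothetical input: one expensive project plus 20 default-cost projects.
costs = {"pandas": 120, **{f"small_{i}": 3 for i in range(20)}}

by_cost = shard_by_cost(costs, num_shards=4)
at_random = shard_randomly(costs, num_shards=4)

for i, shard in enumerate(by_cost):
    print(i, sum(costs[name] for name in shard), shard)
```

With these inputs the cost-based version leaves pandas alone in its own shard and spreads the small projects over the other three. If the recorded costs are off (say the default of 3 badly underestimates some projects), the same greedy logic will pile work onto one shard, which is where random sharding can end up being the safer default.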

JelleZijlstra commented 1 year ago

And not sure what's up with the comment being posted twice.

JelleZijlstra commented 1 year ago

Oh it's much simpler: shard 0 includes all projects. That's why comtypes is in both shard 0 and shard 1: https://github.com/python/typeshed/actions/runs/3608748067/jobs/6081526467, https://github.com/python/typeshed/actions/runs/3608748067/jobs/6081526518.

#63 should fix it.