Closed A5rocks closed 1 year ago
Yeah, agreed that this would be nice.
The distribution is quite head heavy (cough sympy, pandas, graphql cough), so I think you could get most of the benefit by just adding a manual score to the longest ones. `mypy_primer --measure-project-runtimes --concurrency 1` should show project runtimes.
An amusing fact: at one point I noticed there was a particularly bad sharding, so my quick fix was: https://github.com/hauntsaninja/mypy_primer/blob/236dab370d45dccd2ac17e67180cd7d3e99248af/mypy_primer.py#L60
A random musing: something I've been curious about but haven't looked into yet is how the speed difference between mypyc-compiled mypy and pure-Python mypy varies across projects.
I would love it if mypy_primer balanced its sharding better. On a recent PR to mypy, I noticed that:
(Note that mypy_primer could take ~10 minutes less if it balanced optimally.)
I know that it would be infeasible to construct lists for every single combination, so I propose:
What if every project had a "difficulty" number that was a rough estimate of the time mypy takes to type check it? The idea is that you could then balance these numbers across buckets (just use a greedy approach: going from the largest difficulty to the smallest, always put the project in the bucket with the lowest total difficulty).
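To make the greedy idea concrete, here's a minimal sketch (names like `balance_shards` are hypothetical, not mypy_primer API). It's the classic longest-processing-time heuristic: sort projects by difficulty descending, keep a min-heap of shard totals, and always drop the next project into the lightest shard:

```python
from heapq import heapify, heappop, heappush

def balance_shards(difficulties: dict[str, float], num_shards: int) -> list[list[str]]:
    """Greedily assign projects to shards, largest difficulty first,
    always into the shard with the lowest running total (LPT heuristic).

    `difficulties` maps project name -> estimated mypy runtime.
    """
    # Min-heap of (total_difficulty_so_far, shard_index).
    heap = [(0.0, i) for i in range(num_shards)]
    heapify(heap)
    shards: list[list[str]] = [[] for _ in range(num_shards)]
    for project, cost in sorted(difficulties.items(), key=lambda kv: -kv[1]):
        total, i = heappop(heap)
        shards[i].append(project)
        heappush(heap, (total + cost, i))
    return shards

# With a head-heavy distribution, the big projects get spread out first:
# balance_shards({"sympy": 90, "pandas": 60, "graphql": 40, "tiny": 5}, 2)
# puts sympy+tiny on one shard and pandas+graphql on the other.
```

LPT isn't optimal in general, but for a head-heavy distribution like this one it gets very close, and it's trivially fast for the number of projects involved.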
I'm not sure how we could keep these up to date, though. Is there a metric that is simple to compute but that correlates with mypy runtime? Number of files? Dependencies? Lines of code? Count of `import typing`?