Why doesn't mypy primer use a cache

hauntsaninja / mypy_primer

Run mypy and pyright over millions of lines of code

MIT License

55 stars 29 forks source link

Why doesn't mypy primer use a cache #29

Closed DanielNoord closed 2 years ago

DanielNoord commented 2 years ago

Hi,

Not sure if this is the best place to ask this question but we have implemented a mypy primer inspired primer for pylint and I have also implemented one for my own project (https://github.com/DanielNoord/pydocstringformatter). Many thanks for the inspiration and this awesome tool. It helped us prevent a number of painful regressions for upcoming pylint releases.

We are now looking to expand the functionality of the tool (https://github.com/PyCQA/pylint/issues/5364 if you are interested) and a question I previously had came up again. Is there a reason why the primer doesn't use some sort of cache for the result of the main branch? (Or am I missing something in the code?). It seems like the primer runs twice over the merge-base commit and the commit of the PR. Can't the result of merge-base be stored after a run of the CI on main and then re-used in all PRs?

I didn't see this as something on the todo list and for us it seemed like low hanging fruit to improve the runtime of the CI so I (perhaps incorrectly) assumed it was considered and then rejected. Am I correct in assuming so? And if yes, could you elaborate a bit on the rationale? If not, please consider this an issue about adding that to the todo list 😄

hauntsaninja commented 2 years ago

Hello!

Awesome to hear that! :-) Glad to have played a small part in helping make Python tooling great.

Your understanding is correct: every time we run mypy_primer, we run things against the merge base and the commit to compare to. There's no great reason why we don't cache things across runs. While mypy_primer isn't fast, with sharding it's fast enough, so typeshed and mypy haven't really felt the pain (in particular, for mypy, unit tests are slower than sharded primer!)

The nice thing about always recomputing results is that you don't have to worry about your caching logic doing something incorrect. I've found Github Actions a bit painful for testing this kind of thing, which doesn't increase my motivation to do this. That said, something I've wanted to do is cache project clones and venvs. This would be a win that doesn't cost sanity and mypy_primer already somewhat supports this locally, just not across CI runs (would need something like https://github.com/actions/cache ).

DanielNoord commented 2 years ago

There's no great reason why we don't cache things across runs. While mypy_primer isn't fast, with sharding it's fast enough, so typeshed and mypy haven't really felt the pain (in particular, for mypy, unit tests are slower than sharded primer!)

Ah okay. That seems logical. pylint is quite slow so we do really need to fix this somehow. I'll have a look at whether sharding can help us as well.

The nice thing about always recomputing results is that you don't have to worry about your caching logic doing something incorrect. I've found Github Actions a bit painful for testing this kind of thing, which doesn't increase my motivation to do this.

This is a worry I have myself 😄

That said, something I've wanted to do is cache project clones and venvs. This would be a win that doesn't cost sanity and mypy_primer already somewhat supports this locally, just not across CI runs (would need something like https://github.com/actions/cache ).

Yeah, we have been looking at that as well. I guess for pylint it is the other way around though. Cloning the projects is insignificant compared to the time it takes to lint some of them.

Thanks for your answers though! Much appreciated! Closing this 😄

DanielNoord commented 2 years ago

@hauntsaninja We actually managed to implement something in pylint that mimics the mypy primer. Once again want to thank you for the inspiration you provided. Let's hope we can avoid some future bugs with it.

This did prompt one further question though. I hope you don't mind me asking it here instead of opening a new issue. Our first normal run was on https://github.com/PyCQA/pylint/pull/6652 As you can see there are two messages by GitHub-actions very shortly after each other. However, both aren't shown as the similar comment is not being shown, while the original comment is minimised by the CI step.

We use kanga333/comment-hider like you do: https://github.com/PyCQA/pylint/blob/fadf6eaa8fb7673009bf9595be37319824ce9f1b/.github/workflows/primer_comment.yaml#L133-L140

Is this something you are running into as well? And if so, have you considered any solutions to do this? Before I spent a couple of hours diving into the Github documentation to see if there is any way to avoid this 😅

hauntsaninja commented 2 years ago

Oh you mean getting hidden by a Github fold? This happens every now and then, but hasn't been a problem for us in practice. I think this happens on PRs with a lot of activity and on those there are enough eyes and enough comment notifications that they end up getting seen.

The only related adjustment we made was on typeshed. typeshed uses pre-commit.ci which makes quick follow up reformatting commits; we disabled mypy-primer from running on pre-commit commits to avoid noise (since pre-commit.ci won't change semantics).

DanielNoord commented 2 years ago

Oh you mean getting hidden by a Github fold? This happens every now and then, but hasn't been a problem for us in practice. I think this happens on PRs with a lot of activity and on those there are enough eyes and enough comment notifications that they end up getting seen.

Yeah! I guess normally there will be a review comment in between primer runs, so the fold won't happy anyway. It's just when you push to your Github branch and then immediately re-push triggering another run. ... I'm going to add workflow concurrency now that I think about it. 😄

The only related adjustment we made was on typeshed. typeshed uses pre-commit.ci which makes quick follow up reformatting commits; we disabled mypy-primer from running on pre-commit commits to avoid noise (since pre-commit.ci won't change semantics).

That's a good tip actually! Didn't think about that, but we should probably do the same. I'll have a look at how you set this up over at typeshed! Thanks once again!