pre-commit / pre-commit-hooks

Some out-of-the-box hooks for pre-commit
MIT License
5.2k stars 694 forks source link

Performance issues & overhaul proposal #1069

Closed knyazer closed 2 months ago

knyazer commented 2 months ago

Hey,

I love the idea of having the "default"/"recommended" pre-commit-hooks, but.... I don't like the fact that they are slow! In my case, I included almost all of them into my pre-commit-config, and on a repository with an empty main.py (so that the hooks get applied), I get the execution time per-hook of about 50ms! This leads to ~600ms for all the hooks I have included, and seemingly the reason for this is Python overhead (and only a little bit pre-commit overhead), though I am not sure what exactly makes this thing worse. The reason why I concluded this, is because ruff hook (linter + prettier for python) takes about 5 times less time than "end-of-file-fixer". Wow.

I am an adept of very-fast-checking of things, and having sluggish hooks that do pretty simple things is not what I would love to see from such a wonderful project. Since I understand that rewriting all of these hooks is a pretty large task, I wonder what is your opinion about me just making an external rust/whatever based checker for these hooks (and maybe later replacing normal pre-commit-hooks with it)?

There is another reason to make an overhaul of the pre-commit-hooks: bundling of different checks together. While right now there are around 20 checks available, adding all of them is cumbersome: I need to painstakingly go through all the checks, choose the ones that I like, copy the name into my .pre-commit-config, and then go on. Would be a lot nicer to have "bundles" of checks, with ability to opt-in/opt-out, and while it makes sense to build this on the level of separate hooks, I feel like it is a loooot easier to just make an executable that would do all the checks. I am seeing it as something like this (in addition to supporting the usual hooks):

repo: blah
hooks:
    - id: super-hook
    - args: [--checks=all, --ignore=python-debug-statements]

This might look like a bad idea, since it opts-in into all the new stuff we might add, but since the revision is pinned it won't, so its alright

Since having a single executable allows to not reread all the files all over again to do a single simple regex pass over them. Another option would be to have a caching system, e.g. dump all the files into persistent blob in RAM, and share it with the future hooks that know how to use it, but this seems to break the pre-commit guarantees, so I don't like it.

Sorry for the long-read, which is not even entirely related to the "issues", but there are no "discussions", so I cannot make this a discussion :(

asottile commented 2 months ago

50ms... slow?

sorry but no