wyuenho / emacs-pet

Tracks down the correct Python tooling executables from your virtualenvs so you can glue the binaries to Emacs and delete code in init.el

Conflict between precommit and poetry #8

Closed - janoc closed this issue 1 year ago

janoc commented 1 year ago

Description

If both precommit and poetry are installed, pet-executable-find will attempt to use precommit to search for binaries, and this is most likely not what is wanted. precommit is used to set up git commit hooks, and it is very common to use it alongside a poetry or pipenv project.

The above-mentioned behavior results in a binary (e.g. black) not being found by pet-executable-find, because it is in the virtual environment created by poetry and not the one created by precommit - and this despite pet-virtualenv-root detecting the virtual environment correctly.

Reproduction steps

Expected behavior

pet-executable-find should not prioritize precommit over more common package managers like poetry or pipenv.

PET version 1.0.3 (current master)

Emacs version 29.0.50 (self built)


wyuenho commented 1 year ago

Which black does CI use?

janoc commented 1 year ago

Sorry I don't follow. What do you mean?

wyuenho commented 1 year ago

Which black binary does your CI pipeline use to lint?

janoc commented 1 year ago

I don't see why or how that is relevant - the problem above happens on my desktop, with no CI involved at all. We use precommit for git hooks to enforce coding standards.

For the record, I mean this pre-commit package: https://pre-commit.com/

wyuenho commented 1 year ago

> We use precommit for git hooks to enforce coding standards.

So... precommit should be preferred?

janoc commented 1 year ago

No - exactly the opposite! Pre-commit is a tool to define git hooks (pre-commit, pre-push, etc.), not something one uses for normal development. I was not even aware that it could create a virtual environment at all.

wyuenho commented 1 year ago

But the git hooks are used to enforce coding standards, so ultimately, whatever code you write, however you write it or lint it, it has to pass the linting checks you've set up using pre-commit. So why should you prefer the black executable in poetry rather than the one in precommit? Say your precommit config is using black 1.0, but your .pyproject.yaml requires black 1.1. Both share the same config, but black 1.1 formats your code differently. When you commit, precommit complains that your code is not formatted the right way. So which version of black should win?

The reason you have this problem at all is that someone cargo-culted the idea that coding standards should be enforced on your computer instead of in CI, and that same person didn't realize that exactly zero editors before PET was written were able to use the binaries precommit creates to lint and format your code as you write it. Some time later, someone else discovered this problem and added all these formatters and linters to the dev dependencies in poetry so their editors could use them. It works fine for some months, until somebody decides to upgrade the version in precommit or poetry without upgrading the other side. It's always the same old story.

The real solution to your problem is, if you lint on CI already, to delete precommit and fix your CI config to use the linters in the poetry virtualenv. If that solution encounters too much social resistance, delete the formatters and linters from the poetry and pipenv configs. If even that proves too much for your team, you can accept fate and let PET lead you to salvation with its decision, or leave the team.

wyuenho commented 1 year ago

BTW, I suspect the real reason you are having this problem is that you haven't installed the precommit hooks. Just install them and you should be good to go. You just have to accept that PET has configured Emacs to use the precommit binaries.

janoc commented 1 year ago

> So why should you prefer the black executable binary in poetry rather than precommit?

Because there is no black installed by precommit?

> The reason you have this problem at all, is someone cargoculted the idea that coding standards should be enforced on your computer instead of CI ...

Sorry, but that's both irrelevant and nonsense. The point is that code that doesn't conform to standards or fails basic tests should not get into the repository at all; CI would identify that only when it is too late. Not to mention the speed of work - our project is still small, but even then a CI run can take 5-10 minutes. Having CI fail because of code formatting errors, because someone forgot to run black, and then having to wait another 5-10 minutes to see whether the fix is OK - that is preferable? In what universe?

> The real solution to your problem is, if you lint on CI already, delete precommit and fix your CI config to use the linters in the poetry virtualenv.

Again, you are jumping to conclusions. Our CI doesn't use anything from precommit at all - everything is managed by poetry/pipenv, including the linters. As I said earlier, I wasn't even aware it was able to create a virtual environment at all. We are not calling tools like flake8 or black from precommit's YAML directly; we have our own custom scripts around those tools, so precommit has no clue that something like flake8 or mypy or whatever is even in use.

So deleting precommit is not a solution and would be a complete non-starter, given that I am the only Emacs user on our team.

> BTW, I suspect the real reason you are having this problem is because you haven't installed the precommit hooks. Just install them and you should be good to go. You just have to accept that PET has configured Emacs to use the precommit binaries.

Not true either. The hooks are working just fine.

The only things that don't work are tools like black and isort in Emacs - specifically because I have used your example from https://github.com/wyuenho/emacs-pet#complete-example, which sets up both black and isort on save. It happens only in Emacs, because pet looks for the binaries in precommit's installation and nowhere else - and they aren't there.
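For reference, the on-save setup I copied is roughly of this shape (a sketch from memory, assuming the MELPA python-black and python-isort packages that the README example uses; the linked README is the authoritative version):

```emacs-lisp
(require 'pet)

(add-hook 'python-mode-hook
          (lambda ()
            ;; Let PET locate the project-specific formatters...
            (setq-local python-black-command (pet-executable-find "black")
                        python-isort-command (pet-executable-find "isort"))
            ;; ...and run them on save.
            (python-black-on-save-mode 1)
            (python-isort-on-save-mode 1)))
```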

I would understand if you preferred to search for linters in the precommit virtual environment - OK, the linters could be installed there. But since it is pet-executable-find that is concerned, it literally searches for every binary there. And that is bonkers, IMO, given that development dependencies and tooling are not managed by precommit, and pet actually knows that there is another package/virtualenv manager present (both pet-use-poetry-p and pet-use-pre-commit-p are non-nil).

wyuenho commented 1 year ago

I'm afraid I'm completely lost. Do you not run CI on your PRs? If you don't define black and flake8 hooks in your precommit config, how do you use precommit "to enforce coding standards"? If you call custom scripts to run the formatters and linters in poetry, what exactly do you have as precommit hooks that you can't replace with a Makefile?

janoc commented 1 year ago

> Do you not run CI on your PRs?

We do - linting & testing, not formatting (like black); CI should never modify the code in the repository. However, please drop this "you should run all the formatting/linting in CI" idea. It is both irrelevant to the problem and not up to you to decide.

> If you don't define black and flake8 hooks in your precommit config, how do you use precommit "to enforce coding standards"?

But we do define them. However, we are not using flake8 or black there directly (which would make precommit install its own copies); we have our own wrappers around these tools - mostly because we want to be able to run them interactively from a shell as well, not only from the hooks, while ensuring the same options/configuration are used. There are also other, application- and business-specific things in those wrapper scripts.

> If you call custom scripts to run the formatters and linters in poetry, what exactly do you have as precommit hooks that you can't replace with a Makefile?

What is so difficult to understand about the fact that pre-commit is a way to manage git hooks - and nothing more? All we want to do with it is to make it automatic to run some mandatory checks before code is committed or pushed to the repo, such as that the unit tests pass or that there are no linter warnings. I don't see how a Makefile would solve the "automatically" part.

Of course, we could hack this all directly into the .git/hooks/pre-commit shell script, but as we are working on multiple platforms, pre-commit + Python is a more practical and portable solution - that is, after all, what it was designed for.

My entire point is that the current design of pet-executable-find is wrong in my book, because it will search for Python-related executables (of any kind) in the pre-commit installation the moment it detects it - and will completely ignore everything else, even though it is aware that e.g. a poetry virtualenv is present. It will not even try to look at the system PATH in that case.
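To illustrate the behavior I'm describing, here is a toy model of the lookup order (this is NOT PET's actual source; only pet-use-pre-commit-p, pet-virtualenv-root and the built-in executable-find are real functions, everything else is made up for illustration):

```emacs-lisp
(require 'pet)

(defun my/toy-executable-find (executable)
  "Toy model of the lookup order described above; NOT PET's real code.
Once pre-commit is detected, nothing else is consulted, so EXECUTABLE
is \"not found\" even though the poetry virtualenv contains it."
  (if (pet-use-pre-commit-p)
      nil  ; stands in for "search only pre-commit's cached environments"
    (or (let ((root (pet-virtualenv-root)))
          (when root
            (let ((candidate (expand-file-name (concat "bin/" executable) root)))
              (when (file-executable-p candidate) candidate))))
        (executable-find executable))))
```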

Which makes no sense, given that pre-commit is only a rather obscure tool for git automation, and pretty much nobody is going to set up their project to rely only on it and not use e.g. poetry, pipenv or whatever else for building and dependency management.

I may want to configure other things than linters in Emacs - e.g. something like the py-doq docstring generator (which is what I actually use). Or maybe Jupyter. Or any other tools that pet doesn't support out of the box (which is OK!) and that are relevant for editor use but will never make sense to install as pre-commit hooks.

This is specifically the use case you have documented in the README (https://github.com/wyuenho/emacs-pet#advanced-usage). And it is broken the moment you install pre-commit, because pre-commit takes priority over everything else.

wyuenho commented 1 year ago

Sorry I keep bringing up black and confusing the conversation; it was late last night lol. What I really mean is linting for formats, typically by running flake8 to check conformance to some variation of the black format. Linting for formats and linting for bugs are typically done on CI regardless of what you have done on your local machine.

The reason I've special-cased precommit is that I've seen plenty of people set up their linters in both precommit and poetry, and some time later the linter versions get out of sync, and people start bickering about what to do to no end because they can't configure their editors to pick up the precommit binaries. This is one of the reasons I wrote PET: to make Emacs the first editor with proper support for precommit. One of the goals of this project is to make sure that people are always writing code using the same linters as they run on CI. However, it appears that your team knows better and has configured precommit to use the binaries in poetry, so in this case PET is perhaps too opinionated for your use case and somewhat violates this principle.

From a CI pipeline design standpoint, testing and linting should always be done before integration (i.e. merging into some release branch), typically by configuring your CI pipeline to test and lint on push by executing the same Makefile you use to lint and test locally. That's the automatic part - no git hooks necessary and no need to deal with OS differences. A well-designed CI pipeline should be set up so that the same tools can easily be picked up by the pipeline and your editor, so you can lint and run tests as you develop, which also means that in the vast majority of cases, by the time you push, you will already have passed all the formatting and testing checks.

The problem is, in recent years we have seen tools like husky and precommit turning this model upside down for reasons that are still incomprehensible to me. Precommit is particularly egregious in that it installs its own copies of the binaries by default. The variations I've seen of people using precommit are to either install the same linters in both poetry and precommit or to use repository-local hooks, with the former far more common. This is the first time I've seen a case like yours.

Regardless, pet-executable-find currently does not work for repo-local hooks or for your case, which is similar. I'm not sure how best to support this esoteric use case yet, but you can do something like (setq-local flycheck-python-flake8-executable (concat (pet-virtualenv-root) "/bin/flake8")) for now.
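Spelled out a bit more, a minimal sketch of that workaround wired into a mode hook might look like this (assuming flycheck and PET are both installed; the function name and hook placement are illustrative, not part of PET):

```emacs-lisp
(require 'pet)

(defun my/flycheck-use-virtualenv-flake8 ()
  "Point flycheck's flake8 at the project virtualenv instead of pre-commit's."
  (let ((root (pet-virtualenv-root)))
    (when root
      (let ((flake8 (expand-file-name "bin/flake8" root)))
        (when (file-executable-p flake8)
          (setq-local flycheck-python-flake8-executable flake8))))))

(add-hook 'python-mode-hook #'my/flycheck-use-virtualenv-flake8)
```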

wyuenho commented 1 year ago

OK, I have some ideas about how to do this, but I'd like to see your .pre-commit-config.yaml, .pyproject.yaml and the relevant scripts if possible. I don't want to introduce code that can't be generalized to a common use case, if that makes sense.

janoc commented 1 year ago

> Linting for formats and linting for bugs are typically done on CI regardless of what you have done on your local machine.

Yes, sure. That's what we do. But one usually wants to run it locally too, because catching a problem locally is much faster than finding it only when the CI build fails - especially for a large application and a shared CI system (e.g. Azure DevOps, GitLab, etc.), where it could take a non-trivial amount of time before your build is scheduled and actually runs. So one isn't a replacement for the other.

> This is one of the reasons I wrote PET to make Emacs the first editor to have proper support for precommit. One of the goals of this project is to make sure that people are always writing code using the same linters as they run on CI. However, it appears that your team knows better and have configured precommit to use the binaries in poetry, so in this case PET is perhaps too opinionated for your use case, and somewhat violated this principle.

Frankly, I have never seen anyone use the tools installed by precommit for this purpose before - neither for local development, and even less so for CI, where the tooling often comes through pre-made VM images, Docker and whatnot these days instead of being installed using pip.

It doesn't surprise me that people install their tools using poetry and such, because the fact that precommit actually installs its own copy isn't well documented and is a rather recent addition. Originally the python language setting didn't do that; you had to explicitly ask for it using the python_venv language. That got deprecated and now the virtualenv is the default.

It is a strange design decision on precommit's part, IMO - why one would want to use a separate tool to install and manage linters from what the rest of the project uses (thereby causing version mismatch issues and such) is quite beyond me. It is certainly not something I would expect from a tool for managing git hooks. Sounds like a well-intentioned but bad idea and feature creep to me.

> From CI pipeline design stand point, testing and linting should always be done before integration (i.e. merge into some release branch), typically by configuring your CI pipeline to test and lint on push by executing the same Makefile you use to lint and test locally, that's the automatic part, no git hooks necessary and no need to deal with OS differences.

You misunderstood me. Yes, that's what our CI is doing, that's not the issue.

The point about the automation is, again, what I wrote above - you want as quick a development cycle as you can get. If you have to wait X minutes for the CI build to run after a push, only to discover that it failed because of a bloody linter warning, and now you have to fix both the problem and the git history on your branch so that you don't merge 10 stupid commits containing only formatting fixes into master, it is extremely frustrating.

On another project we have build times of about 120 minutes. On a local machine! (Yes, it is a huge C++ application.) In CI it can take several hours to run through, depending on the load on the CI system. Can you imagine working only with that? Managing maybe two builds a day?

That is why the checks are also available locally. And by automation I mean that the developer doesn't need to remember to run them - the commit hook does it for them, saving time by preventing CI failures for completely avoidable reasons. This is a very common setup, and it isn't meant to replace a well-functioning CI but to complement it.

CI still runs these same checks too - and more. E.g. we don't usually run end-to-end tests on local machines, only in CI. There are also some additional checks run in CI, like compliance stuff and security vulnerability scans. CI also runs the checks both on the developer's branch and on the pre-merged branch (local developer checks run only on the branch being committed/pushed), etc.

> The variations I've seen people using precommit is to either install the same linters in both poetry and precommit or to use repository local hooks, with the former far more common. This is the first time I've seen a case like yours.

I guess it is because of how our project evolved. We didn't start with precommit, but we did have linters, formatters and test runners set up. They were included as pipenv/poetry scripts - originally as shell scripts, but we have switched most of that stuff to Python so we don't have to maintain different scripts for Windows and Linux. The scripts also group the related tools so that we don't have 20 different things to run each time (flake8 + mypy, isort + black, etc.).

So a developer can call e.g. poetry run test to run the tests, poetry run format to run black and isort, poetry run lint to run all the linters, etc. This runs in the virtual environment created by poetry (or pipenv, if it is a pipenv project), using the binaries from there.

These scripts are then also called by the CI pipelines and the git hooks, so we have the same setup everywhere and don't need to maintain multiple scripts/configurations. We originally used shell scripts, but later switched to precommit because it is less messy when multiple things run in the same git hook (e.g. checking line endings + running linters).

IDE configs (VSCode, etc.) are a separate thing, but those usually just need to refer to the configuration files for flake8 and mypy in the project, and most of the stuff just works with the tooling in the virtual environment.

> Regardless, pet-executable-find currently does not work for repo local hooks and your case, which is similar. I'm not sure how best to support this esoteric use case yet, but you can do something like (setq-local flycheck-python-flake8-executable (concat (pet-virtualenv-root) "/bin/flake8")) for now.

I have worked around it by redefining pet-use-pre-commit-p to always return nil in my setup. Not ideal, but I can't imagine a situation where someone would want to prefer e.g. the mypy from precommit over the one from the project's virtual environment. That's why I opened this issue.
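(For anyone else hitting this, a minimal sketch of that workaround using advice instead of redefining the function outright, assuming PET is loaded:)

```emacs-lisp
(require 'pet)

;; Make PET behave as if pre-commit were not installed, so executable
;; lookup falls through to the poetry/pipenv virtualenv instead.
(advice-add 'pet-use-pre-commit-p :override #'ignore)
```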

If this (people actually preferring the tooling inside the precommit virtualenv to the one installed explicitly by the project's requirements/package manager) is a real use case that people want, then that would be a different matter, of course.

IMO, there is no need to overthink this. If you want to keep the priority on precommit as it is now, then even having pet-executable-find simply try all options, not just the first one, would work. I.e. if it fails to find the binary via precommit, it would still try poetry/pipenv/whatever else it detected (the corresponding pet-use-foo-p is non-nil), and exec-path as a last resort. That would completely avoid this problem.

Yes, in some cases it could find a "wrong" binary that e.g. isn't part of the virtual environment but is installed system-wide with a different Python version and whatnot - but in 90% of cases that doesn't matter for things like linters, and you don't need to support every possible broken config in the world.
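Roughly, the fallback I have in mind could look something like this (a sketch, not a proposed patch; my/executable-find-with-fallback is my own name and not part of PET):

```emacs-lisp
(require 'pet)

(defun my/executable-find-with-fallback (executable)
  "Try PET's lookup first, then the project virtualenv, then `exec-path'."
  (or (pet-executable-find executable)
      (let ((root (pet-virtualenv-root)))
        (when root
          (let ((candidate (expand-file-name (concat "bin/" executable) root)))
            (when (file-executable-p candidate) candidate))))
      (executable-find executable)))

;; For example: (my/executable-find-with-fallback "black")
```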

> OK I have some ideas how to do this but I'd like to see your .pre-commit-config.yaml, .pyproject.yaml and the relevant scripts if possible. I don't want to introduce code that can't be generalized to a common use case if that makes sense.

I don't think I can share those, but they are really project-specific anyway. I have outlined the general idea of how it works above. pet certainly shouldn't go "snooping" into those files while searching for binaries.

wyuenho commented 1 year ago

> Frankly, I have never seen anyone use the tools installed by precommit for this purpose before - neither for local development and even less for CI where the tooling often comes through pre-made VM images, Docker and what not these days instead of being installed using pip.

This is an even worse solution to the problem of tooling versions getting out of sync. Instead of having one copy from pre-commit and one from poetry, you now have another version in the Docker image that will potentially produce different results due to version drift. I'm fairly certain that people use the pre-commit binaries or poetry's far more often than baking them into the images; people just cache the third-party package installations if doing a clean install on every push is too slow.

A small sampling of projects on GitHub shows that an overwhelming majority of them install their devtools both in pre-commit and in whatever virtualenv tool they use. I suspect this is because people do realize the version mismatch issue exists, and that it was next to impossible to configure VS Code to pick up the binary installed by pre-commit. Repo-local hooks alleviate this problem somewhat, but they're not as easy to configure correctly as the default hooks mechanism.

> The point about the automation is, again, what I wrote above - you want as quick a development cycle as you can get ... IDE configs (VSCode, etc.) are a separate thing, but those usually just need to refer to the configuration files for flake8 and mypy in the project and most of the stuff just works with the tooling in the virtual environment.

I consider all of these problems the result of a combination of picking the wrong tool for the job, badly configured editors, bad workflows, and just plain silly human psychology (i.e. the tendency to overestimate the impact of small problems - bikeshedding).

First of all, for things like line endings there's editorconfig; it works in every editor and cleans up the code as you write it.

Secondly, a properly configured editor should be linting, and possibly continuously running the tests, as you write code. If that is the case, the chance of lingering linting and formatting problems when you commit is infinitesimal, and even if you use pre-commit, people can still bypass the checks or not install the git hooks in the first place.

Thirdly, a properly configured CI pipeline should run linting, testing and building as separate steps, usually in parallel. Linting is usually the fastest step and takes less than a minute. Also, per the second point above, linting errors slowing down devs on CI is usually the result of a misconfigured editor. In my experience, this usually only happens when a new dev joins the team, and as soon as they fix their editor, it's never seen again. Having a pre-commit guard running on every dev's repo before every push just slows down every other dev on the team for fear of something that rarely happens and will be caught on CI anyway. If you amortize the cost of error detection, I'm not sure catching errors on CI is any worse than spreading the cost evenly among every dev up front. Besides, there are PR branches and CI runs on them; whoever makes the mistake should bear the whole cost.

The only advantage of pre-commit and similar tools is that, by default, they only lint the files you've changed instead of the whole repo, but that's just a couple of lines of shell script on CI. So my default response to anybody using pre-commit or similar tools is: you probably don't need it.

Anyway, good chat! And thanks for the report!