python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License
31.7k stars 2.27k forks source link

Make the lock file more merge-friendly #496

Open seansfkelley opened 6 years ago

seansfkelley commented 6 years ago

Issue

When two devs install dependencies on separate branches, it is very easy to end up merge-conflicted, in particular, the metadata.content-hash key often changes. It is very unclear how to resolve this manually, so I often delete the lockfile (or perhaps just that key) and rebuild it and, basically, hope that it comes out the same.

It seems like in some scenarios that merge conflicts could be resolved automatically based on pyproject.toml. Yarn does this, for instance.

neersighted commented 1 year ago

A final thought: we can never come up with the 'canonical' (what the solver would pick, given all information was available to it) solution to lock file merges, but maybe we can at least start validating the lock file makes sense? That is to say, we store the dependency tree, so we could at least validate that it still makes sense/the solution doesn't have internal conflicts. We might already do that, but it certainly doesn't happen in the parts of the installer I am familiar with... cc @radoering @dimbleby

radoering commented 1 year ago

2. see poetry lock --check

This command just checks for the correct hash. (That's why it is quite fast.)

(I personally believe that Poetry's UX would benefit if install performed a lock --no-update, but I understand that the metadata cost might be prohibitive here).

I disagree. Even for large projects poetry install is quite fast if most of the packages are already installed. poetry lock --no-update takes significantly longer and it's just not necessary if you have taken precautions to make sure that the lock file checked into your repo is alright.

We might already do that, but it certainly doesn't happen in the parts of the installer I am familiar with...

More or less. Actually, each install contains a solver run for the target environment with the lockfile being the only available repository. If the lockfile is not consistent with the pyproject.toml you get some sort of a solver error many users don't understand. The only hint for the reason of these errors is the warning about an inconsistent lock file (based on the hash). Just a quick idea, but we could derive "solver error during install" -> "inconsistent lock file". (It's probably not that easy because there are probably a bunch of different exceptions we have to consider but it might work.) Anyway, it's less expensive to validate the lockfile for the current environment (during install) than to validate it in general.

neersighted commented 1 year ago

The only hint for the reason of these errors is the warning about an inconsistent lock file (based on the hash). Just a quick idea, but we could derive "solver error during install" -> "inconsistent lock file". (It's probably not that easy because there are probably a bunch of different exceptions we have to consider but it might work.) Anyway, it's less expensive to validate the lockfile for the current environment (during install) than to validate it in general.

Right, this matches my current recollection. If it's a full solver run (and not abbreviated like I thought), then it's more robust than I realized. The missing piece is inferring that a solver error during install is likely caused by an inconsistent lock file; I wonder if re-trying with an automatic re-lock makes sense in that case?

dimbleby commented 1 year ago

poetry lock --no-update is fast when the relevant part of the poetry cache is populated - but that's not usually the case in CI pipelines. Then that command has to go and retrieve metadata from pypi or other repositories (which can indeed be slow). For similar reasons poetry install is probably slow in CI pipelines.


In general it is not possible safely to merge two independent changes to a lockfile without re-solving. One can construct examples in which an upgrade to A is possible, an upgrade to B is possible, but the upgraded A and B are incompatible with one another. So if some bot has proposed both upgrades then it would be an error to merge both, even if they are compatible so far as git can tell. In that sense, being merge-friendly would actually be an anti-feature!


~That being said, I still am confused as to what problem folk are actually seeing here. eg my experience with dependabot is that it updates only the lockfile, not pyproject.toml, and therefore its updates never make changes to the hash in the lockfile. Clearly there's something going on that y'all want to talk about - but it's not what I'm seeing.~

Edit: I mostly withdraw this last comment. Looking back over some dependabot MRs: it is capable of making lockfile-only changes, but it also sometimes changes pyproject.toml - even when it didn't have to. Perhaps it would be a good idea to ask dependabot to behave as I had believed it behaved all along, ie have it not change pyproject.toml when the update that it is making is compatible with the existing requirements

radoering commented 1 year ago

I wonder if re-trying with an automatic re-lock makes sense in that case?

I wouldn't want to have an automatic re-lock. IMO, install should not change the lock file. At least not per default.

neersighted commented 1 year ago

In general it is not possible safely to merge two independent changes to a lockfile without re-solving.

This has always been my general objection. However, I wonder if users might find it more desirable to use a "best-guess" (Git merge-based) solution when possible, and to bail out when an incompatible merge happened (since we in theory have the metadata to detect that/already run the solver).

However, I just noticed a seeming non-sequitur from @Mogost:

So far, I'm getting frustrated with the amount of merger conflicts that are draining CI resources.

I don't understand what you mean -- a conflicted poetry.lock is actually going to use less resources to detect than a "merged" lock file that is not a usable solution. So in that sense, I don't think you'll get what you want if we "solve" this (which appears to be possible by choosing to compromise/break at a different point, at least to me).

dimbleby commented 1 year ago

I've been looking more carefully at my dependabot MRs to understand why sometimes dependabot updates pyproject.toml and sometimes it is content simply to update poetry.lock

I think the difference is that >= constraints don't cause it to update pyproject.toml, but ^ constraints do.

So perhaps there's some code already in dependabot that is trying to be smart about this, but which could be taught about ^. That would likely significantly reduce the amount of noise in the hash value.

Maybe someone on this thread who finds this bothersome would like to raise an issue (or merge request) at dependabot to investigate (or fix) this.

djmattyg007 commented 1 year ago

Even if you don’t want renovate to group dependency updates, you can still do things like ask it to only create one PR at a time, for example. This can reduce the burden of wasted CI time. No matter which way you look at it, it will still make your life easier in dealing with this poetry lock file problem.

zyv commented 1 year ago

However, I just noticed a seeming non-sequitur from @Mogost:

So far, I'm getting frustrated with the amount of merger conflicts that are draining CI resources.

I don't understand what you mean -- a conflicted poetry.lock is actually going to use less resources to detect than a "merged" lock file that is not a usable solution.

Well, it's a non-non-sequitur.

Right now, if you want a completely automated solution to resolve merge conflicts with Poetry lockfiles, you have to tell dependency bots to "rebase" their PRs every time a PR touching the lockfile is merged. I'm putting "rebase" in quotes, because in the context of dependency bots like dependabot or Renovate it's not about git rebase, but rather about re-locking the lockfile.

So, imagine, you have 4 dependabot PRs. You merge 1 PR, and trigger 3 PRs to re-lock and re-run the CI (not only Poetry, which might take an insignificant amount of time as compared to tests, but everything). Then you merge the 2nd one and the dance repeats.

This is what drains CI-time, not a possibly failing Poetry call, which (if it indeed fails) will actually conserve CI time, because downstream jobs just won't be able to proceed.

So I think your thinking goes into the right direction in terms of saving CI time.

neersighted commented 1 year ago

Maybe this is a matter of differing workflows, but as when I use automated tools there is one PR or they are batched, the prospect of "random" failures requiring human intervention during the install stage seems much more likely to waste CI time.

bmitc commented 8 months ago

Have Poetry developers looked at other lock file implementations to help with creating a more merge friendly format? What about taking a look at Elixir's mix.lock format? I never had any merge issues with it, but I am constantly having issues with even small Poetry projects.

mrcljx commented 8 months ago

The content-hash remains to be a big point of frustration. When three engineers add dependencies it's a constant race who gets their merge in first. The other two will have to rebase either once or twice to fix the content-hash.

simonpercivall commented 7 months ago

The issue I'm most afraid over right now over how the poetry.lock file works is updating sub-dependencies that don't touch pyproject.toml. Now, in a pretty big team of mixed-seniority devs, they don't always know how to properly merge and inspect the result, and it's happened more than once that an upgraded sub-dependency gets invisibly downgraded in a merge conflict resolution (even worse, git makes it really hard to see things that change in a merge commit); that is, the dev blindly checks out --theirs or --ours, and then runs poetry lock --no-update.

If there was even an imperfect way poetry could decrease the surface of a conflict (and just having a built-in way to resolve content-hash conflicts would solve like 95% of conflicts), it would go a long way to making it easier to use poetry in a team. (Sure, we could go back to pinning every sub-dependency in pyproject.toml, but then what's the point.)

In a tool that I otherwise really enjoy using, this is a painful pain point.

dimbleby commented 7 months ago

The issue I'm most afraid over right now over how the poetry.lock file works is updating sub-dependencies that don't touch pyproject.toml

since there is no content-hash change when pyproject.toml is unchanged, a "built-in way to resolve content-hash conflicts" would make exactly no difference in these cases

bmitc commented 7 months ago

At the moment, I am the only developer managing dependencies with Poetry, and I actually don't know how to manage poetry.lock at all. On any merge conflict, I just blast away (i.e., delete) the poetry.lock file and regenerate it with poetry install after fixing any merge conflict in pyproject.toml.

I feel Poetry is absolutely essential to use, but I don't know how to manage the poetry.lock file even in very simple cases. When a hash changes, I don't know what to do with that without a version change as well, so I am not personally aware of what decision criteria to use when looking at a merge conflict in poetry.lock. The above method of deleting and regenerating it actually seems to work okay, but it seems like it defeats one of the purposes of the lock file.

simonpercivall commented 7 months ago

The issue I'm most afraid over right now over how the poetry.lock file works is updating sub-dependencies that don't touch pyproject.toml

since there is no content-hash change when pyproject.toml is unchanged, a "built-in way to resolve content-hash conflicts" would make exactly no difference in these cases

Ah, I explained badly:

  1. In Branch A Update a sub dependency
  2. In A Update the pyproject.toml
  3. In branch B, update pyproject.toml
  4. In B, merge A, resolve the conflict badly, lock --no-update, step (1) is now gone

But as been said earlier in the discussion: Of course, to go as far as possible, any solution would need to look at both sides of the merge, and try to solve them together (like the NPM way someone described, or like the now-outdated "poetry-merge-file" package). And this is of course much harder to maintain.

simonpercivall commented 7 months ago

This has always been my general objection. However, I wonder if users might find it more desirable to use a "best-guess" (Git merge-based) solution when possible, and to bail out when an incompatible merge happened (since we in theory have the metadata to detect that/already run the solver).

@neersighted Have you thought any more about this idea? To me, it sounds ideal if possible.

(Sorry if this became spammy, but there's been so much back and forth in this issue that I felt a moments suggestion/thought may have been lost.)

king-of-poppk commented 6 months ago

This is not just a dependabot issue: if you work on a long-lived branch where you add/remove/upgrade dependencies good luck with rebasing against external dependency upgrades (for instance, merged dependabot PRs)!

Andrew-Chen-Wang commented 5 months ago

This is more for myself, but having to rebase all the time, just accept the incoming rebase:

git rebase -s recursive -X origin/main

then re-lock all the lock files. This primarily assumes the development branch's PR is JUST for adding a new package.