Open seansfkelley opened 5 years ago
+1 Resolving the content-hash and trivial conflicts in the individual hashes section would be very welcome.
Workaround:
poetry add pathlib2
poetry remove pathlib2
or some other similar innocuous package.
Will this issue be resolved soon?
Does a missing content-hash hurt security or not?
Yarn's implementation of this is way less insane than I was expecting:
Extract the two versions of the conflicted file. https://github.com/yarnpkg/yarn/blob/a7334da31bf783af7a3efab451589fe7ac34c748/src/lockfile/parse.js#L397
Blindly try to parse the files, and if that's successful, shallowly merge them. https://github.com/yarnpkg/yarn/blob/a7334da31bf783af7a3efab451589fe7ac34c748/src/lockfile/parse.js#L399
If the merge conflict resulted in a syntax error, it fails. Yarn's lockfile structure is designed to make this easier: it's flat, it's sorted alphabetically and node is okay with duplicated dependencies in the tree.
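The three steps described above can be sketched in Python roughly as follows. This is an illustrative sketch, not Yarn's or Poetry's code: the helper names `split_conflict` and `shallow_merge` are hypothetical, JSON stands in for the real lockfile syntax so the example needs only the standard library, and letting the second side win on key collisions is an assumption of this sketch.

```python
import json


def split_conflict(text: str) -> tuple[str, str]:
    """Rebuild the 'ours' and 'theirs' versions of a file with git conflict markers."""
    ours, theirs = [], []
    side = None  # None outside a conflict, "ours" or "theirs" inside one
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
        else:
            # Lines outside any conflict belong to both versions.
            ours.append(line)
            theirs.append(line)
    return "\n".join(ours), "\n".join(theirs)


def shallow_merge(text: str, parse=json.loads) -> dict:
    """Parse both sides and merge their top-level keys.

    If either side fails to parse, the parser's exception propagates,
    mirroring the "syntax error means failure" rule.
    """
    ours, theirs = split_conflict(text)
    return {**parse(ours), **parse(theirs)}


conflicted = "\n".join([
    "{",
    '"a": 1,',
    "<<<<<<< HEAD",
    '"b": 2',
    "=======",
    '"c": 3',
    ">>>>>>> branch",
    "}",
])
print(shallow_merge(conflicted))  # {'a': 1, 'b': 2, 'c': 3}
```

A flat, sorted lockfile makes this kind of merge much more likely to succeed, because each side's hunk tends to be a complete entry rather than a fragment of a nested structure.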
Is content-hash mandatory, though? The main problem I am encountering is that dependabot will ALWAYS have to rebase all the PRs, which could be avoided if content-hash was not in poetry.lock. In theory it shouldn't be necessary, as it can always be computed from the other hashes, right?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this is still a valid issue and should not be closed.
I have written a stand-alone script as a stopgap. It only goes half the way, resolving the merge conflict for metadata.content-hash. The remaining conflicts, if any, should be quite trivial to merge manually. Use at your own risk.
https://github.com/cjolowicz/scripts/blob/master/python/poetry-merge-lock.py
Update:
Please use poetry-merge-lock from PyPI instead (see comment below). This should allow you to merge without manual conflict resolution, in most cases.
Here is a stand-alone tool which should handle most merge conflicts in the lock file:
pip install --user --upgrade poetry-merge-lock
This tool is in early development. If you're interested in trying it out, please let me know on its issue tracker if you encounter any problems.
@sdispater, @finswimmer : Would you be interested in a PR to add this as a Poetry command?
Hello @cjolowicz ,
your contribution is very welcome. I find this feature very useful. :+1:
However, I cannot promise if and when this can be included. Including new features is decided by @sdispater .
But please go on!
fin swimmer
as long as there is a line content-hash we will never be able to conveniently use dependabot. When dependabot runs, it creates 1 branch and merge request per update, w/ the idea that you let your CI run, and then auto-merge.
However, because of content-hash, with poetry, every single one of these branches has a merge conflict and must be manually dealt with, increasing the human time from a few seconds to ~10 minutes per branch, and causing another run of the CI to be required.
The repository I just tried this on, 5 MR were created. If it uses tox, it takes no human time and 5 CI runs. If I use poetry it takes 10 CI runs, and a manual clone/checkout/rebase/poetry update hash somehow/push, which took me more than an hour.
Multiply this by my 30 repos, and make it a daily thing to have to deal with and it becomes intractable.
The solution cannot be in a new poetry command to resolve the lock, that would still require all the manual work. Instead, we need to be able to merge poetry.lock files that are not otherwise conflicting.
Because all these MRs are created at once, in parallel, we cannot teach dependabot to do it either. At the time of each MR they all come from the same spot on master, so there is no conflict. It's only when the first is accepted that we need to rebase the 2nd, and when it is accepted we need to rebase (twice) the 3rd, and so on.
Can we not make the issuance of content-hash optional?
Even without dependabot it's troublesome.
Ideas:
so perhaps if there was a --no-output-hash option to the poetry update --lock?
@donbowman Doesn't Dependabot rebase your PRs automatically if they conflict? This is what happens for me. I only need to resolve conflicts when rebasing or cherry-picking my own commits, in which case poetry-merge-lock mostly works fine. I sometimes follow it by poetry add insecure-package && poetry remove insecure-package to ensure that the lock file is up-to-date. (insecure-package is just an empty dummy package that's never part of my dependency tree.)
this is on gitlab. it creates e.g. 5 merge requests at the same instant. as soon as i accept the first one, the other 4 are all in merge conflict. How would it magically wake up and rebase these 4? You mean the next time i run it it would see the conflict, rebase, then i would accept 1 more, then it would rebase the next 3, and so on? also, it would have the same merge conflict, how would it know how to resolve it?
I added https://github.com/python-poetry/poetry/pull/2654 to poetry as a suggested solution.
@donbowman I assumed you were referring to GitHub Dependabot. The bot recreates the PR (not a rebase, sorry for the sloppy terminology) when the changes conflict with the target branch.
i see. but it still leaves the original issue, that if i issue many updates, each as a single MR, after the first is accepted, the rest all conflict. If the dependabot script is taught to then delete and recreate, ... it's very slow. if I were to run it daily, and had 10 updates, it would take 10 days to get them all in w/ this technique.
i guess, inefficiently, i could create some webhook that when the first mr is accepted, it could somehow run the dependabot again to recreate all the remaining updates, and then repeat every hour or so until all are done.
It looks like Dependabot actually does automatically rebase (well, rewrite-force-pushes) the PR by default, according to the documentation. Judging by the docs and that issue, it sounds like it does it with webhooks or another timely mechanism rather than waiting for the next run, though neither one is explicit on how quick the turnaround is.
Maybe you're seeing merge conflicts triggered by content-hash because you've disabled this behavior?
content-hash conflicts even w/o dependabot. if 2 people change 2 branches, it's in conflict.
the hosted dependabot of github may in fact have some faster trigger. but the dependabot-core on a private gitlab does not. its a cron.
My 2 cents, if the content-hash format can be structured as a list of sorted main dependency names and hashes calculated from resolved sub-dependencies, then it would reduce merge conflict chance a lot. Like
[metadata]
content-hash = [
"astroid:03472c30eb2c53",
"flask:bb564576db6a918",
#...
]
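One way such a list could be computed is sketched below. This is not an existing Poetry feature: the function name `per_dependency_hashes`, the shape of its input (each main dependency mapped to its resolved sub-dependency pins), and the 15-character truncation are all assumptions made for illustration.

```python
import hashlib
import json


def per_dependency_hashes(resolved: dict[str, dict]) -> list[str]:
    """Build one "name:hash" entry per main dependency, sorted by name.

    `resolved` maps each main dependency to its resolved sub-dependency pins
    (a hypothetical structure chosen for this sketch).
    """
    entries = []
    for name in sorted(resolved):
        payload = json.dumps(resolved[name], sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()
        entries.append(f"{name}:{digest[:15]}")  # truncated for readability
    return entries


before = per_dependency_hashes({
    "flask": {"werkzeug": "2.0.1"},
    "astroid": {"wrapt": "1.12.1"},
})
after = per_dependency_hashes({
    "flask": {"werkzeug": "2.0.2"},  # only flask's resolution changed
    "astroid": {"wrapt": "1.12.1"},
})
# Only the flask entry differs, so a line-based merge would conflict only
# if both branches touched flask.
changed = [a for a, b in zip(after, before) if a != b]
```

Because the entries are sorted and each occupies its own line, two branches that update different dependencies would usually modify different lines of the list.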
Here's an approach that has worked well for me and only uses git and (recent) Poetry:
git restore --staged --worktree poetry.lock
poetry lock --no-update
When rebasing a feature branch on main, this preserves pins from the main branch, and recomputes pins for your feature branch. You would then follow up with these commands to continue the rebase:
git add poetry.lock
git rebase --continue
Most the answers here are answering what to do on a merge conflict, but we should be addressing why we're having a merge conflict in the first place. It is not reasonable to have a merge conflict whenever two developers modify different dependencies. Having a conflict means you will need to realize that, merge master into your branch, recreate the lock file (which takes at least 3 minutes, because python...), re-push, re-run your CI (which could take a lot of time as well), etc. This is very bad UX.
@lephuongbg's comment is a possible solution. Is it possible to look into this?
Thank you a lot!
This has become pretty problematic for us, to the point where it is almost a dealbreaker. It would be great to find a way to restructure poetry.lock so that it can be more merge-friendly.
(Note: I haven't touched Poetry in a while.)
For prior art on designing merge-friendly lockfiles, Cargo's current iteration of lockfiles (eg. https://github.com/nyanpasu64/spectro2/blob/master/Cargo.lock) is flat and has neither a global content-hash nor a metadata array. Instead it consists of a large array of packages, storing metadata and checksums within each individual package's entry. The per-package checksum field might be for the same purpose as content-hash (though I'm not sure exactly what it does).
Also, Cargo.lock files have changed format (https://github.com/rust-lang/cargo/pull/7070), going from a [metadata] table (as suggested in https://github.com/python-poetry/poetry/issues/496#issuecomment-734107507) to the current flat array, with the justification that it reduced the possibility for conflicts. Looking at Cargo's changelog, the new lockfile was introduced (disabled) in 1.38 (2019), made default in 1.41 (2020), but existing projects were not updated to "new format" lockfiles until 1.47 (2020).
Note that there's an upcoming "version 3 Cargo.lock format" mentioned in the changelog. However that's designed to handle non-master branches in Git dependencies, rather than to improve merge conflict handling.
I also think that the solution would be to make the lock file conflict free - even if resolving conflicts will be automated, updating will either require human interaction and/or disproportional amount of CI time.
@sdispater could you provide your opinion on this issue, so that if someone takes the time to fix it they will have more information.
Since a PR (https://github.com/python-poetry/poetry/pull/2654) has been provided but it has not been reviewed, it is not clear if there is something that prevents having such an option.
I also seem to have found a duplicate of this issue https://github.com/python-poetry/poetry/issues/4189
I think this conversation needs to be bumped.
The suggestion by @lephuongbg would save teams from pesky merge conflicts dealing with lock hashes and allow dependabot to update repos far more smoothly.
I believe most of the suggestions here are thought too short.
Getting somehow around the merge conflict due to the hash doesn't solve the problem. If there is a conflict, this means both sides changed the dependencies. Merging the locked dependencies doesn't necessarily result in a correct dependency tree.
The only clean solution I see right now, is the one suggested by @cjolowicz (https://github.com/python-poetry/poetry/issues/496#issuecomment-738680177)
Getting somehow around the merge conflict due to the hash doesn't solve the problem.
@finswimmer it depends on which problem you are actually trying to solve.
You are right, conflict-free lockfile format wouldn't solve the problem of always producing a correct dependency tree (in some cases, arguably <<< 50% for fresh branches not deviating too much from master, the resulting tree will be incorrect).
It will however solve a very practical problem of having most merges produce a correct tree, which is what really matters in a typical branch-based development workflow. The broken trees will be immediately found by CI upon the next push, and so the suggestion by @cjolowicz will have to be applied (poetry install could detect this situation, output a text to this effect and suggest the commands to run to repair the lockfile).
This way the typical workflow will be "rebase / merge from master" and push, in 99% of the cases everything just works (tm). In some cases, poetry install in CI will detect a broken lockfile and ask to fix it with a few commands.
Currently the workflow is the opposite of that - and specifically to always do the @cjolowicz trick. It doesn't make the use of poetry impossible, just more annoying than it should be in my opinion.
For those who encounter frequent merge conflicts in Dependabot PRs:
Dependabot updates version constraints in pyproject.toml even when the new version was already covered, see https://github.com/dependabot/dependabot-core/issues/4435. This means that Dependabot PRs for direct dependencies will always conflict with each other. One workaround for this limitation is the lockfile-only versioning strategy. If you have upper bounds on your version constraints, you will need to widen the constraints manually to receive major updates.
Note that there is a growing sentiment that upper version bounds are harmful in the Python ecosystem:
Personally, replacing ^1.2.3-style constraints with >=1.2.3 and using the lockfile-only strategy has worked well for me. I rarely need to resolve merge conflicts, and Dependabot PRs are good for identifying breaking changes, including those that are not advertised in the version number.
Issue should be renamed "auto-resolve trivial / simple merge conflicts", or a new one should be created. "all/most" is too much scope for a package manager.
When I have merge conflicts with poetry - which I've had hundreds of - it's often the content-hash, and at most, hashes and wheels.
This is extremely repetitive and tedious. In rebase conflicts, I personally keep the poetry commands in the git commit message, then run git reset poetry.lock pyproject.toml; git checkout --theirs poetry.lock pyproject.toml and rerun the poetry remove / poetry add commands. Have not tried https://github.com/python-poetry/poetry/issues/496#issuecomment-738680177 yet.
dependabot/pip/django-3.0.2 ❯ git pull --rebase origin master
From github.com:org/repo
* branch master -> FETCH_HEAD
Auto-merging poetry.lock
CONFLICT (content): Merge conflict in poetry.lock
Auto-merging pyproject.toml
error: could not apply cadc1e314... :arrow_up: Bump django from 3.0.1 to 3.0.2
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply cadc1e314... :arrow_up: Bump django from 3.0.1 to 3.0.2
project on HEAD (fae2bca) (REBASING 1/1) [=+] ❯ git diff
diff --cc poetry.lock
index cf2c1dcde,becb924fe..000000000
--- a/poetry.lock
+++ b/poetry.lock
@@@ -2897,7 -2896,7 +2897,11 @@@ testing = ["coverage (>=5.0.2)", "zope.
[metadata]
lock-version = "1.1"
python-versions = "==3.*.*,>=3.8.1"
++<<<<<<< HEAD
+content-hash = "889ccc59768f4c5a4c5dd14754e0b126d54f253171d583f0c63db16d601d6376"
++=======
+ content-hash = "9f357236d29da73cc3e57e6b48e2739bd154c1fc67cb130d7f6309a741575351"
++>>>>>>> cadc1e314 (:arrow_up: Bump django from 3.0.1 to 3.0.2)
[metadata.files]
aiohttp = [
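The conflict shown above touches only the content-hash line, which is the "trivial" case discussed in this thread. A check for that case could look roughly like the sketch below. The helper names `split_hunks` and `only_content_hash_conflicts` are hypothetical and are not part of Poetry or poetry-merge-lock; the sketch operates on the conflicted file as written to the working tree (plain `<<<<<<<` markers, without the `++` prefixes that `git diff` adds).

```python
def split_hunks(text: str):
    """Yield (ours, theirs) line lists for each git conflict hunk in text."""
    ours, theirs, side = [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            ours, theirs, side = [], [], "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            yield ours, theirs
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)


def only_content_hash_conflicts(lock_text: str) -> bool:
    """True if the file has conflicts and every conflicted line is a content-hash line."""
    hunks = list(split_hunks(lock_text))
    return bool(hunks) and all(
        line.strip().startswith("content-hash")
        for ours, theirs in hunks
        for line in ours + theirs
        if line.strip()
    )


trivial = "\n".join([
    "[metadata]",
    "<<<<<<< HEAD",
    'content-hash = "889ccc59"',
    "=======",
    'content-hash = "9f357236"',
    ">>>>>>> cadc1e314",
    "[metadata.files]",
])
is_trivial = only_content_hash_conflicts(trivial)
```

When such a check passes, the remaining work is to pick either side and regenerate the hash, for example with the add/remove or relock workarounds mentioned elsewhere in this thread.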
@cjolowicz I notice that https://github.com/cjolowicz/poetry-merge-lock is archived, any more details? Maybe it'd be worth linking to potential workarounds or this PR in the README?
@tony done
@cjolowicz Thank you, this clarifies it for me! (as a workaround for the now)
As @tony mentioned, most merge conflicts that occur, are because of the content-hash in the lock file. Because of these merge conflicts, if you have several PRs open from Dependabot to update several dependencies, you'll have to spend some time waiting between merges, since Dependabot has to rebase after each merge, just to update the content-hash.
@abn could you take a look at this issue?
The main use case that seems to have come up in this thread is dependabot, and there I am a little confused. I don't see that anything has changed at the dependabot side since previous discussions: but I set up dependabot on a poetry project without any special instruction - and it already seems to update the lockfile only, when that's appropriate (eg https://github.com/dimbleby/e-treasure-hunt/pull/6).
ie, at least in that example, dependabot already is making updates that would not cause merge conflicts.
That aside, the basic ask of this issue seems to be that the content_hash be removed from poetry.lock.
An earlier comment said
Getting somehow around the merge conflict due to the hash doesn't solve the problem. If there is a conflict, this means both sides changed the dependencies. Merging the locked dependencies doesn't necessarily result in a correct dependency tree.
This is all true, but the content hash is providing at best a partial defence: it is only doing anything for those cases where pyproject.toml changed. If we had a series of commits like my dependabot one above we might experience no merge conflicts but still reach an invalid solution in the lockfile.
That is, I don't think that content-hash-as-defence-against-invalid-lockfiles is wholly convincing.
I also think it's useful to see that Rust's Cargo.lock does not include a similar hash - I wonder whether they thought about it and explicitly decided that such a hash was more trouble than it was worth, or just didn't. If anyone has a link to discussion I'd be interested to see it. "What would cargo do?" is often a pretty good way to think about such questions...
TLDR I'd like to hear what the case is for content_hash in the lockfile.
I'm using Makefiles in my project to capture tasks and I've now got these:
# FLAGS contains CPPFLAGS and LDFLAGS vars resolved from pkg-config
POETRY ?= $(FLAGS) poetry

.PHONY: fix-poetry-conflicts
fix-poetry-conflicts: ## Attempts to fix Poetry merge/rebase conflicts by choosing theirs and locking again
	git checkout --theirs poetry.lock
	$(MAKE) poetry-relock

.PHONY: fix-poetry-conflicts-2
fix-poetry-conflicts-2: ## Another way to try to fix Poetry merge/rebase conflicts
	git restore --staged --worktree poetry.lock
	$(MAKE) poetry-relock

.PHONY: poetry-relock
poetry-relock: pyproject.toml ## Run poetry lock w/o updating deps, use after changing pyproject.toml trivially
	$(POETRY) lock --no-update
Resolve the pyproject.toml conflicts first, then run one of the above depending on the situation.
TLDR I'd like to hear what the case is for content_hash in the lockfile.
@dimbleby did you come to any conclusions?
TLDR I'd like to hear what the case is for content_hash in the lockfile.
The content_hash is the hash of relevant keys in the pyproject.toml. See https://github.com/python-poetry/poetry/blob/59a38bd935936e87aa6deae76764cedbdbca5d43/src/poetry/packages/locker.py#L254
Poetry needs this to detect whether changes were made to the pyproject.toml without changing the lock file, and to warn the user that these files might be out of sync.
FYI, for anyone using Dependabot to keep versions updated, the best strategy I've found to speed things up is setting open-pull-requests-limit: 1 on the pip config. This stops the cycle of: 5 PRs open -> One merges -> 4 PRs rebase
@samamorgan I recommend using Renovatebot, which automatically groups updates.
Thanks for the suggestion! This doesn't work for me in most projects. I want dependency updates to go through tests and if they fail, I want them to fail individually.
FWIW from the Dependabot side it's now possible to configure increase-if-necessary as a version strategy for the python ecosystem, which can alleviate some of the pain here:
I've also noticed this problem on one of our projects.
It's not just dependabot needing to recreate PRs, it's also humans conflicting with each other and with dependabot. And indeed every time the human needs to do
git checkout --theirs poetry.lock
poetry lock --no-update
which is rather annoying.
Poetry needs this to detect if there were made changes to the pyproject.toml without changing the lock file and give a warning to the user, that these file might be out of sync.
Perhaps instead of adding a hash, which always conflicts, poetry could add in the entire contents of pyproject.toml? This way you get the desired check, and you get the exact same amount of conflicts as in pyproject.toml. (Maybe there are even ways to improve on the idea by converting contents of pyproject.toml into some canonical form that has less chance of conflicting by adding in e.g. some extra lines - but would need to check git merge conflict detection more in detail for that)
Other thoughts:
I'm not sure this is solvable with Poetry's current data model of the lock file as a repository of links that meet the requirements of pyproject.toml. I think we might be able to evolve to that point, but @radoering and @dimbleby might better be able to describe what steps would be needed, if it's even possible.
There's also the ask for #1301, which complicates this even more (though I am beginning to think that we will just never support that feature, the lock file is not self-descriptive and trying to make it so by inlining pyproject.toml makes the problem worse, in my mind).
On top of all of this, exporting a merged lock file will result in garbage for sure -- while Poetry itself may be able to evolve to deserialize a merged lock file into something correct (and what a 'correct' merge looks like is still unclear to me), the exporter naively walks every entry, and will have to gain (or re-use) the same complexity to interpret a merged lock file.
Basically, I'm not sure that this is possible in the Python ecosystem as our metadata is extremely expensive to gather compared to every other ecosystem, and we have major compromises in the data model already (one set of metadata per package-version duo). I'd certainly like to see a PoC, but it will need extremely thorough test coverage to convince me it's not a nightmare for users and support.
Would it be possible to add a config flag (e.g. tool config in pyproject.toml) that instructs Poetry to not render/expect the content-hash in the lock file for people who are running into merge conflicts a lot?
I don't think so -- simply "don't add a hash" is actually quite dangerous. Like I said, the lock file is really a cache and not fully self-descriptive. Sure, when you remove the hash, you "solve" the merge conflicts, but that doesn't mean that the solution is correct (or up-to-date for your pyproject.toml).
I think we're better off devising a method to support mergeable lock files as a first class feature, or explicitly disavowing removing the hash; adding a knob to opt-out is a really bad idea as it can already be done manually, and if we add a knob people will think it is supported, and become frustrated when Poetry produces incorrect results.
So far, I'm getting frustrated with the amount of merge conflicts that are draining CI resources. This option may be appropriately named, that it is dangerous. This will solve the problem here and now and allow us to work on the hash problem as a whole.
I recommend considering Renovatebot instead of Dependabot. It can combine multiple dependency updates into a single PR so that you don’t have to worry about having multiple conflicting PRs.
However, I want to receive dependency updates separately. This clearly shows at what point the failure occurred. And dependabot is not the only reason. People are in conflict with each other when there seems to be no conflict.
I only use pip-tools in some projects because this kind of conflict does not happen there. For everything else, I would choose poetry.
@neersighted I understand the suggestion is quite dangerous, but as more and more people are reporting that the conflict issue is turning into a major pain point, I was hoping that some kind of workaround could be made possible (+1 on giving this a very clear name).
Over the weekend I wanted to understand how other ecosystems (specifically Node.js) are solving the same problem (while keeping in mind that the metadata cost in Python is different). NPM stores the "self" package in the lock file as well (Yarn only does when using workspaces it seems):
// package-lock.json
{
"name": "poetry-test",
"version": "1.0.0",
"lockfileVersion": 2,
"requires": true,
"packages": {
"": {
"name": "poetry-test",
"version": "1.0.0",
"license": "MIT",
"dependencies": {
"react": "^18.2.0"
}
},
Poetry instead stores a hash of relevant_content which more or less contains the dependencies of the package itself:
return sha256(json.dumps(relevant_content, sort_keys=True).encode()).hexdigest()
If we compare these two approaches, NPM's lockfile can be merged by git most of the time (and otherwise humans can merge it by hand) while Poetry is guaranteed to conflict when two commits modify the pyproject.toml.
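To make the "guaranteed to conflict" point concrete, here is a small sketch built around the hashing expression quoted above. The `content_hash` wrapper and the example dictionaries are illustrative assumptions; which pyproject.toml keys actually go into relevant_content is decided by Poetry's locker and is not reproduced here.

```python
import json
from hashlib import sha256


def content_hash(relevant_content: dict) -> str:
    """Hash a canonical JSON rendering of the relevant pyproject.toml content."""
    return sha256(json.dumps(relevant_content, sort_keys=True).encode()).hexdigest()


# Hypothetical project state on master and on two independent branches.
base = {"dependencies": {"python": "^3.8", "django": "3.0.1"}}
branch_a = {"dependencies": {**base["dependencies"], "django": "3.0.2"}}
branch_b = {"dependencies": {**base["dependencies"], "requests": "^2.25"}}

hashes = {name: content_hash(content) for name, content in
          [("base", base), ("branch_a", branch_a), ("branch_b", branch_b)]}

# Each branch rewrites the same single content-hash line with a different
# value, so git reports a conflict even though the branches changed
# unrelated dependencies.
all_distinct = len(set(hashes.values())) == 3
```

The pyproject.toml changes on the two branches touch different lines and merge cleanly, but both rewrite the one hash line in poetry.lock, which is why the lock file conflicts regardless.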
While it seems quite possible to render relevant_content and not the sha256 to the lock file (which actually sounds related to the ask of #1301), various comments in this thread have suggested this to be a feature to ensure/increase correctness:
Merging the locked dependencies doesn't necessarily result in a correct dependency tree.
but that doesn't mean that the solution is correct (or up-to-date for your pyproject.toml).
Generally, a git-merged lock file is not guaranteed to be "correct" with regard to the pyproject.toml. I'd love if someone could provide the thread with a non-esoteric case where updating pyproject.toml on two branches that merge cleanly, and not running poetry lock, would cause a problem. I'm fairly certain that one exists, but I failed to contrive one.
That said, the correctness argument seems to be ignoring the fact that:
npm ci
poetry lock --check
pyproject.toml, it is also the current lock file itself. poetry update transitive-a and poetry update transitive-b might also contradict each other, but the hash will not cause a conflict there.
The thing that makes (1) a bit unique in Poetry is that poetry install works quite differently from other package managers. As an NPM user, you make a habit of running npm install when pulling the main branch, which will "heal" the lock file (and one definitely runs that after a merge conflict on the lock file). For Poetry, poetry install will not do that; instead, new users must be made aware of the lock subcommand.
That said, this should more or less also work if Poetry stored the relevant_contents instead of the hash for the case that someone updated a version in pyproject.toml but then didn't run poetry lock.
Regarding (2) to me it seems that this hash is giving a false sense of correctness. The only true way to ensure things are correct is to run poetry lock habitually (or if possible run it on CI).
To summarize, I find it quite hard to defend the existence of the content-hash. (I personally believe that Poetry's UX would benefit if install performed a lock --no-update, but I understand that the metadata cost might be prohibitive here).
This option may be appropriately named, that it is dangerous. This will solve the problem here and now and allow us to work on the hash problem as a whole.
This is easy to assert from a non-maintainer standpoint, but we already pay a very high maintenance/support burden for other poorly designed/second-class features. I suspect you will not be able to find any consensus to merge another one.
Regarding (2) to me it seems that this hash is giving a false sense of correctness. The only true way to ensure things are correct is to run poetry lock habitually (or if possible run it on CI).
You're not incorrect, but this is not the intended way to use Poetry/a primary workflow. In general what is intended is poetry update, run CI, and determine if any failures result from broken versions, at which point they are excluded from the constraints in pyproject.toml.
If we drop hashing and make the lock file self-descriptive, we'll greatly increase the maintenance burden for the lock file, but that may be a price worth paying. However, we make the common case much more risky/dangerous. People are going to blindly trust Dependabot/Git to update the lock file, and will be opening bugs against Poetry when they get unexpected results. I'm not sure that "make sure you run poetry lock --no-update before you open a bug" will fly, since Poetry will (partially/rightfully) get the blame for doing incorrect/unexpected things.
I'd still like to hear what some of the SMEs in this area think, but since we're now talking about a "no-hash" approach instead of trying to change the solver, I'll also ping @abn @finswimmer @sdispater for opinions.
Issue
When two devs install dependencies on separate branches, it is very easy to end up merge-conflicted; in particular, the metadata.content-hash key often changes. It is very unclear how to resolve this manually, so I often delete the lockfile (or perhaps just that key) and rebuild it and, basically, hope that it comes out the same.
It seems like in some scenarios merge conflicts could be resolved automatically based on pyproject.toml. Yarn does this, for instance.