peterdemin / pip-compile-multi

Python dependency locking for complex projects
MIT License
147 stars 29 forks

`verify` command doesn't ensure that `.txt` files continue to include all transitive dependencies #410

Open vergenzt opened 1 year ago

vergenzt commented 1 year ago

Sometimes a simple version bump in a requirements .txt file means that a new transitive dependency has been added, and pip-compile-multi --no-upgrade should be re-run to ensure all transitive dependencies are snapshotted.

Example: Apache Superset, which uses pip-compile-multi to manage requirements, bumped flask-appbuilder from 4.2.0 to 4.3.0. This new version of flask-appbuilder added a new dependency, which had its own tree of new dependencies. These new dependencies were omitted from requirements/*.txt and thus left unpinned for some time, even making it into a released version.

This left me very confused about why requirements/*.txt files were changing dramatically when I ran pip-compile-multi --no-upgrade on a released version of the Superset project, even though I hadn't made any changes to requirements/*.in yet! It took me several days to track down the source of the issue. πŸ˜•

The current verify command, while presumably better than nothing, seems to me to give a somewhat false sense of security, given this issue.

Would you be open to a new pre-commit hook being added (and/or the existing hook modified) to re-run pip-compile-multi --no-upgrade entirely if requirements/*.txt has changed? Or maybe pip-compile-multi verify should itself run pip-compile-multi --no-upgrade if it detects that...? πŸ€”

Thoughts?

vergenzt commented 1 year ago

Related to #336

peterdemin commented 1 year ago

Hi @vergenzt, probably verify is too broad a name for the functionality it provides. Verify has to run fast - in milliseconds. But locking versions on a big project can easily take minutes.

I'm okay with adding another pre-commit hook that runs pip-compile-multi --no-upgrade, which could be useful for small projects. If you want to implement that, keep in mind that it should allow passing down other CLI options.

For the problem you described, I would suggest an organizational solution though. All changes to .txt files must either be performed by pip-compile-multi, or be followed by a pip-compile-multi --no-upgrade run. In my projects, I just put these instructions at the top of the .txt and .in files.

You also might be interested in the --upgrade-package option, which takes a single package to be upgraded and can also take a version constraint for that package. For example: pip-compile-multi -P 'pandas~=2.3.4'

tysonclugg commented 1 year ago

Can annotations in *.txt files be parsed to determine results of the resolver, without having to run the resolver again?

If `--verify` sees `Flask-Limiter==xxx ... # via flask-appbuilder` and `foo==yyy ... # via Flask-Limiter`, then it could deduce that both `Flask-Limiter==xxx` and `foo==yyy` are required by `flask-appbuilder` and should appear in all similar `*.txt` files.
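To illustrate the idea, here is a minimal sketch of such an annotation check. This is not an existing pip-compile-multi feature; it assumes single-line `# via` annotations like `foo==1.0  # via bar`, whereas real pip-tools output can spread annotations across multiple comment lines.

```python
import re

# Matches "package==version" optionally followed by a "# via ..." annotation.
PIN_RE = re.compile(r"^([A-Za-z0-9._-]+)==\S+(?:\s*#\s*via\s+(.*))?")

def unpinned_transitives(lines):
    """Return packages named in a "# via" annotation but never pinned."""
    pinned, referenced = set(), set()
    for line in lines:
        m = PIN_RE.match(line.strip())
        if not m:
            continue
        pinned.add(m.group(1).lower())
        if m.group(2):
            referenced.update(
                p.strip().lower()
                for p in m.group(2).split(",")
                if not p.strip().startswith("-r")  # skip "# via -r base.in"
            )
    return referenced - pinned

lines = [
    "flask-appbuilder==4.3.0  # via -r requirements/base.in",
    "flask-limiter==3.3.0  # via flask-appbuilder",
    "ordered-set==4.1.0  # via flask-limiter",
]
print(unpinned_transitives(lines))  # set(): every "via" source is pinned
```

If a maintainer hand-bumped flask-appbuilder without re-running the resolver, flask-limiter would appear in a `# via` annotation without its own pin, and the check would flag it, without ever running a resolver.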

peterdemin commented 1 year ago

That would be a feature request for the underlying pip-tools project. Pip-compile-multi doesn't resolve dependencies

tysonclugg commented 1 year ago

I don't think updating the behaviour of verify belongs in pip-tools; AFAICT that project doesn't have a verify feature.

I wasn't suggesting that pip-compile-multi should resolve dependencies either, just that it could be updated to understand the output of annotations. Then it could verify that all transitive dependencies are included as per the title and description of this issue, within milliseconds since it won't need to run a resolver.

peterdemin commented 1 year ago

Sorry, I misread your previous message. Let me clarify:

Sometimes, a simple version bump in a requirements .txt file means that a new transitive dependency has been added

Was this version bump performed by pip-compile-multi --upgrade-package apache-superset, or manually without running pip-compile-multi? If it was the former, that's a bug in --upgrade-package handling, and it should be fixed there. If not, it's a problem with the process and should be resolved organizationally (maybe by rewording the header in the .txt file to say not to make manual changes to the file).

Another solution (that I've implemented in a closed-source codebase, and could open-source in the future) is to automatically generate a report using pipdeptree after installing the new requirements. The report ensures that all the installed dependencies are pinned and also has extra logic to surface comments for affected packages from .in files.
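A rough sketch of that report idea follows. This is an illustration only, not the closed-source implementation; the function names are invented, and it uses importlib.metadata to enumerate installed distributions rather than pipdeptree.

```python
import re
from importlib import metadata

def pinned_names(requirements_text):
    """Collect package names pinned with == in a compiled .txt file."""
    names = set()
    for line in requirements_text.splitlines():
        m = re.match(r"^([A-Za-z0-9._-]+)==", line.strip())
        if m:
            names.add(m.group(1).lower().replace("_", "-"))
    return names

def unpinned_installed(requirements_text):
    """Installed distributions that the lock file does not pin."""
    installed = {
        (dist.metadata["Name"] or "").lower().replace("_", "-")
        for dist in metadata.distributions()
    }
    return installed - pinned_names(requirements_text) - {""}

# Any non-empty result means something got installed without being pinned.
print(sorted(unpinned_installed("flask-appbuilder==4.3.0\n")))
```

Run after installing the compiled requirements into a clean environment; anything the resolver pulled in but the lock file doesn't pin shows up in the result.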

vergenzt commented 1 year ago

Was this version bump performed by pip-compile-multi --upgrade-package apache-superset or manually without running pip-compile-multi?

I think the version bump was performed manually.

If not, it's a problem with a process and should be resolved organizationally (maybe by rewording the header in the .txt file to say not to make manual changes to the file).

I partially agree. A rewording like you suggest would not be a bad thing. However, I think there is also room for automated assistance in enforcing such an organizational norm.

Perhaps... a pre-commit hook? πŸ™‚

Would you be open to a PR that adds something like the following to .pre-commit-hooks.yaml:

```yaml
- id: pip-compile-multi-recompile
  name: pip-compile-multi
  language: python
  entry: pip-compile-multi --no-upgrade
  files: ^requirements/.*\.txt$
  pass_filenames: false
  require_serial: true
  types: [file, non-executable, text]
```

This hook will trigger a recompile any time requirements/*.txt files change, and only for organizations that opt into the hook by adding it to their .pre-commit-config.yaml.
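For reference, a project opting in would add something like the following to its `.pre-commit-config.yaml`. The `rev` value here is a placeholder and would need to point at a real release tag:

```yaml
repos:
  - repo: https://github.com/peterdemin/pip-compile-multi
    rev: vX.Y.Z  # placeholder; pin to an actual release tag
    hooks:
      - id: pip-compile-multi-recompile
```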

peterdemin commented 1 year ago

I'm a bit hesitant to add that, as I can see potential for confusion. What if you try this as a custom hook in your project for a couple of weeks, and if the experiment succeeds, we can add it to the repo along with the necessary documentation?