pypa / pipenv

Python Development Workflow for Humans.
https://pipenv.pypa.io
MIT License

Capture more auditing metadata in the lock file #1886

Closed ncoghlan closed 1 year ago

ncoghlan commented 6 years ago

(From https://github.com/pypa/pipenv/pull/1865#issuecomment-377440298)

The --keep-outdated and --pre options to pipenv lock mean that re-running pipenv lock without those options may result in a different lock file.

Lock file generation is also inherently time dependent: if you want to recreate a previously locked lock file from a given Pipfile, you need to exclude any releases that weren't available at the time that lock file was generated.

I believe adding the following three fields to the lock file's _meta section would provide enough info to allow a historical lock file to be reconstructed:

Although, given --keep-outdated, perhaps the locked_at metadata should be on the individual entries? Then it could be carried forward to each new version, and the lock file would inherently track how long it had been since each dependency had been updated.

Alternatively, instead of a kept_versions field, there could be an outdated_versions field that recorded which packages had newer versions available at the time the lock file was generated that met the Pipfile constraints, but had been ignored due to --keep-outdated.
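As a rough illustration of the per-entry idea, here is a minimal sketch of carrying locked_at forward across a re-lock. The entry layout (a dict with "version" and "locked_at" keys) and the function name are assumptions made for this example; nothing here reflects pipenv's actual lock file format or code.

```python
from datetime import datetime, timezone


def carry_forward_locked_at(old_entries, new_pins, now=None):
    """Build new lock entries, keeping locked_at for unchanged pins.

    old_entries: {name: {"version": str, "locked_at": str}} (hypothetical layout)
    new_pins:    {name: version} as produced by a fresh resolution
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()
    result = {}
    for name, version in new_pins.items():
        old = old_entries.get(name)
        if old and old.get("version") == version and "locked_at" in old:
            # Pin did not change (e.g. kept by --keep-outdated): keep its age.
            locked_at = old["locked_at"]
        else:
            # New package or new version: it was locked just now.
            locked_at = stamp
        result[name] = {"version": version, "locked_at": locked_at}
    return result
```

With this shape, the age of each pin can be read straight from its entry, so the lock file tracks how long it has been since each dependency was last updated.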

techalchemy commented 6 years ago

@ncoghlan my rough understanding of the underlying implementation is that --keep-outdated should make no effort to update anything that doesn’t conflict with the specified package. That saves us having to re-download and run setup.py against various packages which can really speed things up. That might be wrong or it might require keeping actual specifiers in the lockfile along with pins.

For _meta we may want to differentiate a few things: generated_at, updated_at, and perhaps a list of keys representing top-level packages so we can operate without a Pipfile?

Possibly we would want to store for each package then: specifiers, locked_at, updated_at, requires?

Requirement metadata is out of scope, I realize. I'm just tossing it in as a piece of the data structure puzzle as we think about the hierarchy.

ncoghlan commented 6 years ago

Checking "What's the latest available version?" should only need a single query to the index server's simple API per package, so building a list of outdated packages should be much cheaper than actually resolving those versions. I agree it would be more expensive than skipping building that list, though.
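A minimal sketch of that one-query-per-package check, assuming the index serves the JSON form of the simple API with a "versions" list (PEP 691 and PEP 700). The function names are made up for this example, the version comparison is deliberately naive (plain dotted numeric releases only, pre-releases skipped), and Pipfile constraints are ignored.

```python
import json
import urllib.request


def fetch_versions(package, index_url="https://pypi.org/simple"):
    """One request to the simple API for one package (JSON form)."""
    req = urllib.request.Request(
        f"{index_url}/{package}/",
        headers={"Accept": "application/vnd.pypi.simple.v1+json"},
    )
    with urllib.request.urlopen(req) as resp:
        # "versions" is only present on indexes implementing PEP 700.
        return json.load(resp).get("versions", [])


def release_key(version):
    """Naive sort key; returns None for anything but dotted integers."""
    parts = version.split(".")
    if not all(p.isdigit() for p in parts):
        return None  # pre-releases, post-releases, epochs, etc. are skipped
    nums = [int(p) for p in parts]
    while nums and nums[-1] == 0:
        nums.pop()  # treat 2.0 and 2.0.0 as equal
    return tuple(nums)


def find_outdated(locked, available):
    """Return {name: newest_version} for pins older than the newest release.

    locked:    {name: pinned_version}
    available: {name: [version, ...]}, e.g. collected via fetch_versions()
    """
    outdated = {}
    for name, pinned in locked.items():
        pinned_key = release_key(pinned)
        candidates = [v for v in available.get(name, []) if release_key(v) is not None]
        if pinned_key is None or not candidates:
            continue
        newest = max(candidates, key=release_key)
        if release_key(newest) > pinned_key:
            outdated[name] = newest
    return outdated
```

A real implementation would want proper version parsing and would filter candidates against the Pipfile specifiers before reporting them as outdated.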

The "operate-without-a-Pipfile" question is an interesting one, as it raises the question of whether or not a hash is the best way to be detecting the "Pipfile changed since Pipfile.lock was last generated" situation: since TOML supports comments, there are plenty of cosmetic changes that can be made without really altering the file from a dependency resolution perspective - the real questions are "Did the top level dependency declarations change?" and "Did the project settings that affect dependency resolution change?".
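One way to approximate "did anything relevant to resolution change?" is to fingerprint only the parsed sections that feed the resolver, rather than the file's raw bytes. The sketch below assumes the Pipfile has already been parsed into a dict; which sections count as resolution-relevant is a guess made for this example, and it is not a description of how pipenv computes its own hash.

```python
import hashlib
import json

# Assumed set of Pipfile sections that influence dependency resolution.
RESOLUTION_SECTIONS = ("source", "requires", "packages", "dev-packages")


def resolution_fingerprint(pipfile_data):
    """Hash only the resolution-relevant parts of a parsed Pipfile."""
    relevant = {
        key: pipfile_data[key] for key in RESOLUTION_SECTIONS if key in pipfile_data
    }
    # sort_keys gives a canonical form, so key order does not matter,
    # and comments are already gone after TOML parsing.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Comments, key ordering, and sections outside the assumed set (such as scripts) leave the fingerprint unchanged, while any edit to a dependency declaration changes it.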

ncoghlan commented 6 years ago

I've been thinking about this general question of "assessing lockfile freshness" a bit, and realised that another useful date to track would be when pinned dependencies were last checked for security vulnerabilities.

I think if we had a well-defined concept of what it meant for a lock file to be "fresh" or "stale", then we could potentially warn about stale lock files when relying on them, and recommend running pipenv check or pipenv lock to refresh them.

And given that kind of warning on pipenv sync and pipenv install, it could become reasonable to switch pipenv install itself over to a minimalist --selective-upgrade as the default behaviour, rather than opportunistically updating everything to the latest available version by default.

techalchemy commented 6 years ago

How often would we recommend that, I wonder?

ncoghlan commented 6 years ago

I'm not sure, as the most suitable default varies based on the kind of code you're writing.

However, my initial inclination would be towards emitting a warning after something like 90 days (~3 months) or 180 days (~6 months), as "You should check your dependencies for security vulnerabilities at least 2-4 times a year" seems like a pretty reasonable maintenance recommendation to me. While it's honestly a bit low for full-time professional software development, it's much higher than is typical for hobbyist projects, and should also be manageable for work projects that are nevertheless a sideline to someone's main job.

pipenv install and pipenv sync could then accept a --fresh-until N option to say that entries are considered fresh until N days after they were either last updated, or last checked for security vulnerabilities.
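To make the proposed rule concrete, here is a small sketch of the freshness test such an option could apply. Both the --fresh-until option and the per-entry updated_at / checked_at fields are proposals from this thread rather than existing pipenv features, and the ISO 8601 timestamp format is an assumption of the example.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(entry, fresh_until_days, now=None):
    """True if the entry was updated or security-checked recently enough.

    entry: dict with optional ISO 8601 "updated_at" and "checked_at" strings
           (hypothetical field names).
    """
    now = now or datetime.now(timezone.utc)
    stamps = [entry[key] for key in ("updated_at", "checked_at") if entry.get(key)]
    if not stamps:
        return False  # no recorded dates: treat the entry as stale
    # Freshness counts from whichever event happened most recently.
    last_touched = max(datetime.fromisoformat(stamp) for stamp in stamps)
    return now - last_touched <= timedelta(days=fresh_until_days)


def stale_entries(lock_entries, fresh_until_days, now=None):
    """Names of entries that would trigger a staleness warning."""
    return sorted(
        name
        for name, entry in lock_entries.items()
        if not is_fresh(entry, fresh_until_days, now=now)
    )
```

With a 90-day window, an entry last updated a year ago but checked for vulnerabilities last month would still count as fresh, which matches the "either last updated, or last checked" wording above.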