pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

pip freeze with a hash #4732

Open lofidevops opened 7 years ago

lofidevops commented 7 years ago

Description:

User story: I am a Python developer with an existing requirements.txt file. I want to add hashes to the file, so that future installations are more secure.

What I've run:

At the moment I need to:

It would be great if instead I could:

Today's solution:

Pipfile is a replacement for requirements.txt that includes hashes in a file called Pipfile.lock.

pipenv is a tool for managing your virtualenv based on Pipfile, including checks against the hashes defined in Pipfile.lock. (It can also convert a requirements.txt file.)

Suggested solution:

Supporting Pipfile at the pip layer (rather than a higher-level tool) is on the PyPA roadmap, see https://github.com/pypa/pipfile#pip-integration-eventual :

pip will grow a new command line option, -p / --pipfile to install the versions as specified in a Pipfile, similar to its existing -r / --requirement argument for installing requirements.txt files. ... To manually update the Pipfile.lock:

$ pip freeze -p different_pipfile
different_pipfile.lock (73d81f) written to disk.

The implication is that this is the preferred solution to supporting hashes (rather than adding them to requirements.txt or pip freeze). The current status is "Deferred till PR" (see this ticket). See also https://github.com/pypa/pip/issues/6925

alecbz commented 7 years ago

Is there at least some way to easily script this? E.g., can I loop over a pip freeze and somehow programmatically find the file I need to pass to pip hash?

rbanffy commented 7 years ago

pip would need to calculate and keep the hash somewhere as it installs the package. When doing a freeze, it'd retrieve that information.

max-wittig commented 6 years ago

This would be an awesome feature, indeed.

pradyunsg commented 6 years ago

This sounds like a good idea, although I am not sure how it'll work. As @max-wittig pointed out, the hash needs to be computed when the installation occurs, when the installation source is downloaded.

kiowa commented 6 years ago

You can get the hash from the cached wheel in ~/.cache/pip/wheels/
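
For reference, a digest in the `--hash=sha256:...` form that requirements files accept can be computed from any archive on disk with a few lines of Python. This is a minimal sketch, not pip's implementation, and `requirement_hash_option` is a name made up for illustration. Note that a wheel pip built locally from an sdist will generally not hash to the same value as the sdist published on the index.

```python
import hashlib
from pathlib import Path


def requirement_hash_option(archive: Path, chunk_size: int = 1 << 16) -> str:
    """Return a ``--hash=sha256:<hex>`` option for one archive file."""
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        # Read in chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"--hash=sha256:{digest.hexdigest()}"
```

The result can be appended to the matching `name==version` line from pip freeze, provided the file really is the archive that was installed.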

gifflen commented 6 years ago

It looks like pipenv is getting the hashes directly from the warehouse api

https://github.com/pypa/pipenv/blob/master/pipenv/utils.py#L468-L508
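
The general approach of asking the index for hashes can be sketched as follows. This is not pipenv's code. It assumes PyPI's JSON API at https://pypi.org/pypi/<name>/<version>/json, where each entry under `urls` carries a `digests` mapping; the function names are illustrative only.

```python
import json
from urllib.request import urlopen


def sha256_hashes(release: dict) -> list[str]:
    """Collect the sha256 digest of every file listed for one release."""
    return sorted(
        f["digests"]["sha256"]
        for f in release.get("urls", [])
        if "sha256" in f.get("digests", {})
    )


def fetch_release(name: str, version: str) -> dict:
    """Fetch release metadata from PyPI's JSON API (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{name}/{version}/json") as resp:
        return json.load(resp)


def requirement_line(name: str, version: str, hashes: list[str]) -> str:
    """Format one requirements.txt entry with one --hash option per file."""
    options = " ".join(f"--hash=sha256:{h}" for h in hashes)
    return f"{name}=={version} {options}".rstrip()
```

Listing every file's hash lets the same line work whether pip later picks a wheel or the sdist, at the cost of trusting what the index reports at the time the file is generated.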

andrewchambers commented 6 years ago

The fact this doesn't exist is just terrible. I just wrote a workaround:

https://github.com/andrewchambers/mummipy

enjoy.

Julian commented 6 years ago

@andrewchambers perhaps instead of the slight barbs consider sending a PR?

lofidevops commented 6 years ago

It appears that this user story (Python developer wanting to hash their dependencies) is addressed by pipenv, a distinct PyPA project. See https://docs.pipenv.org/basics/#pipfile-lock-security-features for details. So I'm closing this issue assuming that this user story is out-of-scope for pip itself, and best handled by a "higher-level" tool.

Other readers also might be interested in:

Julian commented 6 years ago

Are you saying you think the entirety of pip freeze is out of scope for pip now?

Because if not, this seems like a very logical thing for pip. Not all of us use any current higher level tool, and it's pip itself that introduced the possibility of having hashes in requirements files.

Without this feature it's pretty unfeasible to generate those.

Saying "patches welcome" seems very reasonable, but closing not so much.


pfmoore commented 6 years ago

@Julian note that it was the OP who closed the issue, not the pip developers. The option for someone to create a PR for this remains available to anyone interested in the feature.

Julian commented 6 years ago

Ah, indeed, thanks!

Great, glad to hear it's not being designated as out of scope.

lofidevops commented 6 years ago

See the proposal for a pip freeze -p pipfile command at https://github.com/pypa/pipfile#pip-integration-eventual , which directly solves this user story for pip. I've reopened this ticket because it is clearly on the (long-term) roadmap for pip.

lofidevops commented 6 years ago

I've updated the ticket description with the proposed solution (as I understand it). Note that Pipfile-based dependencies are usable today if you use pipenv.

terrisgit commented 5 years ago

See today's convoluted workaround at https://github.com/peterbe/hashin/issues/100

max-wittig commented 5 years ago

I just switched to Pipenv, which supports this workflow. Sadly, it's still not included in the default Python package.

bittner commented 5 years ago

Is there any roadmap or concrete discussion about implementing the proposed -p / --pipfile option that may replace the -r option in the long run? I'm having a hard time finding this.

chamoda commented 5 years ago

This will generate a requirements.txt with hashes

pip-compile requirements.txt --generate-hashes

Note that this will directly modify the existing requirements.txt file.

You can install pip-compile with pip install pip-tools

BenjamenMeyer commented 4 years ago

It would still be good to have this directly via pip freeze instead of having to use other tooling; pipenv and Pipfile come with their own set of headaches.

Jongy commented 4 years ago

pip freeze --hash will be very useful.

pradyunsg commented 4 years ago

> This sounds like a good idea, although I am not sure how it'll work. As @max-wittig pointed out, the hash needs to be computed when the installation occurs, when the installation source is downloaded.

If someone wants to file a PR implementing this, they're welcome to do so! Note that we'd like to see this functionality in pip, but the PR would be subject to our regular code review processes (i.e. we're not gonna merge a PR just because someone filed it).

pradyunsg commented 4 years ago

I've labelled this issue as "deferred till PR".

This label is essentially for indicating that further discussion related to this issue should be deferred until someone comes around to make a PR. This does not mean that the said PR would be accepted - that decision has been deferred until the PR is made.

NoahGorny commented 4 years ago

@pradyunsg @deveshks I am thinking about implementing this in a PR. I think that hashes should not be obtained from remote, and so must be computed at install time (maybe falling back to the cache?). How about adding a new file to the wheel metadata, similar to RECORD, that will contain the pre-installed hash of the package? I have never done that, but I saw that it is not a lot of diff and is a simpler solution. This will allow pip freeze to quickly list hashes of installed packages; however, there is no support for listing hashes of packages that were installed with older versions of pip. Another problem is that this behavior differs from that of other tools that compute hashes for packages: they usually get them from a remote PyPI, which seems like an unsafe option overall.

I will start to work on this in the following days, let me know what you think about my idea :smile:
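
To make the idea concrete, here is a rough sketch of recording a hash next to the installed metadata and reading it back at freeze time. Everything in it is an assumption for illustration: the file name `HASH`, its one-line `algorithm:hexdigest` format, and both function names are hypothetical, and no PEP defines such a file.

```python
from pathlib import Path

HASH_FILE = "HASH"  # hypothetical file name and format, not defined by any PEP


def record_origin_hash(dist_info: Path, algorithm: str, hexdigest: str) -> None:
    """At install time: store the hash of the archive that was downloaded."""
    (dist_info / HASH_FILE).write_text(f"{algorithm}:{hexdigest}\n", encoding="utf-8")


def freeze_with_hashes(site_packages: Path) -> list[str]:
    """At freeze time: emit pinned requirements, with a hash where one was recorded."""
    lines = []
    for dist_info in sorted(site_packages.glob("*.dist-info")):
        # Directory names look like "<name>-<version>.dist-info".
        name, _, version = dist_info.name[: -len(".dist-info")].rpartition("-")
        line = f"{name}=={version}"
        hash_file = dist_info / HASH_FILE
        if hash_file.is_file():
            line += f" --hash={hash_file.read_text(encoding='utf-8').strip()}"
        lines.append(line)
    return lines
```

Packages installed before such a file existed simply come out without a hash, which is the limitation mentioned above.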

sbidoul commented 4 years ago

@NoahGorny cool you want to work on this.

You'll also want to pay attention to the wheel cache. When installing something that has been built (e.g. an sdist) and cached as a wheel, we probably want the hash of the original sdist or direct url target, and not the hash of the wheel that we have in cache.

chrahunt commented 4 years ago

What about generating the lock file at install-time, like npm, yarn, pipenv, poetry, Cargo, and Conan do? (sorry if I missed any)

  1. pip install -r requirements.txt --lock requirements.txt.lock
  2. Commit requirements.txt.lock
  3. Afterwards, invoke pip install -r requirements.txt.lock

On updates to requirements.txt, do the same steps.

This directly supports the stated use case:

> I am a Python developer with an existing requirements.txt file. I want to add hashes to the file, so that future installations are more secure.

but it avoids a lot of the extra work that is being described in #8519.

BenjamenMeyer commented 4 years ago

@NoahGorny WRT hashes:

* get the hash from the remote
* generate the hash locally
* verify the hashes match

Don't trust remote, but don't necessarily trust local either. Verify they match to give the user assurance that the right thing is installed. If they don't match, generate an error; provide a way to force the install if they don't match, but by default uninstall/rollback if they don't match (or depending where/when you're generating the hashes...don't install to start with, which would be even better).

NoahGorny commented 4 years ago

> @NoahGorny WRT hashes:
>
> * get the hash from the remote
> * generate the hash locally
> * verify the hashes match
>
> Don't trust remote, but don't necessarily trust local either. Verify they match to give the user assurance that the right thing is installed. If they don't match, generate an error; provide a way to force the install if they don't match, but by default uninstall/rollback if they don't match (or depending where/when you're generating the hashes...don't install to start with, which would be even better).

We cannot always generate the hash locally after installation; that's why we create the new HASH file. However, I am not sure we should fetch hashes from remote each time we freeze the environment...

> What about generating the lock file at install-time, like npm, yarn, pipenv, poetry, Cargo, and Conan do? (sorry if I missed any)
>
> 1. `pip install -r requirements.txt --lock requirements.txt.lock`
> 2. Commit `requirements.txt.lock`
> 3. Afterwards, invoke `pip install -r requirements.txt.lock`
>
> On updates to requirements.txt, do the same steps.
>
> This directly supports the stated use case:
>
> > I am a Python developer with an existing requirements.txt file. I want to add hashes to the file, so that future installations are more secure.
>
> but it avoids a lot of the extra work that is being described in #8519.

This requires users to actively generate lockfiles during installation, and only works if the user is installing from a requirements file in the first place. This is a good option for such users, but in other use cases I think it does not work as well.

sbidoul commented 4 years ago

The approach suggested by @chrahunt in https://github.com/pypa/pip/issues/4732#issuecomment-657898593 is also valuable in a lot of situations. It has complexities to think through too, for instance when the install command is used to update an existing environment, and when pip decides it does not need to reinstall some already installed dependencies. In such cases we'd still need a way to obtain information about the hashes of installed distributions.

chrahunt commented 4 years ago

> This requires users to actively generate lockfiles during installation, and only works if the user is installing from a requirements file in the first place. This is a good option for such users, but in other use cases I think it does not work as well.

This use case from the original issue assumes we have a requirements file, and several comments refer to Pipfile support, which would work in the same way. I think there may be some people who would want to get their environment set up and then generate a lock file for it, but IMO we risk not actually satisfying this issue adequately if we try to solve that one at the same time.

> It has complexities to think through too, for instance when the install command is used to update an existing environment

Good point. It would be worthwhile to see how other dependency managers behave in that situation. If it turns out it's common (and generally agreed to be necessary) to store hashes with the installed packages, then that could be turned right around and included in the PEP itself. :)

BenjamenMeyer commented 4 years ago

@chrahunt to give confidence that the right thing is being installed; I would think you'd want something generated before it's installed that could easily be verified.

Question: what all is getting hashed? (or being proposed to be hashed)

pradyunsg commented 4 years ago

> Question: what all is getting hashed? (or being proposed to be hashed)

See https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode -- it's the entire file.

> pip can check downloaded package archives against local hashes to protect against remote tampering

BenjamenMeyer commented 4 years ago

> > Question: what all is getting hashed? (or being proposed to be hashed)
>
> See https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode -- it's the entire file.

That doesn't really say what gets hashed, just the requirements around hashing.

If it's the generated file, there wouldn't be an issue with hashing wheels. A hash would also be verifiable against what is downloaded vs what is installed.

Something is off.

uranusjr commented 4 years ago

When you pip install, pip downloads an archive from PyPI (or another index you specify) to extract. If you specify a hash in the requirements file, that archive’s hash is checked against the hash list you provide. But you can’t re-generate the exact same wheel from the installation, since not every file in the wheel is installed (and even more is lost if you install from source). I am honestly not getting what you think is off.
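
The check described here can be illustrated with a short sketch. It is a simplification under stated assumptions, not pip's code: `check_archive` and `HashMismatch` are made-up names, and `allowed` stands in for the `--hash=<algorithm>:<hexdigest>` options attached to one requirement.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when an archive matches none of the allowed hashes."""


def check_archive(data: bytes, allowed: dict[str, set[str]]) -> str:
    """Return 'algorithm:hexdigest' for the first allowed hash that matches.

    ``allowed`` maps an algorithm name (e.g. "sha256") to acceptable hex digests.
    """
    for algorithm, digests in allowed.items():
        # Hash the downloaded archive bytes, never the files extracted from it.
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual in digests:
            return f"{algorithm}:{actual}"
    raise HashMismatch("archive does not match any allowed hash")
```

Because the input is the archive itself, the check runs before anything is unpacked, and nothing about the installed files is needed.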

BenjamenMeyer commented 4 years ago

@uranusjr if you're checking the hash of the archive prior to extracting it, then it doesn't matter what happens after. If you're hashing what is actually put into the system, then of course it's going to change all the time but that's also an extremely bad design since you cannot have deterministic hashing behavior.

Honestly, Python/Pip should follow the package hashing done by RPM, Deb, and others. You hash the package itself, not its installed data. This provides deterministic behavior and can be verified before an install is ever done.

IOW - there should be no need to regenerate a wheel from the installation; you're not hashing the installation but the package itself.

uranusjr commented 4 years ago

What you describe is exactly what pip is currently doing. The problem in this thread is the other way around: people are looking for a way to generate hashes from installed data, and the pip developers are trying to explain we don’t know how this can be done.

NoahGorny commented 4 years ago

I had an attempt at #8519 which got stale... It did not have much traction and I drifted away from that, but I am ready to work on this once more. In my PR, I tried to add a new file into the installation folder, which specifies the hashes of the source package (not the installed data!). Using this file we can easily output the hashes when needed. I think this is the easiest solution to this problem, although it has some setbacks (new file requires a new PEP, need to be dynamic, etc...) take a look and lemme know what you think @BenjamenMeyer

BenjamenMeyer commented 4 years ago

@uranusjr if that's the case then there is certainly an answer - an emphatic no, it cannot be done from installed data, nor should that be desired. Just hash the package files that get uploaded to PyPI and be done.

@NoahGorny that doesn't really clarify anything. I did leave a comment about one aspect.

sbidoul commented 4 years ago

I personally think pip freeze with hash is desirable, and would facilitate common workflows.

It is feasible if we record the hash of the distribution that was downloaded for installation (not the wheel we possibly built locally). It is not trivial because we have the (wheel) cache in between. And adding information in .dist-info requires a PEP (although I think we would benefit from a PEP that allows tools to record their own stuff in .dist-info, so we could iterate more rapidly and worry about interop later).

altendky commented 4 years ago

Is it that painful to put the locking up front and then use the lock to control the environment? Rather than controlling the environment then reaching back up the data path to get the hashes later?

BenjamenMeyer commented 4 years ago

@sbidoul if you record the hash of the file that was downloaded (the package) whether wheel or otherwise it's easy to verify. If it's a VCS download, then a driver for the VCS should take the VCS location (git URL, svn URL, etc) and some repo data (git hash, svn revision, etc) to create a hash which could then be standardized and easily used.

Trying to generate from the installed data is very problematic from numerous aspects:

A single package (wheel, bdist, sdist) should have exactly 1 hash that would match it.

sbidoul commented 4 years ago

> Is it that painful to put the locking up front and then use the lock to control the environment

@altendky I'd say it is cumbersome today. And it seems the tools that automate it have to hack pip internals or reimplement a sizeable portion of it to achieve that goal.

So my feeling is that pip would help broader adoption of hash checking if it exposed mechanisms to facilitate hashes discovery.

pip freeze with hashes is one such mechanism. Another is to let pip report more information about what it does when installing (or dry-run install), such as the (hashes of) distributions it downloaded to perform the install.

sbidoul commented 4 years ago

> Trying to generate from the installed data is very problematic from numerous aspects:

@BenjamenMeyer I don't think anyone is attempting to do that indeed.

Regarding VCS, I'd say we don't really need anything special. I would simply relax pip's hash checking mode a little, so that commit references for VCSs that have immutable commit refs (git SHAs, ...) are considered sufficient as a hash mechanism.
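
As a rough illustration of what "an immutable commit ref" could mean in practice, the sketch below accepts a `git+` requirement URL only when it ends in `@` followed by a full 40-character SHA-1 commit id. The function name and the rule itself are assumptions for discussion, not existing pip behavior.

```python
import re

# A full git object name, assuming SHA-1 repositories: 40 hex characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_to_commit(requirement_url: str) -> bool:
    """True if a git+ URL ends in '@<full commit sha>' (ignoring any #fragment)."""
    if not requirement_url.startswith("git+"):
        return False
    url = requirement_url.split("#", 1)[0]
    _, sep, ref = url.rpartition("@")
    # Branch and tag names can move, so only a full commit id counts as a pin.
    return bool(sep) and _FULL_SHA.fullmatch(ref.lower()) is not None
```

Abbreviated SHAs are rejected on purpose, since a short prefix can become ambiguous as a repository grows.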

altendky commented 4 years ago

@sbidoul, I didn't say pip shouldn't support it, just that perhaps the order of operations should be slightly different than requested here.

yawaramin commented 3 years ago

How about an alternative UX:

$ pip install --require-hashes
...
ERROR: Hashes are required in --require-hashes mode, but they are missing from some requirements. Here is a list of those requirements along with the hashes their downloaded archives actually had. Add lines like these to your requirements files to prevent tampering. (If you did not enable --require-hashes manually, note that it turns on automatically when any package has a hash.)
    foo==1.0.0 --hash=sha256:...
    bar==2.0.0 --hash=sha256:...
   ...

As it turns out, right now it does almost this, except it prints only one requirement hash on each run of pip install, forcing the user to run it repeatedly. Instead, could it print all the requirement hashes? If so, it sidesteps the issue of where to store the hashes between install and freeze and makes the user follow a different workflow altogether.

uranusjr commented 3 years ago

> Instead, could it print all the requirement hashes?

Theoretically yes (well, it can print out all the hashes it knows; theoretically there are infinitely many possible hashes), but pip does not currently have the mechanism to do so. A PR exploring this would be very welcome.

DoWhileGeek commented 2 years ago

dang, this issue is almost five years old and still open/ambiguous.

The-Compiler commented 2 years ago

Would this perhaps be something which would be a good fit with pip install --dry-run --report --ignore-installed from #10771? If that had a --report-format=requirements or somesuch, that could be used to generate a requirements.txt with hashes and such, no?

(I suppose it would also be possible to write a small tool which takes the JSON output and converts it to a requirements.txt)

pfmoore commented 2 years ago

> (I suppose it would also be possible to write a small tool which takes the JSON output and converts it to a requirements.txt)

The idea of using JSON format is precisely so that such tools are easy to write without needing changes to pip, so yes, that would be the recommended approach.
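
Such a converter could look roughly like the sketch below. The field names are assumptions about the report layout: a top-level `install` list whose items carry `metadata` (with `name` and `version`) and `download_info.archive_info` (with a `hashes` mapping, or a legacy `hash` string of the form `<algorithm>=<hexdigest>`). Check them against the report your pip version actually writes; the function name is illustrative.

```python
def report_to_requirements(report: dict) -> list[str]:
    """Turn a pip installation report into hashed requirement lines."""
    lines = []
    for item in report.get("install", []):
        meta = item["metadata"]
        archive = item.get("download_info", {}).get("archive_info", {})
        hashes = dict(archive.get("hashes", {}))
        # Fall back to the single legacy "hash" field: "<algorithm>=<hexdigest>".
        if not hashes and "=" in archive.get("hash", ""):
            algo, _, digest = archive["hash"].partition("=")
            hashes[algo] = digest
        options = "".join(f" --hash={a}:{d}" for a, d in sorted(hashes.items()))
        lines.append(f"{meta['name']}=={meta['version']}{options}")
    return lines
```

The input would be the parsed JSON written by the dry-run report command mentioned above. Entries without archive information (for example VCS or local directory installs) come out pinned but without a hash.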

AkechiShiro commented 10 months ago

Any way forward on this issue, any way someone could help? What is left to be done/discussed?