peterbe / hashin

Helping you write hashed entries for packages in your requirements.txt
https://www.peterbe.com/plog/hashin
MIT License

Something like --update-all but without installing any packages #100

Closed terrisgit closed 5 years ago

terrisgit commented 5 years ago

I'm trying to create a requirements-lock.txt file containing exact versions of all dependencies with hashes so that I can later use "pip install -r requirements-lock.txt" to install the exact version of everything while verifying that the packages haven't been tampered with.

I also don't want hashin and its dependencies to end up in requirements-lock.txt

Here is how to get what I want via shell scripting. First you start with a pristine Python environment. Then...

pip install .  # Install my package and its dependencies (via setup.py)
pip freeze > requirements-lock.txt
pip install hashin  # Do this after pip freeze !
# Remove my package from requirements-lock.txt (the one installed via setup.py)
grep -v my_package_name requirements-lock.txt > requirements-lock2.txt
# Clear requirements-lock.txt
: > requirements-lock.txt
# Send every line in requirements-lock2.txt to hashin
tr '\n' ' ' < requirements-lock2.txt | xargs python3 -m hashin -r requirements-lock.txt
rm requirements-lock2.txt

peterbe commented 5 years ago

First of all, I'm sceptical towards pip freeze in general. It might make sense on that one rare day when you start a project. From that day onwards one should be more explicit and deliberate about which packages get included.

Individual developers tend to include a bunch of stuff in their environments that is NOT needed in CI and on production servers, e.g. pytest-watch or ipython or even hashin itself. These are wonderfully useful for local development but should not go into the environment that you install on the server.

Another reason I'm sceptical is that one should ideally be more careful about how one organizes the packages. For example, I usually put all the packages I know I need (e.g. Django) into a requirements.txt and all the supporting packages into a constraints.txt file. So something like this:

$ hashin requests -r requirements.txt
$ hashin -r constraints.txt certifi urllib3 idna chardet pyOpenSSL cryptography

Now you can clearly see which packages are really needed for your project.
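
For what it's worth, here is a minimal sketch of how the two files might then be installed together. `install_locked` is a hypothetical helper name, not part of pip or hashin; `--require-hashes`, `-r` and `-c` are regular pip options. How pip treats hashes that live in a constraints file has varied between pip versions, so treat that part as an assumption to verify against the pip you run.

```shell
# Sketch only: install_locked is a hypothetical helper, not a pip or hashin command.
# Default file names follow the layout described above.
install_locked() {
    requirements="${1:-requirements.txt}"
    constraints="${2:-constraints.txt}"
    # --require-hashes makes pip refuse any package that has no matching --hash entry.
    pip install --require-hashes -r "$requirements" -c "$constraints"
}
```

Calling `install_locked` with no arguments uses requirements.txt and constraints.txt from the current directory.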

The other use case is that one should probably keep a separate file for packages that are only needed for linting and/or testing, e.g. tests-requirements.txt (e.g. pytest) and lint-requirements.txt (e.g. flake8). Sure, these will probably need to be installed in CI, but they are packages you don't want to bother your production server with (or whatever the thing is that runs the core of the project).

peterbe commented 5 years ago

My point is that if your project is important enough that you need hashes, you should probably be more deliberate and careful with your requirements files. If the project is not important enough to be "organized" with your requirements files, perhaps it's not important enough to use hashes. After all, it's a tool for people who have big projects with high security standards and who are worried about accidentally running with different versions compared to what they tested/developed with.

terrisgit commented 5 years ago

My project has a requirements.txt that allows patches to be installed. Freeze is used after upgrading packages and running tests, an infrequent process (once a quarter). We use only requirements-freeze.txt (containing hashes) to do installs. The project is an internal application rather than a reusable package.
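
Because installs rely only on the hashed file, a cheap sanity check before committing it might look like the sketch below. `check_hashes` is a hypothetical helper, and the sketch assumes the layout hashin writes: one requirement per logical line, with `--hash=` entries on backslash-continued lines. pip's own `--require-hashes` mode still does the real enforcement at install time.

```shell
# Sketch only: check_hashes is a hypothetical helper, not part of hashin or pip.
# Prints every requirement that has no --hash= entry; exits 1 if any are found.
check_hashes() {
    awk '
        {
            buf = buf $0
            # A trailing backslash means the requirement continues on the next line.
            if (buf ~ /\\$/) { sub(/\\$/, " ", buf); next }
            # Skip blank lines, comments and option lines such as "-r other.txt".
            if (buf !~ /^[ \t]*(#|-|$)/ && buf !~ /--hash=/) { print buf; bad = 1 }
            buf = ""
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, `check_hashes requirements-freeze.txt` prints nothing and succeeds when every pinned package carries at least one hash.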

Finally, getting what you expect when other packages depend on your dependencies is already a problem. This is our solution. Would love to see Python and JavaScript get this right out of the box (I’m watching pyenv) especially for newbies who have no idea what they are doing and are therefore vulnerable to motivated bad actors.

peterbe commented 5 years ago

I'm sorry I derailed the issue. Just thought I'd take a moment to share some thoughts on a topic I've thought a lot about.

Anyway, I think your bash script is fine. The only thing that is weird is that I think it would be cooler if you could do something like this:

$ pip freeze | grep -v my_package_name | grep -v hashin | xargs python -m hashin -r requirements-lock.txt

I.e. that independent of what or how you do it, hashin should be able to read in a list from stdin.
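
Until hashin reads from stdin itself, the filtering half of that pipeline can be sketched as a small shell function. `filter_freeze` is a hypothetical name. Unlike a bare `grep -v hashin`, it anchors the match to the start of the line, so a package whose name merely contains "hashin" is kept. It assumes the names are passed exactly as pip freeze prints them.

```shell
# Sketch only: filter_freeze is a hypothetical helper.
# Reads pip-freeze-style lines on stdin and drops the packages named as arguments.
filter_freeze() {
    # With no names to drop, pass the input through unchanged.
    [ "$#" -gt 0 ] || { cat; return; }
    # Join the names into an alternation such as "my_package_name|hashin".
    names=$(printf '%s|' "$@" | sed 's/|$//')
    # Match a name at the start of the line, followed by a character that cannot be
    # part of a package name (e.g. "=" or " "), case-insensitively.
    # Note: a "." in a name is treated as a regex wildcard here.
    grep -E -i -v "^(${names})([^A-Za-z0-9._-]|\$)"
}

# Possible usage (needs network access, so it is left commented out):
# pip freeze | filter_freeze my_package_name hashin | xargs python -m hashin -r requirements-lock.txt
```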

Is that something you'd be interested in working on?

terrisgit commented 5 years ago

hashin has dependencies that I don't want mixed in.

I actually have 5 requirements files for 'dev', 'test', .. It really eventually becomes a mess when you're not just hacking a throw-away prototype.

I was pretending in my original post that I used setup.py to install requirements (some developers do). I'm suspecting that pyenv will eventually replace that. I was trying to avoid being opinionated.

Your first suggestion is probably best.

  1. Install Pristine Python environment plus hashin
  2. hashin --update-all -r requirements.txt

There's another drawback with freeze -- that 'pristine' is not really pristine, if you upgrade pip and setuptools prior to running hashin, which you should. And by the time you use the 'freeze' file, those packages are out of date. I choose to ignore such thoughts.

Too bad most newbies will never figure this all out. Maybe someone could publish a modern python3 skeleton project someday, or -god forbid- it be part of a python distro.

Religious wars commence!

peterbe commented 5 years ago

For what it's worth, and sorry if you already knew this, but a lot of people have hashin installed with pipsi. That way they can have executable python scripts for their whole machine without "polluting" the virtualenvs.

terrisgit commented 5 years ago

No, I had no idea. Thank you. Again, NodeJS is also a cyberattacker's delight. Dependency management is a mess and a huge waste of money. End of opinion transmission.