pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.5k stars 3.01k forks source link

editable cannot be installed when requiring hashes #4995

Open jedie opened 6 years ago

jedie commented 6 years ago

Description:

I added --hash= information to my requirements.txt, but it seems that hash usage can't be used in combination with editables?!?

Running pip3 install --exists-action s -r requirements.txt creates this message:

The editable requirement from git+https://github.com/... (from -r requirements.txt (line )) cannot be installed when requiring hashes, because there is no single file to hash.

Is this a "known limitation" or a bug?

ghost commented 6 years ago

This issue appear to be invalid, since there's no reasonable improvement to the error message. Out of curiosity, why did you not understand the error message and what do you think it should be?

pradyunsg commented 6 years ago

Hi @jedie!

pip does hash checking on a single file, the wheel archive or source distribution archive, which are basically a complete representation of what you're about to install. It is easy to just compute the hash of a single file and verify that the hash checks out when getting the same file over the network.

The one case where it doesn't use a wheel archive or a source distribution is editable installations. These installations are really just doing setup.py develop. In this case, there's no single file like a source archive, that is representative of the entire package -- there's nothing to compute the hash of for doing useful hash-checking.

This is really a limitation due to the nature of editable installs. They just are mutually exclusive to hash-checking by virtue of how they work.


I hope that clarifies your doubt. I'm curious as to what you think the error message was missing to be able to communicate the same.

jedie commented 6 years ago

I'm aware of the fact that hash compare doesn't work on editables ;) (Of course, pip could create a hash over all checked out files. But that is not the topic here.)

I have a requirements.txt with some editables and with normal PyPi packages. The hashes are only attached to the normal package names.

I would assume that pip check the hashes for the normal PyPi downloads. But pip simply aborts with the message from above and doesn't do anything more.

So: I can't use the hash feature for normal PyPi packages if i have one or more editables in requirements.txt ?!?

pradyunsg commented 6 years ago

Ah, I think what you'll want to do is break up the requirements.txt into 2 and invoke pip 2 times - once without the editable requirements in hash-checking mode and then once with editable requirements.

I'm actually not sure which ones you'll want to install first. I imagine the editables with --no-deps followed by the hash checked dependencies via requirements.txt would be a good process.

On Fri, 26 Jan 2018, 21:11 Jens Diemer, notifications@github.com wrote:

I'm aware of the fact that hash compare doesn't work on editables ;) (Of course, pip could create a hash over all checked out files. But that is not the topic here.)

I have a requirements.txt with some editables and with normal PyPi packages. The hashes are only attached to the normal package names.

I would assume that pip check the hashes for the normal PyPi downloads. But pip simply aborts with the message from above and doesn't do anything more.

So: I can't use the hash feature for normal PyPi packages if i have one or more editables in requirements.txt ?!?

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/pypa/pip/issues/4995#issuecomment-360819310, or mute the thread https://github.com/notifications/unsubscribe-auth/ADH7SRAQOKUuGjajlz4dVACil63LnHwJks5tOfI9gaJpZM4RshLV .

earthboundkid commented 5 years ago

This is completely baffling and bizarre. Why should one requirement having a hash affect any other line in the requirements file? What if you only want hashes on some requirements?

rooterkyberian commented 5 years ago

Could we add something like --dont-require-hashes? or disable automatic enabling hash checking for every direct dependency and force use of --dont-require-hashes if someones desires it?

brandsimon commented 5 years ago

The example here is git, isnt it safe to use the hash of the last commit?

pradyunsg commented 5 years ago

If someone has a suggestion on how to fix this, you're welcome to make the suggestion and file a PR.

It'll go through the regular review process as every other PR in pip. If you want to help out and have an idea for how to fix this, please feel free to ping me. 🙃

pradyunsg commented 5 years ago

The example here is git, isnt it safe to use the hash of the last commit?

Editable installs can be from directories that are not under git or even any kind of version control.

I don't think special casing VCS setups is the way to go here.

Qu4tro commented 4 years ago

@pradyunsg if possible I would like your opinion on this subject. (Disclaimer: I'm moving things to Poetry and tbh this is my vision as an end-user for Poetry)

From my reading there are two main schools of thoughts here:

I might be missing others, possibly important, concerns.

I'm partial to having a separate requirements.txt, due to the splitting between "safe" and "unsafe". I think an application should strive to have "safe" packages and only install "unsafe" explicitly.

sbidoul commented 4 years ago

I personally think editable installs must continue to be rejected when --require-hashes is used, because the first thing pip does with these is update the src directory which may have local changes.

I'm in favor of https://github.com/pypa/pip/issues/6469, however.

@Qu4tro assuming #6469 gets traction, a third approach will be to try to understand why pip wrappers such as poetry or pip-tools use editable installs. Since I'm working towards implementing #609, I'm curious to understand if pip-tools and poetry would benefit from it and/or PEP 610. My general goal with this work is to reduce the need of using --editable for working with VCS requirements.

kapilt commented 4 years ago

there are many reasons for editable, most commonly is when referencing the package under development, most commonly for local development (avoiding extra copy of src into site-packages) to facilitate editing source, and ci/testing setup (ie I run into this issue with tox), but also arising in more complex cases of a mono repo with several packages and interdependencies.

of the two variants, @Qu4tro suggested, I'd like to see the allow pip to install editable packages alongside hashed packages. unfortunately the other variant isn't really compatible with other tools in the ecosystem (tox), which want to generate a single invocation to pip for install with all configured requirement files/dep strings.

@sbidoul afaics, your conflating editable with vcs when its just a directory vs a vcs url, ie vcs urls are editable, but editable does not imply vcs url it may just be a path/dir

uranusjr commented 4 years ago

Personally I would prefer to keep the current default; editable installs are by definition cannot be pinned, so it does not make sense to ask for it when hashes are required. Making it a special case feels to me like creating a loophole. The two-step solution not working for tools sounds like a toolchain problem to me; tools should provide a way to run multiple commands in the installation phase, instead of pip (or any other installer command) needing to cram steps into one command.

I do feel though it would be acceptable to have some way for the user to explicitly say “hey I know I said I want hashes, but this one is special”. The flag can be added to the command (i.e. the --no-require-hashes proposal) or applied on a per-package bases (i.e. add something like --no-hash or --hash=none to a line in requirements.txt). It would be a conscious decision if the user decides to supply that, and if that’s the case, fine (we’re grown-ups here).

earthboundkid commented 4 years ago

This is crazy. There is value to having some hashes, even if you don’t have hashes for everything. If someone passes an explicit “require hashes” flag, then fine, require them everywhere. It might even be good to always require hashes and use a flag to opt out on a specific basis. But none of that is the behavior of pip today which is being defended. The behavior of pip is that if you add one hash to a requirements.txt file, it requires hashes for every other line. That behavior makes no sense and is not a defensible default.

To an outsider, it one hundred precent feels like the default of the pip project is “reject all proposals because we’re underwater in tech debt” and the rationalizations come later and have no connection to the value of the underlying proposals.

earthboundkid commented 4 years ago

https://github.com/pypa/pip/issues/4995#issuecomment-575691807

Look at the pros and cons list here. I think it’s a good summary. The cons of changing the behavior are “requires work from pip team”. That’s it. If you’re underwater in tech debt and can’t solve issues, fine, that’s open source. Life is tough and not all bugs are high priority. But don’t make up rationalizations that make no sense to defend behavior that was never designed and just emerged organically and unintentionally.

pfmoore commented 4 years ago

This is crazy.

the default of the pip project is “reject all proposals because we’re underwater in tech debt”

don’t make up rationalizations that make no sense

@carlmjohnson You might want to tone down your criticisms a little.

Look at the pros and cons list here.

OK, so looking at those pros and cons:

  1. Has anyone submitted a PR yet? There's very little chance of the pip developers making one (yes, we have very limited resources, and pretty much all of it is on a volunteer basis). Compared to a lot of other issues, and in spite of there being a lot of debate here, this is not a high priority issue affecting a lot of our users.
  2. Even with a PR, the change will still need pip developer input in the form of reviews, possibly coding assistance, and ongoing maintenance. So it will still be a slow process. Nobody's "blocking" anything, that's just open source with limited volunteers for you.
  3. Conversely, splitting requirements into 2 calls is something that an end user can do right now. It may not fit their workflow as well as a single call, but that's something they can decide for themselves. They aren't waiting on anyone else for a solution. To me, that's a big benefit.
  4. If a wrapper project like poetry needs to change to make 2 pip calls instead of one, then they have that option. They can also declare hash-checking and editables as unsupported until pip adds support. Or they can offer a PR to help pip implement support. Just like any other user of pip. And poetry users should probably be having the discussion with poetry in the first instance, not with pip directly.

But don’t make up rationalizations that make no sense to defend behavior that was never designed and just emerged organically and unintentionally.

I haven't been following this discussion very closely, so I don't intend to try to address this in detail, but do you really think that the pip developers are spending time undermining proposed solutions for no better reason than to block them? We can block solutions at far less effort by simply ignoring the thread altogether...

earthboundkid commented 4 years ago

I'm sorry if my tone was off, but I really feel like we're being gaslit. Look at the comments and emoji reactions from people not on the PyPA team. They all agree: this would be a nice feature and it's annoying not have by default. If it can't happen for priority reasons and you need someone else to find the time do a patch, again that's fine, that's life. But why make up these rationalizations as though we're wrong for wanting an obvious feature? Why respond to my comment at all? Delete it as off topic if you thought I was too harsh or just leave it with no reply. As I read it, it just comes out to a long way of saying "submit the patch, otherwise forget it."

pfmoore commented 4 years ago

I've just reviewed all of the comments from pip developers (and experienced contributors to pip) in this thread. None of them are "rationalisations" or any sort of objection. They explain the current behaviour and discuss options for addressing the requirement, nothing more.

I've added the label "deferred till PR", which says that further discussion should wait till there's a PR. I personally think that the discussion was perfectly fine, and would have been happy to continue here, but if you insist on characterising the contributions of the pip developers and experienced contributors as obstructive, then I suggest it would be better to wait for a PR. Hopefully tempers will have cooled by then (and it's not like anything will happen until there's a PR anyway).

Why respond to my comment at all?

Misguidedly, I thought you might appreciate an explanation. My bad.

pradyunsg commented 4 years ago

Just noting that I did say PRs are welcome quite a while back, and no one has responded/acted on the same: https://github.com/pypa/pip/issues/4995#issuecomment-517562606

And, @carlmjohnson if you haven't already, please consider reading/watching https://snarky.ca/setting-expectations-for-open-source-participation/.

pradyunsg commented 4 years ago

I'm going to go ahead and hide all the comments w.r.t. the "this is crazy" discussion, since IMO they're not very useful for someone who's trying to actually solve the problem, and have negative effects on any contributor that's considering contributing here. If you disagree with this move, please file a new issue for further discussion.

earthboundkid commented 4 years ago

I see the purpose of this issue as a place to discuss whether changes are warranted, and if so, which ones to make. So, let me respond in slightly more depth to https://github.com/pypa/pip/issues/4995#issuecomment-593773239:

Personally I would prefer to keep the current default; editable installs are by definition cannot be pinned, so it does not make sense to ask for it when hashes are required.

Hashes being required for all dependencies is not an explicit choice that users are making. Users are making the choice to include some hashes. Having some hashes has value even if not everything has a hash. For example, I may have a private repo that I install as editable but still want to have pinned hashes for external dependencies. There are many other such scenarios.

Making it a special case feels to me like creating a loophole.

It's a loophole to a problem created by an unexpected pip behavior. The root problem is one hash making all hashes required. Either change pip to always require hashes and opt out with a flag (seems unlikely because it would be breaking change for the ecosystem), or do the simpler change of adding a flag to require hashes. Editable repos don't come into it one way or the other. If I had passed a flag like "--require-hashes" I would not have been surprised, done a search, and found this issue. I found the issue because the behavior is apparently an unintended side-effect of requiring one hash.

The two-step solution not working for tools sounds like a toolchain problem to me; tools should provide a way to run multiple commands in the installation phase, instead of pip (or any other installer command) needing to cram steps into one command.

This strikes me as side-stepping of pip's core functionality. Pip is a tool to install Python dependencies. It need not do everything related to dependencies, but needing to split into multiple files or use a separate tool so as not to trigger an unexpected, undocumented, and undesirable behavior is just passing off responsibility for pip's core value to other tools.

petergaultney commented 4 years ago

I'm not trying to weigh in on anything else that's being discussed, but I agree with @carlmjohnson that having --hash on a single line in a requirements.txt fundamentally 'coupled' to every other line in that same file both violates the principle of least surprise and in general feels like incorrect software design. In fact, it's basically impossible to imagine an implementation of this that isn't more complex than the straightforward implementation where each line of a requirements.txt handles its flags individually.

FWIW, I have written a workaround wrapper for this that works well for our internal use case but likely could be improved if other people wanted to start using it and trying it out. But it takes 'advantage' of the odd situation by basically attempting a --no-deps install of every requirement individually, assuming that dependency resolution has already been accomplished separately, so it probably isn't a truly general purpose solution.

pfmoore commented 4 years ago

having --hash on a single line in a requirements.txt fundamentally 'coupled' to every other line in that same file both violates the principle of least surprise and in general feels like incorrect software design.

I believe that it was a deliberate decision, based on the wishes of the people who asked for hash checking mode in the first place - the idea being to avoid a security loophole where you are hash-checking everything, but accidentally add a dependency without a hash, leaving your application insecure when you think it's secure. I do agree that explicitly specifying --require-hashes seems like a reasonable thing in that case, though.

My memory is very vague on the details here, and I could easily be completely wrong. I encourage anyone who wants to argue for dropping the implicit enabling of --require-hashes whenever a hash is specified, to do some research to confirm (or otherwise) the original intent here. (And please do add a link here, to help others - including me - confirm the details).

mikenerone commented 4 years ago

Just my 2 cents as to the behavior that I would personally naturally expect:

mitar commented 4 years ago

I think this issue could be resolved in backwards compatible way by:

mikenerone commented 4 years ago

I do not see why it is so hard to recursively compute hash of a directory in a deterministic way...it is on you to make sure local directory has not been changed.

Aren't we talking about an editable install - i.e. the whole purpose is that you're editing it? It is, by definition, changing constantly, as @pradyunsg alluded to in their first comment.

earthboundkid commented 4 years ago

There are many reasons to use an editable install. One is to edit them. Another is to use a package which isn't on PyPI, such as a fork or private repo. For the former case, a hash is impractical, but for the latter it can be useful.

I often use editable packages just to make it easier to read the source for an important dependency in my text editor by putting it in a top level folder instead of buried in a virtual env somewhere (although I admit this is a niche usecase, so I wouldn't put much weight on it per se).

uranusjr commented 4 years ago

The raison d'etre of editable installs is to make the installed code editable. If all you want is to install something from private locations, you can just install from path without the editable flag. This issue is specifically about editable installs, and your latter reason thus does not apply.

apljungquist commented 3 years ago

I have not seen this idea mentioned anywhere so I thought I'd put it on the radar; Allow hashes to be explicitly ignored using a wildcard e.g. like --hash=*.

For context this would facilitate developing multiple packages out of one repo with internal and external dependencies. If for instance the dependency graph looks like below then it would be convenient to be able to do something like pip install -c constraints.txt intarnal1. At present this works only without hashes.

external<--+
internal0<-+
           +-internal1
sbidoul commented 1 year ago

I have been thinking about this again, as I came across the pip install -e . -c constraints.txt use case, where constraints.txt has requirements with hashes. This cannot be resolved by splitting into two pip install calls while keeping the desired semantics.

So I'm wondering we could accept requirements provided as local directories in --require-hashes mode. I am under the impression (from skimming this thread again) that, combined with #11968, this could resolve most use cases.

Of course that means that --require-hashes will not detect when a user adds a local directory to requirements anymore, but I'm not sure that was ever a goal of --require-hashes.

sinoroc commented 6 months ago

the pip install -e . -c constraints.txt use case, where constraints.txt has requirements with hashes

I am confronted with the exact same use case currently. This looks like it could be a good substitute for poetry install (constraints.txt is generated beforehand with poetry export --format=constraints.txt), but alas, it fails. I could not find a way to do this with two pip commands either.

P.S.: Soon after writing this, I realized that 1. it does not bring the discussion any further so I could have refrained from writing anything at all; and 2. I had excluded using requirements (instead of constraints), but I could not remember why, but now that I have tried properly with requirements again, it seems to be good enough for my use case poetry export > req.txt pip install --no-deps -r req.txt pip install --no-deps -e ..