pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.5k stars 3.01k forks source link

Space in filename in PIP_CONSTRAINTS is interpreted as a new file #10114

Open henryiii opened 3 years ago

henryiii commented 3 years ago

Description

The PIP_CONSTRAINTS variable (and likely any other path related variables?) can't handle a path with a space in it. Something like PIP_CONSTRAINTS="C:\Program Files\..." causes Could not open requirements file: [Errno 2] No such file or directory: 'C:\\Program'. These path variables should probably interpret spaces as part of paths, and use a separator for multiple values (like : or ;).

Also, quotes around the path causes this to fail, too, PIP_CONSTRAINTS="'C:\Program Files\...'" breaks.

See https://github.com/pypa/cibuildwheel/issues/740

Expected behavior

Spaces in paths should not break.

pip version

21.1.3

Python version

All

OS

Windows and macOS, at least, probably all.

How to Reproduce

  1. Make a constraints file with a space my constraints.txt
  2. Then run PIP_CONSTRAINTS="my constraints.txt" pip install <package>
  3. An error occurs.

Output

Could not open requirements file: [Errno 2] No such file or directory: 'C:\\Program'`

Code of Conduct

uranusjr commented 3 years ago

I don’t think there is a good solution to this, unfortunately. The space is used as the delimiter from basically pip’s inception, and it’s way too late to change that. We could introduce some quoting logic to fix this in theory, but it’s going to cause incompatibility elsewhere that’s going to confuse other people. I think the only sensible “solution” is to document this well so users know what’s going wrong and puts the file elsewhere.

henryiii commented 3 years ago

I would recommend treating "path" variables differently, and not allow spaces there, or at the very least, allow quoted values. Is anyone really using PIP_CONSTRAINTS with spaces to add multiple files? Given it is not mentioned anywhere (it's inferred by the existence of --constraints), and is broken on paths with spaces, I think treating paths with special delimiters is likely okay; or at least PIP_CONSTRAINTS="'a path.txt' 'other path.txt". Are there any other affected "list of path" variables?

FYI, fish has special handling for PATH variables, so there's precedent.

Sometimes you can't avoid putting a path with a space. If you have a username with a space in it, you might not have permission to put it anywhere else. Spaces are valid in path on all operating systems (all meaning at least the big three).

@pypa/build-team is likely the most heavily affected, as there's no other way to pass constraints through build.

henryiii commented 3 years ago

The "worst" possible solution but backward compatible would be to try the first path, if it doesn't exist, keep adding the next space-separated path on to it and try that, etc, only erroring out if all joined paths also do not exist. 🤦

uranusjr commented 3 years ago

I don’t use constraints much myself, but it’s very common to supply multiple paths to PIP_REQUIREMET, which has the exact same format. At the very least, The bottom line is, the ability to put spaces in the path is not noticeably more in demand than multiple values, so there’s no reason we should break backward compatibility for one to accommodate the other.

I think treating paths with special delimiters is likely okay; or at least PIP_CONSTRAINTS="'a path.txt' 'other path.txt".

This immediately breaks having the quote character in path, so you’ll need a way to support that, which is what I meant by complex quoting. Having quotes in a path is of course significantly less common, so maybe it’s reasonable to break that if we go through a proper deprecation phase. That’s no longer trivial work, but I’ll be happy to review changes if someone is willing to work it though. We can use shlex.split() to parse the values. This should be enabled initially behind something like --use-feature=complex-path-environ, and we should detect whether the user is putting quotes in a path environ and show a warning telling them the behaviour will change. We apply the usual deprecation policy after that.

Are there any other affected "list of path" variables?

From the top of my head, requirement, index-url, extra-index-url, and find-links. There are probably more.

henryiii commented 3 years ago

Well, a GitHub search for PIP_CONSTRAINT shows 56 usages in all of GitHub, only half or less seem to be the environment variable, and they are mostly setting it to the local dir + /constraint.txt or similar, which will not have a space (cibuildwheel has an internal constraints.txt, so when it gets installed to C:\Program Files\..., it has no choice in the matter). I did not see a single example with multiple paths.

I would expect almost no one sets -r with an envvar - and yes, a GitHub search for PIP_REQUIREMENTS shows 53 usages, this time almost all are variable names (case insensitive search, sadly).

I would expect index-url and extra-index-url to almost always be URLS, which don't have spaces, but those also could be considered "not PATHs" normally, and could be except from a PATH rule. You can always use URIs for those, given they have URL in the name (of course, #10115 should be fixed for those).

PIP_FIND_LINKS has 2K uses, so that's likely the one most likely to be affected - actually that's generally a URL too, though, so it could be exempt from the PATH rule.

IMHO, it seems PIP_CONSTRAINT and PIP_REQUIREMENT could be changed to be treated as a colon or semicolon separated list of paths with spaces, which would allow both lists and paths with spaces to work, whereas now, paths with spaces are not supported at all. Even with URIs on Windows. Unless there are other clearly "PATH" lists.

pfmoore commented 3 years ago

I agree with @uranusjr here. There are major backward compatibility questions here, which will need to be taken seriously in any change made in this area. And while github searches may give some indication, there's a lot of usage of pip that's not visible because it's in closed-source environments, and we can't break that either.

I'd be fine if someone wants to do the work on this, but don't expect it to be simple.

Also, please be aware that consistency in how we handle environment variables is important - we don't document environment variable processing for each variable individually, but rely on general rules (things like "options that take a list of values are read as space separated lists from environment variables"). Special-casing particular variables will need extra documentation, and will be less discoverable than you'd think because people are used to referring to the general rules.

Also, what do you propose to do about config files? I'm pretty sure they use the same rules so would you have the rules differ, or would you change how config files are interpreted too? Both have downsides.

layday commented 3 years ago

The obvious solution is to use os.pathsep as a delimiter. Since that is invalid as a filename character, its presence would toggle the new parsing mode on. Whether that’s actually something worth pursuing I do not know.

Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Wednesday, June 30th, 2021 at 10:52, Paul Moore @.***> wrote:

I agree with @.***(https://github.com/uranusjr) here. There are major backward compatibility questions here, which will need to be taken seriously in any change made in this area. And while github searches may give some indication, there's a lot of usage of pip that's not visible because it's in closed-source environments, and we can't break that either.

I'd be fine if someone wants to do the work on this, but don't expect it to be simple.

Also, please be aware that consistency in how we handle environment variables is important - we don't document environment variable processing for each variable individually, but rely on general rules (things like "options that take a list of values are read as space separated lists from environment variables"). Special-casing particular variables will need extra documentation, and will be less discoverable than you'd think because people are used to referring to the general rules.

Also, what do you propose to do about config files? I'm pretty sure they use the same rules so would you have the rules differ, or would you change how config files are interpreted too? Both have downsides.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

pfmoore commented 3 years ago
>>> Path("a" + os.pathsep + "b").write_text("No, it's not")
12

Sure, it's not normally used in lists that are separated by os.pathsep, but it is a valid path character nevertheless. So yes, it's the obvious solution, but it doesn't help with the backward compatibility issues...

layday commented 3 years ago

I stand corrected, I'm too used to macOS where special rules apply to ":" because of its history as a directory separator.

pfmoore commented 3 years ago

Ah, I didn't know MacOS did that. Cross-platform rules are weird 🙂

uranusjr commented 3 years ago

I believe since 10.0 the : path separator is purely cosmetic on macOS; it always stores paths with /, and only transforms it to : when showing it in GUI. So as far as Python code (which is run in POSIX mode) is concerned, we can use os.pathsep without issues.

But there are two major downsides here. First, this means the accepted values on POSIX and Windows will be different (os.pathsep is ; on Windows), which will cause user confusion. Another is that most of pip’s path arguments are actually path-or-URL (including PIP_CONSTRAINT!) and obviously you’re going you run into issues parsing /a/local/path:https://rawgithubcontent.com/pypa/pip/requirements.txt. So I think the quoting + space solution is the way to go if we want to do this.

ofek commented 9 months ago

FYI I encountered this today and after trying various things there is actually a workaround, albeit inconvenient. Transform the path to a URI:

image