Embeddable Python indicates that it uses PYTHONPATH #86418

Open 55c63543-9a1a-4828-9775-0cd4f61679d7 opened 3 years ago

55c63543-9a1a-4828-9775-0cd4f61679d7 commented 3 years ago
BPO 42252
Nosy @pfmoore, @tjguk, @zware, @eryksun, @zooba, @teeks99

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['3.10', '3.8', '3.9', 'OS-windows']
title = 'Embeddable Python indicates that it uses PYTHONPATH'
updated_at =
user = 'https://github.com/teeks99'
```

bugs.python.org fields:

```python
activity =
actor = 'steve.dower'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Windows']
creation =
creator = 'teeks99'
dependencies = []
files = []
hgrepos = []
issue_num = 42252
keywords = []
message_count = 6.0
messages = ['380274', '380281', '380285', '380306', '380307', '380345']
nosy_count = 6.0
nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'teeks99']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue42252'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']
```

55c63543-9a1a-4828-9775-0cd4f61679d7 commented 3 years ago

According to the documentation at https://docs.python.org/3/using/windows.html#windows-embeddable:

> When extracted, the embedded distribution is (almost) fully isolated from the user’s system, including environment variables, system registry settings, and installed packages

So the embedded distribution should ignore the environment variables.

This is echoed in a prior issue, whose reporter thought that PYTHONPATH not being respected was a bug: https://bugs.python.org/issue28245

Regardless of the decision to respect environment variables, the message that is displayed when running the distribution's python --help needs to indicate how it will act.

Currently, for the embedded distribution, which doesn't respect the env variables, the output of python --help includes a section that reads:

Other environment variables:
PYTHONSTARTUP: file executed on interactive startup (no default)
PYTHONPATH   : ';'-separated list of directories prefixed to the
               default module search path.  The result is sys.path.
PYTHONHOME   : alternate <prefix> directory (or <prefix>;<exec_prefix>).
               The default module search path uses <prefix>\python{major}{minor}.
PYTHONPLATLIBDIR : override sys.platlibdir.
PYTHONCASEOK : ignore case in 'import' statements (Windows).
PYTHONUTF8: if set to 1, enable the UTF-8 mode.
PYTHONIOENCODING: Encoding[:errors] used for stdin/stdout/stderr.
PYTHONFAULTHANDLER: dump the Python traceback on fatal errors.
PYTHONHASHSEED: if this variable is set to 'random', a random value is used
   to seed the hashes of str and bytes objects.  It can also be set to an
   integer in the range [0,4294967295] to get hash values with a
   predictable seed.
PYTHONMALLOC: set the Python memory allocators and/or install debug hooks
   on Python memory allocators. Use PYTHONMALLOC=debug to install debug
   hooks.
PYTHONCOERCECLOCALE: if this variable is set to 0, it disables the locale
   coercion behavior. Use PYTHONCOERCECLOCALE=warn to request display of
   locale coercion and locale compatibility warnings on stderr.
PYTHONBREAKPOINT: if this variable is set to 0, it disables the default
   debugger. It can be set to the callable of your debugger of choice.
PYTHONDEVMODE: enable the development mode.
PYTHONPYCACHEPREFIX: root directory for bytecode cache (pyc) files.

This may lead users (it did lead this one) to assume that they are doing something wrong when, for example, the output of sys.path doesn't include the items in os.environ["PYTHONPATH"].

I realize this may be difficult to achieve, but the help output should match what the interpreter will actually do when run.
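One way to check for this mismatch from inside the interpreter is to compare PYTHONPATH against sys.path directly. The following is a minimal sketch: the helper name `missing_pythonpath_entries` is made up for illustration, and whether `sys.flags` reflects the isolation applied by the embeddable distribution is an assumption, so the path comparison is the part to rely on.

```python
import os
import sys


def missing_pythonpath_entries(environ=None, search_path=None):
    """Return PYTHONPATH entries that are absent from the module search path.

    Hypothetical helper for illustration; defaults to os.environ and sys.path.
    """
    environ = os.environ if environ is None else environ
    search_path = sys.path if search_path is None else search_path

    def norm(p):
        # Normalize so that case and relative/absolute spelling do not matter.
        return os.path.normcase(os.path.abspath(p))

    present = {norm(p) for p in search_path}
    raw = environ.get("PYTHONPATH", "")
    entries = [e for e in raw.split(os.pathsep) if e]
    return [e for e in entries if norm(e) not in present]


if __name__ == "__main__":
    # Assumption: these flags may or may not be set by ._pth-based isolation.
    print("ignore_environment flag:", sys.flags.ignore_environment)
    print("isolated flag:", sys.flags.isolated)
    print("PYTHONPATH entries not on sys.path:", missing_pythonpath_entries())
```

If the last line prints a non-empty list, the interpreter did not apply PYTHONPATH, whatever the help text says.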

eryksun commented 3 years ago

The embeddable distribution isn't intended for end users to run Python scripts from the command line, so I don't think the CLI help needs to be special cased. The documentation you quoted should be clarified as something like "isolated from user and system Python settings, including environment variables such as PYTHONPATH, registry settings, and installed packages". It would be helpful as well if the Windows download links on python.org explained or linked to the intended use case for the embeddable distribution. Sometimes people mistakenly download it when they really need a normal Python installation.

55c63543-9a1a-4828-9775-0cd4f61679d7 commented 3 years ago

I'm not sure I agree with that. One possible use-case is to package it along with another program that uses the interpreter. In this case the other program could use its native language features (e.g. .NET's Process.Start(), the Win32 API's CreateProcess(), even Python's subprocess, but why?) to run python.exe myscript.py.

In this case, the user may assume that adding something to the PYTHONPATH env variable, as most of the launching methods allow, would take hold. When this fails, the first attempt at debugging would be to try it interactively with the same command, then promptly look at python --help when that fails.

Maybe a better question is why should the embeddable distribution's python.exe ignore env variables? Wouldn't it make more sense to depend on the user to add a -E if that is what they desire?
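To see what -E changes when a host program launches the interpreter with PYTHONPATH in the child environment, something like the following could be run against a regular (non-embeddable) installation. This is only a sketch: `pythonpath_honored` is a hypothetical helper name, and it launches `sys.executable`, which is assumed here to be a normal install without a ._pth file.

```python
import os
import subprocess
import sys
import tempfile

# Child program: report whether the directory given as argv[1] is on sys.path.
CHILD = (
    "import os, sys\n"
    "target = os.path.normcase(os.path.abspath(sys.argv[1]))\n"
    "print(any(os.path.normcase(os.path.abspath(p)) == target for p in sys.path))\n"
)


def pythonpath_honored(extra_dir, interpreter_args=()):
    """Launch a child interpreter with PYTHONPATH set; report if it took effect."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    cmd = [sys.executable, *interpreter_args, "-c", CHILD, extra_dir]
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("default:", pythonpath_honored(d))
        print("with -E:", pythonpath_honored(d, ["-E"]))
```

Pointing the same check at the embeddable distribution's python.exe instead of `sys.executable` would show whether it behaves like the -E case without the flag being passed.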

zooba commented 3 years ago

Updating the documentation link on the download page is being discussed as we speak.

> One possible use-case is to package it along with another program to use the interpreter.

This is the primary use case. If you're doing something else with it, you're probably misusing it :)

> In this case, the user may assume that adding something to the PYTHONPATH env variable, as most of the launching methods allow, would take hold.

Agreed. The documentation explains this, though likely doesn't make clear enough that it's the presence of the ._pth file that triggers the behaviour.

> ... then promptly look at python --help when that fails.

I'm pretty sure the help text is generated before we've tried to detect any local configuration, so it's far from trivial to make it dynamic based on context.

> Maybe a better question is why should the embeddable distribution's python.exe ignore env variables? Wouldn't it make more sense to depend on the user to add a -E if that is what they desire?

It's to make it non-exploitable by default. The theory being that it will likely be installed into Program Files by an admin, which means file-based configuration is locked down from regular users and an attacker can't rely on a fully functioning Python runtime being present. Most people wildly underestimate how exploitable CPython is via environment variables.

In an embedded scenario, you also have other ways to update paths, either statically (in the ._pth file) or in Python code (via sys.path modification). And you can of course delete the ._pth file if you don't feel you need the isolation, but there are legitimate reasons we don't recommend that one.
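As a rough illustration of the static option, a ._pth file is plain text with one search-path entry per line, relative entries being resolved against the file's own location, and an optional `import site` line. The sketch below only builds that text. The function names and the `python39.zip` and `app_packages` entries are assumptions for illustration, not the contents of any particular release.

```python
from pathlib import Path


def render_pth(entries, import_site=False):
    """Build the text of a ._pth file: one search-path entry per line."""
    lines = list(entries)
    if import_site:
        # An "import site" line asks the interpreter to run the site module.
        lines.append("import site")
    return "\n".join(lines) + "\n"


def write_pth(target, entries, import_site=False):
    """Write the rendered text to target (hypothetical, e.g. python39._pth)."""
    Path(target).write_text(render_pth(entries, import_site), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical layout: the stdlib zip, the install directory, and one
    # extra directory shipped next to the interpreter.
    print(render_pth(["python39.zip", ".", "app_packages"]), end="")
```

The dynamic alternative mentioned above is simply to adjust `sys.path` in Python code before the imports that need it.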

Not enough of this is documented that well, unfortunately. It sounds like we should:

55c63543-9a1a-4828-9775-0cd4f61679d7 commented 3 years ago

A couple things...

> > One possible use-case is to package it along with another program to use the interpreter.
>
> This is the primary use case. If you're doing something else with it, you're probably misusing it :)

Interesting, I'd been expecting this was commonly used as the way to give access to python3X.dll. We actually do (or are trying to do) both from our installation.

I've been mostly focusing on PYTHONPATH because that's where I encountered the issue. Which if any of the other env variables are respected?

Would there be an argument to add additional command line options that could be used as a more secure alternative to the env variables? A command line argument -e that is the opposite of -E and enables the usage of PYTHON* env? Maybe this doesn't make sense since you said it is the ._pth that causes this...just thinking aloud.

The two options you mention (modify ._pth and append to sys.path) aren't great because we 1) would prefer to use the un-modified python distro 2) don't own the scripts that we are embedding, they are from a 3rd party so modifications are complicated.

zooba commented 3 years ago

> I'd been expecting this was commonly used as the way to give access to python3X.dll.

Yeah, both. The idea is to delete the files you don't want - so if you don't need python.exe, just don't include it. Same goes for some of the native modules (e.g. deleting _socket.pyd and _ssl.pyd is an easy way to make sure you aren't offering networking).
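If a distribution has been trimmed this way, the host application might want to confirm at startup which modules are still importable. A small sketch, assuming a hypothetical helper named `available_modules`; it only asks the import system whether a module can be found and does not import it:

```python
import importlib.util


def available_modules(names):
    """Map each top-level module name to whether the import system can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # On a trimmed distribution, _socket and _ssl would be expected to be False.
    for name, found in available_modules(["_socket", "_ssl"]).items():
        print(f"{name}: {'present' if found else 'missing'}")
```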

> I've been mostly focusing on PYTHONPATH because that's where I encountered the issue. Which if any of the other env variables are respected?

It's the equivalent of passing -I. That's the flag that gets set when a ._pth is detected: PC/getpathp.c#L563
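For reference, what -I switches on can be inspected through `sys.flags` on a regular interpreter. The sketch below launches a child with the given options and reports three of the flags. The helper name `flags_with` is made up, and whether the embeddable distribution reports the same `sys.flags` values when the isolation comes from a ._pth file is an assumption not confirmed in this thread.

```python
import json
import subprocess
import sys

# Child program: dump the flags of interest as JSON.
CHILD = (
    "import json, sys\n"
    "f = sys.flags\n"
    "print(json.dumps({'isolated': f.isolated,"
    " 'ignore_environment': f.ignore_environment,"
    " 'no_user_site': f.no_user_site}))\n"
)


def flags_with(*interpreter_args):
    """Run a child interpreter with the given options and return its flags."""
    cmd = [sys.executable, *interpreter_args, "-c", CHILD]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


if __name__ == "__main__":
    print("-I:", flags_with("-I"))
    print("-E:", flags_with("-E"))
```

With -I, all three flags come back set, which is why PYTHONPATH and the other PYTHON* variables are ignored together rather than individually.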

> Would there be an argument to add additional command line options that could be used as a more secure alternative to the env variables? ... Maybe this doesn't make sense since you said it is the ._pth that causes this...just thinking aloud.

Yeah, the ._pth file is the more secure alternative. To change the import search path, an attacker has to be able to modify a (likely admin-only) file on disk, rather than just launching an executable with a specific command line. (For a bit more context, many exploits try to schedule tasks, which allows arbitrary executable path and arguments. So anything resembling a security feature can't be allowed to be overridden by environment or arguments.)

> The two options you mention (modify ._pth and append to sys.path) aren't great because we 1) would prefer to use the un-modified python distro 2) don't own the scripts that we are embedding, they are from a 3rd party so modifications are complicated.

None of the other options are better :)

Overriding the ._pth file should just be a matter of replacing the file. It's deliberately relative to its location, which means it should almost always be static. If you need your embedded interpreter to pick up paths from the local environment, just delete the file instead of updating it (which will make your copy of Python exploitable, but that's the tradeoff).

I don't know what your particular scripts look like, but I've had to go through and modify a number of third-party packages to make them work with this kind of setup. It's certainly possible to work around the limitation in a number of ways, often transparently to the code that's eventually going to be executed - the runpy module is often helpful. (And yeah, an attacker could do it as well, just not as trivially as it would be without the restriction.)
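A launcher along these lines is one way the runpy approach might look: the third-party script stays unmodified, and a small wrapper owned by the embedding application adjusts `sys.path` before running it. This is a sketch under assumptions; `run_with_extra_paths` and the example paths are hypothetical, and restoring `sys.path` afterwards is a design choice rather than a requirement.

```python
import runpy
import sys


def run_with_extra_paths(script_path, extra_paths, run_name="__main__"):
    """Run an unmodified script after prepending directories to sys.path.

    Returns the script's resulting module globals, as runpy.run_path does.
    """
    saved = list(sys.path)
    try:
        # Prepend, so these directories win over the default search path.
        sys.path[:0] = list(extra_paths)
        return runpy.run_path(script_path, run_name=run_name)
    finally:
        # Restore the original search path once the script has finished.
        sys.path[:] = saved


if __name__ == "__main__":
    # Hypothetical usage: launcher.py third_party_script.py dir1 dir2 ...
    run_with_extra_paths(sys.argv[1], sys.argv[2:])
```

The host program would then start `python.exe launcher.py ...` instead of the script itself, which keeps the ._pth isolation in place while still letting the application decide the extra paths.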