python / cpython

Leading or trailing colon in PYTHONPATH adds cwd to sys.path #107353

Open vyasr opened 1 year ago

vyasr commented 1 year ago

Documentation

When running a Python script, the sys.path documentation indicates that the script's directory is added to sys.path. This path may be augmented with PYTHONPATH (also documented). However, PYTHONPATH has surprising behavior when a leading or trailing colon is present: the current working directory is added, even if that is not the directory containing the script.

Consider the following folder structure:

% tree pythonpath_test 
pythonpath_test
├── mod
│   └── __main__.py
└── script.py

Where both script.py and __main__.py are identical:

import sys
print(sys.path)

The outputs are different when run with an empty PYTHONPATH as compared to a PYTHONPATH containing a colon:

% PYTHONPATH='' python3 pythonpath_test/script.py
['/home/nfs/vyasr/pythonpath_test', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages']
%
% PYTHONPATH=':' python3 pythonpath_test/script.py
['/home/nfs/vyasr/pythonpath_test', '/home/nfs/vyasr', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages']

In the second case I see an additional directory /home/nfs/vyasr, the directory from which I execute the script.

The same behavior is also present for the module:

% PYTHONPATH='' python3 pythonpath_test/mod 
['pythonpath_test/mod', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages']
%
% PYTHONPATH=':' python3 pythonpath_test/mod 
['pythonpath_test/mod', '/home/nfs/vyasr', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages']

This behavior is not documented anywhere that I can find.

I have opened this as a documentation issue, but if it is unintended behavior perhaps it is a bug that should be fixed as well.

eryksun commented 1 year ago

Not that it's sufficient, but probably it's undocumented because it's a well-known behavior for the POSIX PATH environment variable, and it's documented that the "format is the same as the shell’s PATH". Here's the POSIX spec for PATH:

This variable shall represent the sequence of path prefixes that certain functions and utilities apply in searching for an executable file known only by a filename. The prefixes shall be separated by a <colon> ( ':' ). When a non-zero-length prefix is applied to this filename, a <slash> shall be inserted between the prefix and the filename if the prefix did not end in <slash>. A zero-length prefix is a legacy feature that indicates the current working directory. It appears as two adjacent <colon> characters ( "::" ), as an initial <colon> preceding the rest of the list, or as a trailing <colon> following the rest of the list. A strictly conforming application shall use an actual pathname (such as .) to represent the current working directory in PATH. The list shall be searched from beginning to end, applying the filename to each prefix, until an executable file with the specified name and appropriate execution permissions is found. If the pathname being sought contains a <slash>, the search through the path prefixes shall not be performed. If the pathname begins with a <slash>, the specified path is resolved (see Pathname Resolution). If PATH is unset or is set to null, the path search is implementation-defined.

Since <colon> is a separator in this context, directory names that might be used in PATH should not include a <colon> character.
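To make the zero-length-prefix rule concrete, here is a minimal sketch that reports where empty prefixes occur in a colon-separated value. The helper name `empty_prefix_positions` is hypothetical, and treating an empty value as "no entries" is an assumption chosen to match the PYTHONPATH='' output shown earlier in this issue; it is not taken from the POSIX text or from CPython's source.

```python
def empty_prefix_positions(value, sep=":"):
    """Return the indexes of zero-length prefixes in a PATH-style string."""
    # Assumption: an empty value is treated like an unset variable.
    if not value:
        return []
    return [i for i, prefix in enumerate(value.split(sep)) if prefix == ""]


for sample in ("", ":", ":a", "a:", "a::b", "a:b"):
    print(repr(sample), empty_prefix_positions(sample))
```

A lone ':' splits into two zero-length prefixes, which is why it is enough to bring the current working directory into the search.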

Also, for sys.path it's documented that an empty string "means the current working directory", but it's kind of buried.

I agree that the behavior of an empty string in PYTHONPATH and sys.path should be documented more clearly. It should also be noted that an empty string is resolved to whatever the current directory happens to be when sys.path is searched, and the current working directory can change during the execution of the script.
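The point that an empty string follows the current working directory can be checked with a small sketch. The function name and the two probe module names below are hypothetical; the code only looks modules up with importlib.util.find_spec and restores sys.path and the working directory afterwards.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def empty_entry_follows_cwd():
    """Report which probe modules are findable before and after os.chdir()."""
    saved_cwd, saved_path = os.getcwd(), list(sys.path)
    results = {}
    with tempfile.TemporaryDirectory() as dir_a, \
            tempfile.TemporaryDirectory() as dir_b:
        # Hypothetical module names, one empty module per directory.
        open(os.path.join(dir_a, "cwd_probe_a.py"), "w").close()
        open(os.path.join(dir_b, "cwd_probe_b.py"), "w").close()
        try:
            sys.path.insert(0, "")  # the empty entry under discussion
            for label, directory in (("in_a", dir_a), ("in_b", dir_b)):
                os.chdir(directory)
                importlib.invalidate_caches()
                results[label] = tuple(
                    importlib.util.find_spec(name) is not None
                    for name in ("cwd_probe_a", "cwd_probe_b")
                )
        finally:
            # Leave the temporary directories before they are removed.
            os.chdir(saved_cwd)
            sys.path[:] = saved_path
    return results


print(empty_entry_follows_cwd())
```

Each probe module should only be found while the process is inside its own directory, even though sys.path itself is not modified between the two lookups.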

ericw-bright commented 1 year ago

sys.path may document an empty string, but PYTHONPATH is expanded before it's added to sys.path. The PYTHONPATH documentation also indicates that "Non-existent directories are silently ignored." (Side note: since PYTHONPATH can contain .zip files, the term "Non-existent directories" is a slight misnomer.) The use of abspath in the conversion from environment variable to sys.path appears to be the cause of the expansion from '' to '.' and then to the actual directory of execution.

https://github.com/python/cpython/blob/983305268e2291b0a7835621b81bf40cba7c27f3/Modules/getpath.py#L658-L660

>>> import os.path
>>> os.path.exists('')
False
>>> os.path.abspath('')
'/home/eric'
>>> os.path.abspath('') == os.path.abspath('.')
True
>>> os.path.samefile('','.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/cm/local/apps/python3/lib/python3.9/genericpath.py", line 100, in samefile
    s1 = os.stat(f1)
FileNotFoundError: [Errno 2] No such file or directory: ''
>>> os.path.abspath('.ssh')
'/home/eric/.ssh'
>>> os.path.exists('.ssh')
True
>>> os.path.samefile('.ssh','./.ssh')
True
>>> os.path.normpath('')
'.'

So much of this confusion seems to stem from the use of abspath without first checking whether the path exists. abspath relies on normpath and introduces the cwd for any relative path, including the empty string.

So it seems like the conversion from the environment variable PYTHONPATH should either confirm os.path.exists or not use abspath.

eryksun commented 1 year ago

@ericw-bright, to me it seems like a bug that an empty string in PYTHONPATH gets expanded to the initial current working directory. I think normpath() should be used instead of abspath(). This would normalize an empty string to ".", which behaves the same as an empty string. The behavior of an empty string in sys.path is intentional and cannot change.
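The difference between the two approaches can be sketched as follows. This is a simplified illustration, not the code in Modules/getpath.py: the helper `expand_pythonpath` is hypothetical, and the sample value (an empty entry followed by a relative "lib" entry) is made up for the comparison.

```python
import os


def expand_pythonpath(value, transform):
    """Apply transform to each os.pathsep-separated entry of value."""
    # Assumption: an empty value contributes no entries at all.
    if not value:
        return []
    return [transform(entry) for entry in value.split(os.pathsep)]


# A leading separator yields one empty entry, followed by "lib".
value = os.pathsep.join(["", "lib"])

with_abspath = expand_pythonpath(value, os.path.abspath)
with_normpath = expand_pythonpath(value, os.path.normpath)

print(with_abspath)   # entries pinned to the cwd at the time of the call
print(with_normpath)  # entries stay relative: ['.', 'lib']
```

With abspath the empty entry is frozen to whatever the working directory was at startup, whereas with normpath it becomes ".", which keeps tracking the working directory the way an empty string in sys.path does.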

The documentation of PYTHONPATH should be minimal in terms of describing the format and when and how it modifies sys.path. It's inappropriate to discuss the content of the default search path, support for zip files, or the search behavior for sys.path. Leave that to the documentation of sys.path, which itself should link to other places in the documentation where the search behavior and default module search path are discussed in detail. This information should not be repeated piecemeal throughout the documentation.