python / cpython

The Python programming language
https://www.python.org

symlinking .py files creates unexpected sys.path #61839

Closed 84401114-8e59-4056-83cb-632106c0b648 closed 11 years ago

84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago
BPO 17639
Nosy @ncoghlan, @kristjanvalur, @ned-deily

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['interpreter-core', 'type-bug', 'library']
title = 'symlinking .py files creates unexpected sys.path'
updated_at =
user = 'https://github.com/kristjanvalur'
```

bugs.python.org fields:
```python
activity =
actor = 'gvanrossum'
assignee = 'none'
closed = True
closed_date =
closer = 'gvanrossum'
components = ['Interpreter Core', 'Library (Lib)']
creation =
creator = 'kristjan.jonsson'
dependencies = []
files = []
hgrepos = []
issue_num = 17639
keywords = []
message_count = 14.0
messages = ['186069', '186070', '186077', '186081', '186085', '186086', '186087', '186089', '186090', '186091', '186117', '357282', '357285', '357302']
nosy_count = 6.0
nosy_names = ['ncoghlan', 'kristjan.jonsson', 'schmir', 'ned.deily', 'neologix', 'Socob']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue17639'
versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']
```

84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

When .py files are assembled into a directory structure using direct symbolic links to the files, something odd happens to sys.path[0].

Consider this file structure:

    /pystuff/
        foo.py -> /scripts/foo.py
        bar.py -> /libs/bar.py

foo.py contains the line "import bar". Running "python /pystuff/foo.py" will now fail, because when foo.py is run, sys.path[0] will contain "/scripts" rather than the expected "/pystuff".

It would appear that the algorithm for finding sys.path[0] is: sys.path[0] = os.path.dirname(os.path.realpath(filename)). IMO, it should be: sys.path[0] = os.path.realpath(os.path.dirname(filename)).
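The difference between the two orderings can be sketched as follows. This is only an illustration, not the interpreter's actual startup code: the helper names `candidate_script_dirs` and `build_demo_tree` are made up for this example, and the tree is built in a temporary directory rather than at `/pystuff` and `/scripts`. It assumes a platform where `os.symlink` is available.

```python
import os
import tempfile


def candidate_script_dirs(script_path):
    """Return (observed, proposed) candidates for sys.path[0]."""
    # Resolve the symlinked file first, then take its directory.
    observed = os.path.dirname(os.path.realpath(script_path))
    # Take the directory of the link first, then resolve that directory.
    proposed = os.path.realpath(os.path.dirname(script_path))
    return observed, proposed


def build_demo_tree(root):
    """Create scripts/foo.py plus a pystuff/foo.py symlink pointing at it."""
    scripts = os.path.join(root, "scripts")
    pystuff = os.path.join(root, "pystuff")
    os.makedirs(scripts)
    os.makedirs(pystuff)
    target = os.path.join(scripts, "foo.py")
    with open(target, "w") as fh:
        fh.write("import bar\n")
    link = os.path.join(pystuff, "foo.py")
    os.symlink(target, link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        link = build_demo_tree(root)
        observed, proposed = candidate_script_dirs(link)
        print("dirname(realpath(f)):", observed)  # ends in "scripts"
        print("realpath(dirname(f)):", proposed)  # ends in "pystuff"
```

The first value is the directory the link points into, the second is the directory the link sits in, which is the one where bar.py would be found in the layout above.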

I say that this behaviour is unexpected, because symlinking to individual files normally has the semantics of "pulling that file in" rather than "hopping to that file's real dir".

As an example, the following works in C, and in other languages too, I should imagine:

    /code/
        myfile.c -> /sources/myfile.c
        mylib.h -> /libs/mylib.h
        libmylib.so -> /libs/libmylib.so

An '#include "mylib.h"' in myfile.c would look for the file in /code and find it. A "cc myfile.c -lmylib" would find libmylib.so in /code.

This problem was observed on Linux, when running hadoop script jobs. The hadoop code (cloudera CDH4) creates a symlink copy of your file structure, where each file is individually symlinked to a place in a file cache, and each file may sit in a different physical dir, like this:

    tmp1/
        a.py -> /secret/filecache/0001/a.py
        b.py -> /secret/filecache/0002/b.py
        c.py -> /secret/filecache/0003/c.py

Suddenly, importing b and c from a.py won't work. If a, b, and c were .h files, then '#include "b.h"' from a.h would work.

84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

btw, this is the opposite issue to issue bpo-1387483

ncoghlan commented 11 years ago

Adding Guido & Ned, as my recollection is that some of the weirdness in the sys.path[0] symlink resolution was to placate the test suite on Mac OS X (at least, that was a cause of failures in the initial runpy module implementation until Guido tracked down the discrepancy in symlink resolution between direct script execution and runpy).

How does the test suite react if you change the order of application to resolve symlinks only after dropping the file name from the path?

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 11 years ago

> How does the test suite react if you change the order of application to resolve symlinks only after dropping the file name from the path?

Note that this will break things, see e.g. http://bugs.python.org/issue1387483#msg186063

The only backward compatible way to handle this would be to add both directories to sys.path, hoping that there's no module with the same name in both directories.

gvanrossum commented 11 years ago

Do not "fix" this. It is an intentional feature.

There is a common pattern where one or more Python scripts are collected in some "bin" directory (presumably on the user's $PATH) as symlinks into the directory where they really live (not on $PATH, nor on sys.path). The other files needed by the script(s) are in the latter directory, and so it needs to be on sys.path[0]. If you change the symlink resolution, sys.path[0] will point to the "bin" directory and the scripts won't be able to find the rest of their modules.
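The pattern described here can be reproduced with a small experiment. The sketch below is an illustration only: the directory names `app` and `bin`, the files `tool.py` and `helper.py`, and the function `run_via_symlink` are all made up for the example. It assumes a POSIX system with symlink support, and it removes `PYTHONSAFEPATH` from the child's environment because that setting (Python 3.11 and later) stops the script directory from being added to sys.path at all.

```python
import os
import subprocess
import sys
import tempfile


def run_via_symlink(root):
    """Run app/tool.py through a bin/tool symlink.

    Returns (sys.path[0] as seen by the child, real path of the app dir).
    """
    app = os.path.join(root, "app")
    bin_dir = os.path.join(root, "bin")
    os.makedirs(app)
    os.makedirs(bin_dir)

    # A sibling module that lives only next to the real script.
    with open(os.path.join(app, "helper.py"), "w") as fh:
        fh.write("VALUE = 42\n")
    with open(os.path.join(app, "tool.py"), "w") as fh:
        fh.write("import sys\nimport helper\nprint(sys.path[0])\n")

    # The "bin" directory holds nothing but a symlink to the real script.
    link = os.path.join(bin_dir, "tool")
    os.symlink(os.path.join(app, "tool.py"), link)

    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(
        [sys.executable, link],
        capture_output=True,
        text=True,
        check=True,  # raises if "import helper" fails in the child
        env=env,
    )
    return result.stdout.strip(), os.path.realpath(app)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        child_path0, real_app = run_via_symlink(root)
        print("child sys.path[0]:", child_path0)
        print("real script dir:  ", real_app)
```

Because the child finds `helper` next to the real script, the run succeeds even though `bin` contains no `helper.py`. That is the behaviour this comment wants to preserve.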

While there are probably better patterns to solve the problem that this intends to solve, the pattern is commonly used and I do not want it to be broken.

If you are using symlinks for other purposes, well, too bad.

ncoghlan commented 11 years ago

I'll add it to the list of docs updates for post-PEP 432 (similar to the import system in general finally getting reference docs in 3.3 following the migration to importlib, I hope to have improved import state initialisation docs for 3.4 if I successfully tame the interpreter initialisation code)

84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

1) _I_ am not using symlinks this way. The hadoop scheduling processor is. This means that we cannot use Python for it without hacking the scripts for the special case. Presumably applications are not generally breaking when run in an artificial file tree populated with symlinked files into arbitrary real locations, but Python is.

2) This particular use case is quite unobvious, and goes against the spirit of symbolic links. They are meant to be transparent for applications. Consider e.g. the analogue of C header files. Only Python seems to care about the _real_ location of the file, as opposed to the apparent location. Effectively, Python is actively using the knowledge of these links as a sort of dynamic sys.path modifying tool.

I agree that it is not good to break existing usage, however misguided it may be. But in that case, isn't it possible to disable this symlink dereference via e.g. an option?

ncoghlan commented 11 years ago

Not currently, because interpreter startup is a mess already. Overriding sys.path[0] initialisation is on the list for 3.4 already, I'm just advising strongly against piling any more complexity on top of the current rickety structure until we do something about the foundation.

gvanrossum commented 11 years ago

I'm sure there's some change that can be made to the scripts that solves this locally, without requiring any changes to Python.
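One possible shape for such a script-local change is sketched below. It is an assumption about what a workaround could look like, not something proposed in this thread: the helpers `apparent_script_dir` and `prepend_apparent_dir` are hypothetical names, and the idea is simply to put the directory the script was invoked from (symlinks left unresolved) at the front of sys.path before importing sibling modules.

```python
import os
import sys


def apparent_script_dir(argv0):
    """Directory of the script as invoked, without resolving symlinks."""
    # os.path.abspath normalises the path but, unlike os.path.realpath,
    # does not follow symbolic links.
    return os.path.dirname(os.path.abspath(argv0))


def prepend_apparent_dir(path_list, argv0):
    """Insert the apparent script directory at the front of path_list if missing."""
    directory = apparent_script_dir(argv0)
    if directory not in path_list:
        path_list.insert(0, directory)
    return directory


if __name__ == "__main__":
    # At the top of a script such as a.py, before "import b" or "import c":
    prepend_apparent_dir(sys.path, sys.argv[0])
```

With the hadoop layout from the original report, running "python tmp1/a.py" would then search tmp1/ first, so the symlinked b.py and c.py become importable. The real directory of a.py stays on sys.path as well, so a module with the same name in both places would be shadowed by the tmp1/ copy.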

84401114-8e59-4056-83cb-632106c0b648 commented 11 years ago

Yes, of course. But I still maintain that the failure of Python to work with a linktree of .py files, where the destination of said links is arbitrary, is rather unusual, and IMHO violates the principle of least surprise. In this case, the existence of the virtual linktree is apparently an implementation detail of hadoop, not something that we as hadoop users were supposed to know or care about.

Exploiting the OS file system implementation detail of a symbolic link as a language import feature is an example of an unusual coupling indeed, in my opinion.

Even import-guru Nick didn't seem to be aware of this feature. It's great that we plan at least to document this Unix-only feature at some point.

Cheers!

ncoghlan commented 11 years ago

The reason I haven't documented sys.path[0] initialisation is because I know I don't fully understand it. Path initialisation in general has a lot of historical quirks, particularly once symlinks are involved.

gvanrossum commented 4 years ago

It is quite intentional that symlinks are followed for the purpose of computing sys.argv[0] and sys.path.

--Guido (mobile)

84401114-8e59-4056-83cb-632106c0b648 commented 4 years ago

So you have already stated, and this issue is six years old now.

While I no longer have a stake in this, I'd just like to reiterate that IMHO it breaks several good practices of architecture, particularly that of separation of roles.

The abstraction called symbolic links is the domain of the filesystem. An application should accept the image that the filesystem offers, not try to second-guess the intent of an operator by arbitrarily, and unexpectedly, unrolling that abstraction.

While you present a use case, I argue that it isn't, and shouldn't be, the domain of the application to intervene in an essentially shell-specific and operator-specific process of collecting his favorite shortcuts in a folder. For that particular use case, a more sensible way would be for the user to simply create shell shortcuts, even aliases, for his favorite Python scripts. This behaviour is basically taking over what should be the role of the shell. I'm unable to think of another program doing this sort of thing.

I suppose that now, with the reworked startup process, it would be simpler to actually document this rather unexpected behaviour, and possibly provide a flag to override it. I know that I spent some time on this and came away rather stumped.

gvanrossum commented 4 years ago

You have a point — I was just responding to Nick’s last message without noticing how old it was. I’ll remove myself from the nosy list.

On Fri, Nov 22, 2019 at 15:14 Kristján Valur Jónsson <report@bugs.python.org> wrote:

Kristján Valur Jónsson <sweskman@gmail.com> added the comment:
