python / cpython

The Python programming language
https://www.python.org
Other
63.22k stars 30.27k forks source link

__code__. co_filename should always be an absolute path #64642

Closed 1st1 closed 4 years ago

1st1 commented 10 years ago
BPO 20443
Nosy @gpshead, @ncoghlan, @vstinner, @bitdancer, @ambv, @ammaraskar, @tirkarthi, @isidentical
PRs
  • python/cpython#13527
  • python/cpython#14053
  • python/cpython#14446
  • python/cpython#17534
  • python/cpython#17986
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['type-bug', '3.9'] title = '__code__. co_filename should always be an absolute path' updated_at = user = 'https://github.com/1st1' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'none' closed = True closed_date = closer = 'ncoghlan' components = [] creation = creator = 'yselivanov' dependencies = [] files = [] hgrepos = [] issue_num = 20443 keywords = ['patch'] message_count = 32.0 messages = ['209699', '345501', '345521', '345585', '346207', '346262', '346524', '346789', '346800', '346823', '354925', '354999', '355055', '355085', '355088', '355104', '355105', '355106', '355218', '355219', '355278', '355334', '355335', '355339', '355340', '358094', '358114', '358116', '358128', '358661', '359098', '359902'] nosy_count = 9.0 nosy_names = ['gregory.p.smith', 'ncoghlan', 'vstinner', 'r.david.murray', 'lukasz.langa', 'Michel Desmoulin', 'ammar2', 'xtreak', 'BTaskaya'] pr_nums = ['13527', '14053', '14446', '17534', '17986'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue20443' versions = ['Python 3.9'] ```

    1st1 commented 10 years ago

    There are many issues on tracker related to the relative paths in co_filename. Most of them are about introspection functions in inspect module--which are usually broken after 'os.chdir'.

    Test case: create a file t.py:

       def foo(): pass
       print(foo.__code__.co_filename)

    Execute it with '$ python t.py' -> it will print 't.py'.

    Ideally, when executing a python file, interpreter should expand all relative paths for __file and __code.co_filename attributes.

    vstinner commented 5 years ago

    PR 14053 is a different approach than PR 13527: compute the absolute path to the script filename in PyConfig_Read() just after parsing the command line.

    vstinner commented 5 years ago

    One of the side effect of my PR 14053 is that tracebacks are more verbose: the filename is now absolute rather than relative.

    Currently:

    $ python3 x.py 
    Traceback (most recent call last):
      File "x.py", line 4, in <module>
        func()
      File "x.py", line 2, in func
        bug
    NameError: name 'bug' is not defined

    With my PR:

    $ ./python x.py 
    Traceback (most recent call last):
      File "/home/vstinner/prog/python/master/x.py", line 4, in <module>
        func()
      File "/home/vstinner/prog/python/master/x.py", line 2, in func
        bug
    NameError: name 'bug' is not defined
    vstinner commented 5 years ago

    The site module tries to compute the absolute path of __file and __cached attributes of all modules in sys.modules:

    def abs_paths():
        """Set all module __file__ and __cached__ attributes to an absolute path"""
        for m in set(sys.modules.values()):
            if (getattr(getattr(m, '__loader__', None), '__module__', None) not in
                    ('_frozen_importlib', '_frozen_importlib_external')):
                continue   # don't mess with a PEP 302-supplied __file__
            try:
                m.__file__ = os.path.abspath(m.__file__)
            except (AttributeError, OSError, TypeError):
                pass
            try:
                m.__cached__ = os.path.abspath(m.__cached__)
            except (AttributeError, OSError, TypeError):
                pass

    The __path__ attribute isn't updated.

    Another approach would be to hack importlib to compute the absolute path before loading a module, rather than trying to fix it *afterwards*.

    One pratical problem: posixpath and ntpath are not available when importlib is setup, these modules are implemented in pure Python and so must be imported.

    Maybe importlib could use a naive implementation of os.path.abspath(). Maybe the C function _Py_abspath() that I implemented in PR 14053 should be exposed somehow to importlib as a private function using a builtin module like _imp, so it can be used directly.

    ncoghlan commented 5 years ago

    Perhaps os._abspath_for_import? If importlib finds it, then it can handle making early paths absolute itself, otherwise it will defer doing that until it has the full external import system bootstrapped. (importlib does try to make all paths absolute as it loads modules - it's just that some early modules aren't loaded that way)

    vstinner commented 5 years ago

    Effect of my PR 14053 using script.py:

    import sys
    print(f"{__file__=}")
    print(f"{sys.argv=}")
    print(f"{sys.path[0]=}")

    Before:

    $ ./python script.py 
    __file__=script.py
    sys.argv=['script.py']
    sys.path[0]=/home/vstinner/prog/python/master

    With the change:

    $ ./python script.py 
    __file__=/home/vstinner/prog/python/master/script.py
    sys.argv=['/home/vstinner/prog/python/master/script.py']
    sys.path[0]=/home/vstinner/prog/python/master
    vstinner commented 5 years ago

    New changeset 3939c321c90283b49eddde762656e4b1940e7150 by Victor Stinner in branch 'master': bpo-20443: _PyConfig_Read() gets the absolute path of run_filename (GH-14053) https://github.com/python/cpython/commit/3939c321c90283b49eddde762656e4b1940e7150

    vstinner commented 5 years ago

    Example of case where a module path is still relative: ---

    import sys
    import os
    modname = 'relpath'
    filename = modname + '.py'
    sys.path.insert(0, os.curdir)
    with open(filename, "w") as fp:
        print("import sys", file=fp)
        print("mod = sys.modules[__name__]", file=fp)
        print("print(f'{__file__=}')", file=fp)
        print("print(f'{mod.__file__=}')", file=fp)
        print("print(f'{mod.__cached__=}')", file=fp)
    __import__(modname)
    os.unlink(filename)

    Output: --- __file='./relpath.py' mod.__file='./relpath.py' mod.__cached='./pycache__/relpath.cpython-39.pyc' ---

    __file and mod.__file are relative paths: not absolute paths.

    tirkarthi commented 5 years ago

    This change seems to produce a new warning on my Mac.

    ./Modules/getpath.c:764:23: warning: incompatible pointer types passing 'char [1025]' to parameter of type 'const wchar_t *' (aka 'const int *') [-Wincompatible-pointer-types]
                _Py_isabs(execpath))
                          ^~~~~~~~
    ./Include/fileutils.h:158:42: note: passing argument to parameter 'path' here
    PyAPI_FUNC(int) _Py_isabs(const wchar_t *path);
                                             ^
    1 warning generated.
    vstinner commented 5 years ago

    New changeset 3029035ef34c9bae0c8d965290cd9b273c8de1ea by Victor Stinner in branch 'master': bpo-20443: Fix calculate_program_full_path() warning (GH-14446) https://github.com/python/cpython/commit/3029035ef34c9bae0c8d965290cd9b273c8de1ea

    126e97a7-8577-4d0e-8ee6-8324ba2d3aec commented 5 years ago

    Isn't that change breaking compat ?

    I'm assuming many scripts in the wild sys.argv[0] and play with it assuming it's relative.

    I would expect such a change to be behind a flag or a __future__ import.

    ncoghlan commented 5 years ago

    I think that's a valid point regarding sys.argv[0] - it's the import system and code introspection that wants(/needs) absolute paths, whereas sys.argv[0] gets used in situations (e.g. usage messages) where we should retain whatever the OS gave us, since that best reflects what the user entered.

    That's straightforward to selectively revert, though: remove the "config->run_filename != NULL" clause at https://github.com/python/cpython/blob/24dc2f8c56697f9ee51a4887cf0814b6600c1815/Python/initconfig.c#L2201 and add a comment with the reason for the deliberate omission.

    That way the OS-provided argv entry will continue to be passed through to sys.argv[0], while everywhere else will get the absolute path.

    vstinner commented 5 years ago

    What is the use case for having a relative sys.argv[0]?

    Isn't that change breaking compat ?

    Right. It has been made on purpose. The rationale can be found in the first message of this issue.

    126e97a7-8577-4d0e-8ee6-8324ba2d3aec commented 5 years ago

    The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did.

    3.7 already broke a few packages by choosing to not do the True/False dance again and make async/await keywords. I took a lot of people by surprise, not because of the issue itself, but because of the change of philosophy compared of what Python had been in the past.

    Do we really want to make a habit of breaking a few things at every minor release ?

    It's fair to expect, as a user, that any 3.x will be aiming to be forward compatible. This was the case for Python 2.X and was part of the popularity of the language: a reliable tech.

    I'm advocating that any breaking change,, even the tiniest, should be behind a __future__ import or a flag, with warnings, and then switched on for good on Python 4.0.

    Python is a language, not a simple library. Stability is key. Like Linus Torvald use to say: "do not break user space"

    The fact this change is easy to fix and migrate is not a proper reason to ignore this important software rule, IMO. So many people and so much code rely on Python that the tinest glitch may have massive impact down the line.

    Not to mention that after a decade of healing from the P2/3 transition, the community needs a rest.

    Le 21/10/2019 à 13:26, STINNER Victor a écrit :

    STINNER Victor \vstinner@python.org\ added the comment:

    What is the use case for having a relative sys.argv[0]?

    > Isn't that change breaking compat ?

    Right. It has been made on purpose. The rationale can be found in the first message of this issue.

    ----------


    Python tracker \report@bugs.python.org\ \https://bugs.python.org/issue20443\


    ammaraskar commented 5 years ago

    I did a quick search to see what code would break from sys.argv[0] going relative

    intext:"sys.argv[0]" ext:py site:github.com https://www.google.com/search?q=intext:"sys.argv%5B0%5D"+ext:py+site:github.com

    and while most uses of it are to print usage messages. There is potentially code relying on it being a relative path that will break from this change:

    I think Michel and Nick are correct here and the status-quo should be maintained for sys.argv[0]. Michel should not have to enumerate the use cases for a relative sys.argv[0].

    vstinner commented 5 years ago

    Nick Coghlan:

    I think that's a valid point regarding sys.argv[0] - it's the import system and code introspection that wants(/needs) absolute paths, whereas sys.argv[0] gets used in situations (e.g. usage messages) where we should retain whatever the OS gave us, since that best reflects what the user entered.

    Even before this cases, there were different cases where Python does modify sys.argv:

    At the C level, the Py_GetArgcArgv() function provides the "original" argc and argv: before they are modified.

    See open issues:

    vstinner commented 5 years ago

    I understand that the discussion is about my commit 3939c321c90283b49eddde762656e4b1940e7150 which changed multiple things: --- Python now gets the absolute path of the script filename specified on the command line (ex: python3 script.py): the __file__ attribute of the __main__ module, sys.argv[0] and sys.path[0] become an absolute path, rather than a relative path. ---

    I understand that the complain is not about sys.modules['__main'].__file nor sys.path[0], but only sys.argv[0].

    The sys.argv documentation says:

    "The list of command line arguments passed to a Python script. argv[0] is the script name (it is operating system dependent whether this is a full pathname or not)."

    https://docs.python.org/dev/library/sys.html#sys.argv

    I understand that before my change, sys.argv[0] was sometimes relative, and sometimes absolute. With my change, sys.argv[0] should now always be asolute.

    There are cases where an absolute path is preferred. There are cases where a relative path is preferred. I'm trying to understand the scope of the issue.

    https://github.com/google/python-gflags/blob/4f06c3d0d6cbe9b1fb90ee9fb1c082b3bf9285f6/gflags/flagvalues.py#L868-L869

    I don't know this code. It seems like sys.argv[0] is used to lookup in a dict, and that dict keys are supposed to be "module names". I don't understand in which cases sys.argv[0] could be a module *name*, since sys.argv[0] seems like a relative or absolute path.

    https://github.com/HcashOrg/hcOmniEngine/blob/f1acc2ba3640a8e1c651ddc90a86d569d00704fe/msc-cli.py#L12

    The code starts by sys.argv.pop(0): remove sys.argv[0].

    https://github.com/vmtk/vmtk/blob/64675f598e31bc6be3d4fba903fb59bf1394f492/PypeS/pyperun.py#L35

    if sys.argv[0].startswith('pyperun'): ...

    I understand that my change broke such test. Did this test work before my change when specifying the absolute path to run the script? Like path/to/pyperun.py rather than pyperun.py? Such code looks fragile: using os.path.basename() would be safer here.

    https://github.com/apache/lucene-solr/blob/cfa49401671b5f9958d46c04120df8c7e3f358be/dev-tools/scripts/svnBranchToGit.py#L783

      home = os.path.expanduser("~")
      svnWorkingCopiesPath = os.path.join(home, "svnwork")
      gitReposPath = os.path.join(home, "gitrepos")
      if sys.argv[0].startswith(svnWorkingCopiesPath) 
         or sys.argv[0].startswith(gitReposPath): ...

    On my Linux, os.path.expanduser("\~") is an absolute path and so svnWorkingCopiesPath and gitReposPath should be absolute paths, no?

    My change made this code more reliable, rather than breaking it, no?

    vstinner commented 5 years ago

    Michel Desmoulin: "The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did."

    You have to understand that the change has be done to fix some use cases and that Python 3.8.0 has been related with the change.

    If we change again the behavior, we will get Python \< 3.8.0, Python 3.8.0 and Python > 3.8.0 behaving differently. I would prefer to fully understand all use cases before starting to discuss changing the behavior again.

    3.7 already broke a few packages by choosing to not do the True/False dance again and make async/await keywords. I took a lot of people by surprise, not because of the issue itself, but because of the change of philosophy compared of what Python had been in the past.

    You have to understand that it was known since the start and it was a deliberate choice after balancing advantages and disadvantages.

    https://www.python.org/dev/peps/pep-0492/#transition-plan

    I'm only aware of a single function in a single project, Twisted. Are you aware of more projects broken by this change? Did it take long to fix them? Balance it with the ability to use async and await in cases which were not possible in Python 3.6 because they were not real keywords. The Python 3.6 grammar was fully of hack to emulate keywords.

    Do we really want to make a habit of breaking a few things at every minor release ?

    Usually, incompatible changes are discussed to balance if it's worth it or not.

    There is no plan to break all user code just for your pleasure.

    For the specific case of argv[0], the discussion occurred here on associated pull requests, and nobody mentioned any risk of backward compatibility.

    FYI I'm working on this topic and I just proposed a the PEP-606 "Python Compatibility Version" to propose one possible solution to make incompatible changes less painful to users.

    It's fair to expect, as a user, that any 3.x will be aiming to be forward compatible. This was the case for Python 2.X and was part of the popularity of the language: a reliable tech.

    My PEP-606 tries to explain incompatible changes are needed: https://www.python.org/dev/peps/pep-0606/#the-need-to-evolve-frequently

    Not to mention that after a decade of healing from the P2/3 transition, the community needs a rest.

    I understand that but such comment doesn't help the discussion. See my previous comments and please try to help to fill the missing information to help us to take better decisions.

    Since Python 3.8.0 has been release, a revert can make things even worse that the current state.

    ncoghlan commented 5 years ago

    This change isn't in Python 3.8 though, it's only in Python 3.9 (the PR merge date is after the beta 1 branch date).

    Unless it was backported in a Python 3.8 PR that didn't link back to this issue or the original Python 3.9 PR?

    vstinner commented 5 years ago

    This change isn't in Python 3.8 though, it's only in Python 3.9 (the PR merge date is after the beta 1 branch date).

    Oh, you're right. Sorry, I thought that I made the change before 3.8 branch was created. In that case, we can do whatever we want :-) (We are free to revert.)

    gpshead commented 5 years ago

    Please revert. An absolute path changes semantics in many real world situations (altering symlink traversals, etc). People expect the current sys.argv[0] behavior which is "similar to C argv" and matches what was passed on the interpreter command line.

    A getcwd() call doesn't even have to succeed. A single file python program should still be able to run in that environment rather than fail to start.

    To help address the original report filing issue, we could add a notion of .co_cwd to code objects for use in resolving co_filename. Allow it to be '' if getcwd failed at source load time. Code could check if co_cwd exists and join it with the co_filename. Also let co_cwd remain empty when there is no valid co_filename for the code.

    Preaching: Code that calls os.chdir() is always a potential problem as it alters process global state. That call is best avoided as modifying globals is impolite.

    vstinner commented 5 years ago

    Please revert.

    Ok ok, I will revert my change, but only on sys.argv[0].

    A getcwd() call doesn't even have to succeed. A single file python program should still be able to run in that environment rather than fail to start.

    The current implementation leaves a path unchanged if it fails to make it absolute:

    config_run_filename_abspath:

        wchar_t *abs_filename;
        if (_Py_abspath(config->run_filename, &abs_filename) < 0) {
            /* failed to get the absolute path of the command line filename:
               ignore the error, keep the relative path */
            return _PyStatus_OK();
        }
        if (abs_filename == NULL) {
            return _PyStatus_NO_MEMORY();
        }
    
        PyMem_RawFree(config->run_filename);
        config->run_filename = abs_filename;

    with:

    /* Get an absolute path.
       On error (ex: fail to get the current directory), return -1.
       On memory allocation failure, set *abspath_p to NULL and return 0.
       On success, return a newly allocated to *abspath_p to and return 0.
       The string must be freed by PyMem_RawFree(). */
    int _Py_abspath(const wchar_t *path, wchar_t **abspath_p);
    vstinner commented 5 years ago

    To help address the original report filing issue, we could add a notion of .co_cwd to code objects for use in resolving co_filename. Allow it to be '' if getcwd failed at source load time. Code could check if co_cwd exists and join it with the co_filename. Also let co_cwd remain empty when there is no valid co_filename for the code.

    Do you mean that co_filename attribute of code objects must remain relative?

    Preaching: Code that calls os.chdir() is always a potential problem as it alters process global state. That call is best avoided as modifying globals is impolite.

    I thought that calling os.chdir("/") is a good practice when launching a program as a daemon. And thne call os.fork() twice to detach the process from its parent and run it in background.

    I commonly use os.chdir() in my programs for various reasons. A classical use case is to mimick a shell script doing something like (cd $DIRECTORY; command): temporary change the current directory.

    cwd parameter of subprocess.Popen helps to avoid changing the currently directory, but sometimes subprocess is not used at all.

    Maybe avoiding os.chdir() is a good practice, but it's hard to prevent users to call it. I don't see how os.chdir() is bad by design. I'm not sure that it's really useful to debate this neither :-)

    gpshead commented 5 years ago

    I think sys.argv[0] is the important one as program logic often tends to use that as an input.

    I'm honestly unsure of if this will be as much of a problem for .co_filename or sys.path[0].

    gpshead commented 5 years ago

    (read that as: feel free to keep the behavior for co_filename and path[0] and lets see what happens :)

    ambv commented 4 years ago

    Anything new here? 3.9.0a2 is due next week.

    vstinner commented 4 years ago

    Anything new here? 3.9.0a2 is due next week.

    I had it in my TODO list but I lost the bpo number. Thanks for the reminder :-) I wrote PR 17534 that I plan to merge as soon as the CI tests pass.

    vstinner commented 4 years ago

    New changeset a1a99b4bb7cbe2dbc55a1d92c3c509b4466d3c3b by Victor Stinner in branch 'master': bpo-20443: No longer make sys.argv[0] absolute for script (GH-17534) https://github.com/python/cpython/commit/a1a99b4bb7cbe2dbc55a1d92c3c509b4466d3c3b

    isidentical commented 4 years ago

    I am not sure that if co_filename can break backwards compability or not (because until CodeType.replace; it was usual to break code object construction, people who are using code objects may take this too) but yes PR 17534 was necessary.

    My initial patch (PR 13527) proposed a future directive to do this (and poorly implemented the concept) which can be helpful (yet it needs a PEP).

    vstinner commented 4 years ago

    I reverted my change, so I remove "release blocker" priority.

    ncoghlan commented 4 years ago

    With the sys.argv[0] change reverted, I think this overall issue is fixed now - code objects will get absolute paths, while sys.argv[0] will continue to reflect how __main__ was identified.

    vstinner commented 4 years ago

    New changeset c1ee6e5e9b87c9812c6745c1dd6c1788a984f9f9 by Victor Stinner in branch 'master': bpo-20443: Update What's New In Python 3.9 (GH-17986) https://github.com/python/cpython/commit/c1ee6e5e9b87c9812c6745c1dd6c1788a984f9f9

    bvd0 commented 1 year ago

    This says that Module __file__ attributes (and related values) should now always contain absolute paths by default, with the sole exception of __main__.__file__, and this says that the __file__ attribute of the __main__ module became an absolute path; does this mean that __file__ is always an absolute path?

    Is there anything in the official Python documentation that specifies whether __file__ is absolute or relative?

    vstinner commented 1 year ago

    This issue is closed. I suggest you to open a new issue.