python / cpython

The Python programming language
https://www.python.org
Other
62.84k stars 30.09k forks source link

Python crash on macOS when CWD is invalid #80417

Closed 80d0d833-181a-4df0-a469-6e7405b7821b closed 5 years ago

80d0d833-181a-4df0-a469-6e7405b7821b commented 5 years ago
BPO 36236
Nosy @ronaldoussoren, @ncoghlan, @vstinner, @ned-deily, @pablogsal
PRs
  • python/cpython#12237
  • python/cpython#12424
  • python/cpython#12441
  • python/cpython#12450
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['OS-mac', '3.7', '3.8'] title = 'Python crash on macOS when CWD is invalid' updated_at = user = 'https://bugs.python.org/lkollar' ``` bugs.python.org fields: ```python activity = actor = 'ncoghlan' assignee = 'none' closed = True closed_date = closer = 'ned.deily' components = ['macOS'] creation = creator = 'lkollar' dependencies = [] files = [] hgrepos = [] issue_num = 36236 keywords = ['patch'] message_count = 19.0 messages = ['337469', '337470', '337767', '338116', '338209', '338297', '338300', '338301', '338362', '338363', '338365', '338392', '338393', '338394', '338417', '338418', '338746', '338754', '339195'] nosy_count = 6.0 nosy_names = ['ronaldoussoren', 'ncoghlan', 'vstinner', 'ned.deily', 'lkollar', 'pablogsal'] pr_nums = ['12237', '12424', '12441', '12450'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue36236' versions = ['Python 3.7', 'Python 3.8'] ```

    80d0d833-181a-4df0-a469-6e7405b7821b commented 5 years ago

    CPython crashes with "pymain_compute_path0: memory allocation failed" when attempting to execute it with a library module from a previously deleted directory.

    To reproduce:

    cd \~/tmp/python_crash

    rm -rf \~/tmp/python_crash

    python3.7 -m pdb Fatal Python error: pymain_compute_path0: memory allocation failed ValueError: character U+ef3a8fe0 is not in range [U+0000; U+10ffff]

    Current thread 0x000000011060e5c0 (most recent call first):

    pablogsal commented 5 years ago

    This is happening because _Py_wgetcwd returns NULL (although the underliying getcwd system call populates the fullpath variable correctly) but its caller, _PyPathConfig_ComputeArgv0, does not check the return value:

            _Py_wgetcwd(fullpath, Py_ARRAY_LENGTH(fullpath));
             argv0 = fullpath;

    and its caller, pymain_run_python, interprets a failure in _PyPathConfig_ComputeArgv0 as a memory problem:

             PyObject *path0 = _PyPathConfig_ComputeArgv0(config->argc,
                                                          config->argv);
             if (path0 == NULL) {
                 err = _Py_INIT_NO_MEMORY();
                 goto done;
             }
    vstinner commented 5 years ago

    _Py_wgetcwd() call has been introduced by the following commit:

    commit ee3784594b33c72c3fdca6a71892d22f14045ab6 Author: Nick Coghlan \ncoghlan@gmail.com\ Date: Sun Mar 25 23:43:50 2018 +1000

    bpo-33053: -m now adds *starting* directory to sys.path (GH-6231) (bpo-6236)
    
    Historically, -m added the empty string as sys.path
    zero, meaning it resolved imports against the current
    working directory, the same way -c and the interactive
    prompt do.
    
    This changes the sys.path initialisation to add the
    *starting* working directory as sys.path[0] instead,
    such that changes to the working directory while the
    program is running will have no effect on imports
    when using the -m switch.
    
    (cherry picked from commit d5d9e02dd3c6df06a8dd9ce75ee9b52976420a8b)

    I don't think that it's correct to fail with a fatal error if the current directory no longer exist. Would it be possible to not add it to sys.path if it doesn't exist? Silently ignore the error.

    @Nick: What do you think?

    ncoghlan commented 5 years ago

    Omitting it from sys.path in that case makes sense to me, but I'm not sure what sys.argv[0] should be set to.

    vstinner commented 5 years ago

    Omitting it from sys.path in that case makes sense to me, but I'm not sure what sys.argv[0] should be set to.

    I propose to use ".". It would be consistent with platforms which doesn't implement realpath:

        if (have_module_arg) {
            #if defined(HAVE_REALPATH) || defined(MS_WINDOWS)
                _Py_wgetcwd(fullpath, Py_ARRAY_LENGTH(fullpath));
                argv0 = fullpath;
                n = wcslen(argv0);
            #else
                argv0 = L".";
                n = 1;
            #endif
        }

    And it defers the error handler to later. Example of Python 3 running in a removed directory:

    $ python3
    Python 3.7.2 (default, Jan 16 2019, 19:49:22) 
    >>> import os
    >>> os.getcwd()
    FileNotFoundError: [Errno 2] No such file or directory
    
    >>> os.path.abspath('.')
    Traceback (most recent call last):
      File "/usr/lib64/python3.7/posixpath.py", line 383, in abspath
        cwd = os.getcwd()
    FileNotFoundError: [Errno 2] No such file or directory

    I would prefer "." than "-m".

    pablogsal commented 5 years ago

    If we set argv0 to ".":

                 if (_Py_wgetcwd(fullpath, Py_ARRAY_LENGTH(fullpath))) {
                    argv0 = fullpath;
                    n = wcslen(argv0);
                 }
                 else {
                    argv0 = L".";
                    n = 1;
                 }

    then the caller does not have any way of knowing if the returned argv0 is due to the fact that _Py_wgetcwd failed so it will blindly add "." to sys.path unless we set the result using PyObject** and then returning some error code to signal what happened (or something similar). On the other hand, I do not like this API, because returning some error code != 0 from _PyPathConfig_ComputeArgv0 would be weird because the call actually succeeded (it has returned "." as the value for argv0).

    What do you think is the best way to signal pymain_run_python that the current directory does not exist (because _Py_wgetcwd has failed)?

    vstinner commented 5 years ago

    Oh. It seems like I misunderstood the issue. I read "argv0" as sys.argv[0], but _PyPathConfig_ComputeArgv0 is used to insert a value at the start of sys.path. That's different...

    If the current directory has been removed, sys.path should just be left unchanged, no?

    pablogsal commented 5 years ago

    Yup. But what is the best way to signal the caller that _PyPathConfig_ComputeArgv0 has failed because _Py_wgetcwd has failed without changing the API massively?

    Right now if _PyPathConfig_ComputeArgv0 returns null is assumed that is due to a memory error when calling PyUnicode_FromWideChar. So either we stop returning _Py_INIT_NO_MEMORY() and then skip appending to sys_path or we change the API to signal different problems to the caller.

    Also, notice that the same function is used in sysmodule.c in PySys_SetArgvEx:

    If argv[0] is not '-c' nor '-m', prepend argv[0] to sys.path.

    vstinner commented 5 years ago

    New changeset dcf617152e1d4c4a5e7965733928858a9c0936ca by Victor Stinner in branch 'master': bpo-36236: Handle removed cwd at Python init (GH-12424) https://github.com/python/cpython/commit/dcf617152e1d4c4a5e7965733928858a9c0936ca

    vstinner commented 5 years ago

    Can someone please on macOS to confirm that the bug is fixed?

    pablogsal commented 5 years ago

    It seems to work:

    ❯ uname -a Darwin C02VL073HTDG 18.2.0 Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1\~1/RELEASE_X86_64 x86_64

    ❯ pwd /tmp/check

    /tmp/check ❯ rm -rf ../check

    /tmp/check ❯ \~/github/cpython/python.exe -m pdb usage: pdb.py [-c command] ... [-m module | pyfile] [arg] ...

    Debug the Python program given by pyfile. Alternatively, an executable module or package to debug can be specified using the -m switch.

    Initial commands are read from .pdbrc files in your home directory and in the current directory, if they exist. Commands supplied with -c are executed after commands from .pdbrc files.

    To let the script run until an exception occurs, use "-c continue". To let the script run up to a given line X in the debugged file, use "-c 'until X'".

    vstinner commented 5 years ago

    New changeset fc96e5474a7bda1c5dec66420e4467fc9f7ca968 by Victor Stinner in branch 'master': bpo-36236: Fix _PyPathConfig_ComputeSysPath0() for empty argv (GH-12441) https://github.com/python/cpython/commit/fc96e5474a7bda1c5dec66420e4467fc9f7ca968

    vstinner commented 5 years ago

    Pablo: is Python 3.7 affected by the issue?

    pablogsal commented 5 years ago

    Yup:

    ~ ❯ cd /tmp

    /tmp ❯ mkdir check

    /tmp ❯ cd check

    /tmp/check ❯ rm -rf ../check

    /tmp/check ❯ python3.7 -m pdb Fatal Python error: pymain_compute_path0: memory allocation failed ValueError: character U+e0895f00 is not in range [U+0000; U+10ffff]

    Current thread 0x000000011d9405c0 (most recent call first):

    vstinner commented 5 years ago

    New changeset f7959a9fe7e7e316899c60251e29390c4ed0307a by Victor Stinner in branch '3.7': bpo-36236: Handle removed cwd at Python init (GH-12450) https://github.com/python/cpython/commit/f7959a9fe7e7e316899c60251e29390c4ed0307a

    vstinner commented 5 years ago

    Pablo: Oh ok, thanks for testing. I fixed 3.7 as well. Would you mind to validate my fix?

    ned-deily commented 5 years ago

    The fix for 3.7 works too: no more crashing. Thanks, everyone!

    vstinner commented 5 years ago

    The fix for 3.7 works too: no more crashing. Thanks, everyone!

    I planned to test, but I had no access to macOS last years. Thanks for testing Ned. Good to hear that the bug is now fixed in 3.7 and master.

    ncoghlan commented 5 years ago

    Thanks for sorting this out, Victor!