python / cpython

The Python programming language
https://www.python.org

Concurrently closing files and iterating over the open files directory is not well specified #65826

Closed: 8d8fabee-d1ef-4079-82b7-73db0595456d closed this issue 3 years ago

8d8fabee-d1ef-4079-82b7-73db0595456d commented 10 years ago
BPO 21627
Nosy @gpshead, @pitrou, @vstinner, @serhiy-storchaka, @mr-nfamous, @iritkatriel
Files
  • cloexec.patch: The patch
  • fixed-setcloexec.patch: The fixed up patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = 'https://github.com/gpshead'
    closed_at =
    created_at =
    labels = ['invalid', 'type-bug', 'library', '3.9', '3.10']
    title = 'Concurrently closing files and iterating over the open files directory is not well specified'
    updated_at =
    user = 'https://bugs.python.org/sstewartgallus'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'vstinner'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date =
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation =
    creator = 'sstewartgallus'
    dependencies = []
    files = ['35441', '35600']
    hgrepos = []
    issue_num = 21627
    keywords = ['patch']
    message_count = 17.0
    messages = ['219510', '219541', '219542', '219579', '219582', '219651', '219773', '219908', '219922', '220376', '382586', '382603', '382618', '382724', '402073', '402208', '402212']
    nosy_count = 8.0
    nosy_names = ['gregory.p.smith', 'pitrou', 'vstinner', 'neologix', 'serhiy.storchaka', 'sstewartgallus', 'bup', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue21627'
    versions = ['Python 3.9', 'Python 3.10']
    ```

    8d8fabee-d1ef-4079-82b7-73db0595456d commented 10 years ago

    Hello,

    I noticed some possible bad behaviour while working on Python issue 21618 (see http://bugs.python.org/issue21618). Python has the following code in _posixsubmodules.c for closing extra files before spawning a process:

    static void
    _close_open_fd_range_safe(int start_fd, int end_fd, PyObject* py_fds_to_keep)
    {
        int fd_dir_fd;
        if (start_fd >= end_fd)
            return;
    
        fd_dir_fd = _Py_open(FD_DIR, O_RDONLY);
        if (fd_dir_fd == -1) {
            /* No way to get a list of open fds. */
            _close_fds_by_brute_force(start_fd, end_fd, py_fds_to_keep);
            return;
        } else {
            char buffer[sizeof(struct linux_dirent64)];
            int bytes;
            while ((bytes = syscall(SYS_getdents64, fd_dir_fd,
                                    (struct linux_dirent64 *)buffer,
                                    sizeof(buffer))) > 0) {
                struct linux_dirent64 *entry;
                int offset;
                for (offset = 0; offset < bytes; offset += entry->d_reclen) {
                    int fd;
                    entry = (struct linux_dirent64 *)(buffer + offset);
                    if ((fd = _pos_int_from_ascii(entry->d_name)) < 0)
                        continue;  /* Not a number. */
                    if (fd != fd_dir_fd && fd >= start_fd && fd < end_fd &&
                        !_is_fd_in_sorted_fd_sequence(fd, py_fds_to_keep)) {
                        while (close(fd) < 0 && errno == EINTR);
                    }
                }
            }
            close(fd_dir_fd);
        }
    }

    In this code, FD_DIR is "/proc/self/fd" on Linux. I'm not sure this code is correct: closing descriptors while reading the directory seems as if it would have the same problems as modifying a list while iterating over it.

    I can think of a few solutions but they all have problems.

    One could read the list of open files into memory once and then iterate through that list closing files, but allocating that memory is not async-signal-safe, so this solution is incorrect. One possible workaround is to use mmap to allocate the memory: it is a direct system call, and I don't see a reason for it not to be signal safe, but I'm not sure. One neat hack would be to mmap the memory ahead of time and rely on lazy paging to allocate it on demand. Another solution is to find the largest open file descriptor and then iterate over and close every possible file descriptor below it. So far, though, most of the possible solutions seem really hacky to me.

    I feel the best solution is to let the OS close the files and to set the O_CLOEXEC flag on the files that need to be closed.
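
    For illustration, here is a minimal sketch of that idea for POSIX platforms; the helper name is hypothetical and this is not code from the attached patches:

    #include <fcntl.h>

    /* Sketch: mark one fd close-on-exec instead of closing it.
     * fcntl() is on the POSIX list of async-signal-safe functions,
     * so this can run between fork() and exec().  Returns 0 on
     * success, -1 with errno set on failure. */
    static int
    set_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);
        if (flags == -1)
            return -1;
        return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }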

    8d8fabee-d1ef-4079-82b7-73db0595456d commented 10 years ago

    Okay, I've made a simple proof of concept patch.

    8d8fabee-d1ef-4079-82b7-73db0595456d commented 10 years ago

    Gah, I had trouble figuring out Mercurial.

    Close files after reading open files and not concurrently

    This commit removes the old behaviour of closing files concurrently with reading the system's list of open files and instead closes the files after the list has been read.

    Because no memory can be allocated in that context, the commit instead sets the CLOEXEC flag and lets the following exec close the files.

    For consistency, the brute force approach to closing the files has also been modified to set the CLOEXEC flag.

    vstinner commented 10 years ago

    I have no opinion on close() vs. setting the CLOEXEC flag, but you should use _Py_set_inheritable(fd, 0, NULL). _Py_set_inheritable() is able to set the CLOEXEC flag in a single syscall instead of 2 syscalls. _close_fds_by_brute_force() is already very slow when the maximum file descriptor is large: http://legacy.python.org/dev/peps/pep-0446/#closing-all-open-file-descriptors
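
    For reference, a sketch of the single-syscall variant on Linux; _Py_set_inheritable() can use an ioctl like this internally, though the helper name here is purely illustrative:

    #include <sys/ioctl.h>

    /* Sketch: FIOCLEX sets the close-on-exec flag in a single ioctl()
     * call, avoiding the usual F_GETFD/F_SETFD pair of fcntl() calls. */
    static int
    set_cloexec_fast(int fd)
    {
        return ioctl(fd, FIOCLEX);
    }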

    vstinner commented 10 years ago

    Since Python 3.4, all file descriptors created directly by Python are marked as non-inheritable. It should now be safe to use subprocess with close_fds=False, but the default value was not changed because third-party modules (especially code written in C) may not respect PEP 446.

    If you are interested in changing the default value, you should audit the major Python modules on PyPI to check whether they were adapted to respect PEP 446 (marking all file descriptors as non-inheritable).

    8d8fabee-d1ef-4079-82b7-73db0595456d commented 10 years ago

    It occurred to me that my current patch is wrong, and that using _Py_set_inheritable is also wrong: EBADF can occur in the brute-force version, and in that case _Py_set_inheritable raises an error, which I am not sure is async-signal-safe. I could test for open descriptors ahead of time, but that is a bit hacky. The most principled approach would be to extract a common set_cloexec or set_inheritable function. If set_inheritable were extracted, I would have to report either a Windows error or an errno error, which would be messy. I'm not sure where the best place to put the set_cloexec function would be.
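
    A sketch of the brute-force variant being discussed, with EBADF tolerated instead of raised; the function name is hypothetical:

    #include <errno.h>
    #include <fcntl.h>

    /* Sketch: mark every fd in [start_fd, end_fd) close-on-exec,
     * silently skipping descriptors that are not open (EBADF), since
     * raising an exception from this context is not async-signal-safe. */
    static void
    mark_fds_cloexec_by_brute_force(int start_fd, int end_fd)
    {
        int fd;
        for (fd = start_fd; fd < end_fd; fd++) {
            if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1 && errno != EBADF) {
                /* Other errors are unexpected here; a real helper would
                 * need to report them without allocating or raising. */
            }
        }
    }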

    8d8fabee-d1ef-4079-82b7-73db0595456d commented 10 years ago

    Okay, so the Python directory seems to be where wrappers over low-level or missing functionality are placed. So I guess I should make a _Py_setcloexec function in Python/setcloexec.c and a declaration in Include/setcloexec.h?

    8d8fabee-d1ef-4079-82b7-73db0595456d commented 10 years ago

    Okay, now I'm confused. How would I conditionally compile and use the setcloexec object file and header on POSIX platforms but not on Windows?

    vstinner commented 10 years ago

    The _posixsubprocess module is not compiled on Windows.

    8d8fabee-d1ef-4079-82b7-73db0595456d commented 10 years ago

    Okay, I made a patch that I hope addresses all the criticisms and also fixes a problem I noticed myself.

    iritkatriel commented 3 years ago

    I can't find _posixsubmodules.c or _close_open_fd_range_safe in the codebase. Is this issue out of date?

    serhiy-storchaka commented 3 years ago

    It is Modules/_posixsubprocess.c.

    We still do not have a reproducer, so I am not sure that the code has this problem.

    gpshead commented 3 years ago

    The Linux kernel code is not easy enough to follow to tell whether it has this issue, whether it pre-creates the dirent structures for all fds at opendir time for /proc/self/fd, or whether it iterates through the list of fds in sorted order so that an already-closed fd will not interfere with its internal iteration.

    Regardless, I've yet to knowingly witness a problem from this come up in practice. "Knowingly" and "yet" being the key words. :)

    But I like the general theme of your patch: set CLOEXEC on all of the fds rather than explicitly calling close(fd) in the directory-reading loop.

    vstinner commented 3 years ago

    Linux 5.9 added the close_range() syscall, which we can use.

    Linux 5.10 may get a new CLOSE_RANGE_CLOEXEC flag for close_range(). https://lwn.net/Articles/837816/

    We could use close_range() rather than iterating over /proc/self/fd/ in user space; it may be faster and safer.
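
    A sketch of that approach, assuming kernel headers that define SYS_close_range; the CLOSE_RANGE_CLOEXEC value below matches linux/close_range.h, and the wrapper name is hypothetical:

    #define _GNU_SOURCE
    #include <sys/syscall.h>
    #include <unistd.h>

    #ifndef CLOSE_RANGE_CLOEXEC
    #  define CLOSE_RANGE_CLOEXEC (1U << 2)
    #endif

    /* Sketch: ask the kernel to mark every fd in [start_fd, end_fd]
     * (inclusive) close-on-exec in one syscall, falling back to closing
     * the range outright if the flag or the syscall is unsupported
     * (EINVAL on kernels without the flag, ENOSYS before Linux 5.9).
     * syscall(2) is used directly so this builds even with a libc that
     * lacks a close_range() wrapper. */
    static int
    close_range_cloexec(unsigned int start_fd, unsigned int end_fd)
    {
        if (syscall(SYS_close_range, start_fd, end_fd,
                    CLOSE_RANGE_CLOEXEC) == 0)
            return 0;
        return (int)syscall(SYS_close_range, start_fd, end_fd, 0);
    }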

    df79943f-4aee-4531-a00d-c6b12816eb70 commented 3 years ago

    On Android, if os.closerange closes the file descriptor of the scandir iterator, the interpreter immediately crashes, e.g.:

    >>> import os
    >>> os.scandir()
    <posix.ScandirIterator object at 0x7082d6ef10>
    >>> os.closerange(3,9999)
    fdsan: attempted to close file descriptor 3, expected to be unowned, actually owned by DIR* 0x7263390290
    Aborted
    $

    Sorry if this isn't an appropriate place to add this, but the title matched the concept, I think, and I don't know enough to judge whether this is something that needs to be fixed.

    vstinner commented 3 years ago

    Dan Snider: "On Android, if os.closerange closes the file descriptor of the scandir iterator, the interpreter immediately crashes"

    I don't see anything wrong with Python here. It's no different from calling os.close(3).

    If it hurts, don't do that :-)

    vstinner commented 3 years ago

    I'm closing the issue as "not a bug". If you disagree, please comment on the issue :-)

    Steven Stewart-Gallus:

    In the code FD_DIR is "/proc/self/fd" on Linux. I'm not sure this code is correct. This seems as if it would have the same problems as iterating over a list and modifying it at the same time.

    I don't understand the initial issue. What makes you think that the current code is not reliable? Python has been using this code for years, and so far nobody has reported any crash in it.

    You didn't report a crash; you only wrote that you suspect there is a bug. Can you prove that the current Python code is wrong? Did you look at the implementation of readdir() in the Linux glibc, for example?

    About close() vs. marking file descriptors non-inheritable: I'm not convinced that marking file descriptors is faster or more reliable. On many operating systems, _Py_set_inheritable() requires 2 syscalls (get flags, set flags), whereas close() is a single syscall, so calling close() is faster.

    ---

    Python now has a _Py_closerange() function which uses the close_range() syscall where the platform provides it and falls back to closing file descriptors individually otherwise.

    On Linux, _posixsubprocess still iterates over /proc/self/fd/ if this directory can be opened, even when the closerange() function is available. I'm not sure which option is the safest or the fastest.

    By the way, Steven Stewart-Gallus didn't mention his operating system.