python / cpython

The Python programming language
https://www.python.org

Python should support exporting thread names to the OS #59705

Open f7ea1b17-2187-492a-b063-90a55b3ccf6a opened 12 years ago

f7ea1b17-2187-492a-b063-90a55b3ccf6a commented 12 years ago
BPO 15500
Nosy @jcea, @pitrou, @vstinner, @tiran, @bitdancer, @asvetlov, @florentx, @socketpair, @eryksun, @ZackerySpytz, @arhadthedev
PRs
  • python/cpython#14578
    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['type-feature', 'library', '3.10', '3.11']
    title = 'Python should support exporting thread names to the OS'
    updated_at =
    user = 'https://bugs.python.org/bra'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'pitrou'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'bra'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 15500
    keywords = ['patch']
    message_count = 10.0
    messages = ['166890', '167166', '167459', '167552', '167560', '230736', '377332', '406336', '410896', '410898']
    nosy_count = 14.0
    nosy_names = ['jcea', 'pitrou', 'vstinner', 'christian.heimes', 'Arfrever', 'r.david.murray', 'asvetlov', 'flox', 'bra', 'kovid', 'socketpair', 'eryksun', 'ZackerySpytz', 'arhadthedev']
    pr_nums = ['14578']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue15500'
    versions = ['Python 3.10', 'Python 3.11']
    ```

    f7ea1b17-2187-492a-b063-90a55b3ccf6a commented 12 years ago

    Python's Thread class has a "name" argument, which sets the name of the given thread. This name is used only internally, although it could also be set at the OS level. Related discussion: http://stackoverflow.com/questions/2369738/can-i-set-the-name-of-a-thread-in-pthreads-linux

    #include <pthread.h>
    int pthread_setname_np(pthread_t thread, const char *name);
    
    // FreeBSD & OpenBSD: function name is slightly different
    void pthread_set_name_np(pthread_t tid, const char *name);
    
    // Mac OS X: it only seems applicable to the current thread (no thread ID)
    int pthread_setname_np(const char*);

    It would be very useful if Python set the name parameter with the above pthread calls, so the user/developer could see which threads do what in ps or top.

    tiran commented 12 years ago

    +1

    win32 supports thread names, too. http://msdn.microsoft.com/en-us/library/xcb2z8hs.aspx

    pitrou commented 12 years ago

    I don't think this should be done by default as it will break people's expectations and, perhaps worse, compatibility.

    f7ea1b17-2187-492a-b063-90a55b3ccf6a commented 12 years ago

    "I don't think this should be done by default as it will break people's expectations and, perhaps worse, compatibility." I won't mind another thread naming API, if somebody does this. :) But personally I expected python to name my threads (and if the OS supports it, on that level), I was actually surprised to see that it doesn't, mostly because I remembered a patch for FreeBSD, which did this years ago, so I thought it has been merged into mainline since then.

    bitdancer commented 12 years ago

    It is indeed the compatibility that is the worse issue. The problem is what people have gotten used to and may have coded their applications to expect/deal with. I agree with you that most people would *not* find it surprising to see the name reflected in the OS, but I don't think the convenience of that is worth introducing a potential backward incompatibility.

    On the other hand, I think this might be an appropriate place to use a global control, so that getting thread names out to the OS would require adding just a single line of code to any given application. I know of an application that does this. It chose to implement it as a global change, and that makes sense to me.
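A global opt-in of the kind described here could be as small as one call at program start. The sketch below is only an illustration of that idea: `enable_os_thread_names` and its `setter` callback are hypothetical names, not part of `threading`, and the callback stands in for whatever platform call actually renames the thread.

```python
import threading

_original_start = None  # set once the hook is installed


def enable_os_thread_names(setter):
    """Opt in globally: call setter(name) inside every thread started later.

    `setter` is a hypothetical callback standing in for the platform call
    (for example a ctypes wrapper around pthread_setname_np).
    """
    global _original_start
    if _original_start is not None:
        return  # already enabled; do not wrap start() twice
    _original_start = threading.Thread.start

    def start(self):
        run = self.run  # bound method, resolved before it is shadowed

        def run_with_name():
            try:
                setter(self.name)
            except Exception:
                pass  # naming is best effort; never break the thread
            run()

        self.run = run_with_name  # instance attribute shadows the method
        _original_start(self)

    threading.Thread.start = start
```

Applications that never call `enable_os_thread_names(...)` keep the existing behavior, which is the backward-compatibility property being asked for.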

    835f6b8c-37e9-4cfb-bb5c-c81164a22ded commented 10 years ago

    Just FYI, here is a pure Python 2 implementation that monkey-patches Thread.start() to set the OS-level thread name intelligently.

            import ctypes, ctypes.util, threading
            libpthread_path = ctypes.util.find_library("pthread")
            if libpthread_path:
                libpthread = ctypes.CDLL(libpthread_path)
                if hasattr(libpthread, "pthread_setname_np"):
                    pthread_setname_np = libpthread.pthread_setname_np
                    pthread_setname_np.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
                    pthread_setname_np.restype = ctypes.c_int
                    orig_start = threading.Thread.start
                    def new_start(self):
                        orig_start(self)
                        try:
                            name = self.name
                            if not name or name.startswith('Thread-'):
                                name = self.__class__.__name__
                                if name == 'Thread':
                                    name = self.name
                            if name:
                                if isinstance(name, unicode):
                                    name = name.encode('ascii', 'replace')
                                ident = getattr(self, "ident", None)
                                if ident is not None:
                                    pthread_setname_np(ident, name[:15])
                        except Exception:
                            pass  # Don't care about failure to set name
                    threading.Thread.start = new_start
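For readers on Python 3, roughly the same idea can be sketched as follows. This is not the patch above ported line by line: it assumes Linux with the two-argument `pthread_setname_np`, assumes `pthread_t` fits in an unsigned long, and sets the name from inside the new thread rather than from `start()`. The names `export_current_thread_name` and `NamedThread` are made up for this example.

```python
import ctypes
import sys
import threading


def _load_pthread_setname_np():
    """Return the Linux pthread_setname_np(thread, name), or None."""
    if not sys.platform.startswith("linux"):
        return None  # BSD and macOS use different names or signatures
    try:
        func = ctypes.CDLL(None).pthread_setname_np
    except (OSError, AttributeError, TypeError):
        return None
    # Assumption: pthread_t fits in an unsigned long, as on Linux.
    func.argtypes = [ctypes.c_ulong, ctypes.c_char_p]
    func.restype = ctypes.c_int
    return func


_pthread_setname_np = _load_pthread_setname_np()


def export_current_thread_name(name):
    """Best effort: copy `name` to the calling thread's OS-level name."""
    if _pthread_setname_np is None:
        return False
    # Linux allows 15 bytes plus the terminating NUL.
    raw = name.encode("utf-8", "replace")[:15]
    return _pthread_setname_np(threading.get_ident(), raw) == 0


class NamedThread(threading.Thread):
    """Thread that exports its name from inside the new thread."""

    def run(self):
        export_current_thread_name(self.name)
        super().run()
```

Calling the function from within `run()` means the thread is guaranteed to be alive when it is renamed. The byte-level truncation may cut a multi-byte UTF-8 sequence in half, which the kernel accepts but which can display oddly.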

    vstinner commented 4 years ago

    bpo-18006 "Set thread name in linux kernel" was marked as a duplicate of this issue:

    """ In linux (Since 2.6.9) we can use syscall

    prctl(PR_SET_NAME, "Some thread name")

    to set thread name to the kernel. (...) """
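As a rough illustration of the `prctl` route, the sketch below calls it through ctypes. It assumes Linux; the constants 15 and 16 are `PR_SET_NAME` and `PR_GET_NAME` from `<linux/prctl.h>`, and the helper names are invented for this example. `prctl(PR_SET_NAME, ...)` affects only the calling thread.

```python
import ctypes
import sys
import threading

PR_SET_NAME = 15  # values from <linux/prctl.h>
PR_GET_NAME = 16


def _load_libc():
    """Return a libc handle exposing prctl on Linux, or None elsewhere."""
    if not sys.platform.startswith("linux"):
        return None
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        libc.prctl  # raises AttributeError if the symbol is missing
    except (OSError, AttributeError, TypeError):
        return None
    return libc


_libc = _load_libc()


def set_current_thread_name(name):
    """Set the calling thread's kernel name; True on success."""
    if _libc is None:
        return False
    raw = name.encode("utf-8", "replace")[:15]  # 15 bytes plus NUL
    return _libc.prctl(PR_SET_NAME, raw, 0, 0, 0) == 0


def get_current_thread_name():
    """Read the calling thread's kernel name back, or None if unavailable."""
    if _libc is None:
        return None
    buf = ctypes.create_string_buffer(16)
    if _libc.prctl(PR_GET_NAME, buf, 0, 0, 0) != 0:
        return None
    return buf.value.decode("utf-8", "replace")
```

Because the call applies to the current thread, it has to run inside the thread being named, for example at the top of its target function.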

    eryksun commented 2 years ago

    Zackery, here's an initial draft implementation for Windows 10+ that's based on the interface you created in PR 14578. It calls WinAPI SetThreadDescription(), which sets the thread's name directly in the kernel thread object (i.e. ETHREAD.ThreadName). This function was added to the API in Windows 10 version 1607 (10.0.14393), so Windows 8.1 systems and older Windows 10 systems aren't supported. I'd rather not implement the exception-based approach for older Windows systems. It doesn't really name the thread and only works when a debugger is attached to the process.

    This initial draft blindly assumes that the filesystem encoding is UTF-8. That's been the case in Windows since Python 3.6, unless configured to use the legacy ANSI encoding, e.g. by setting PYTHONLEGACYWINDOWSFSENCODING. PyThread_set_thread_name() could be changed to use CP_ACP instead of CP_UTF8 when legacy mode is enabled.

    I set the truncation limit to 64 characters. It could be lowered to 15 to match what you implemented for POSIX, but I think that's too short. The limit could also effectively be removed by truncating to the system limit of 32766 characters. That would be simpler since the code wouldn't have to worry about surrogate pairs.

    Python/thread_nt.h:

        typedef HRESULT (WINAPI *PF_SET_THREAD_DESCRIPTION)(HANDLE, PCWSTR);
        static PF_SET_THREAD_DESCRIPTION pSetThreadDescription = NULL;

        static void
        PyThread__init_thread(void)
        {
            // Initialize pSetThreadDescription.
            // Minimum supported Windows version: 10.0.14393 (1607)
            int i = 0;
            LPCWSTR module_names[2] = {
                L"api-ms-win-core-processthreads-l1-1-3", // i.e. kernel32
                L"kernelbase",
            };
            // Most "ms-win-core" API sets (except for COM/WinRT) are implemented
            // by modules that are always loaded, including ntdll, kernelbase, and
            // kernel32, so it's safe to use GetModuleHandleW().
            do {
                pSetThreadDescription = (PF_SET_THREAD_DESCRIPTION)GetProcAddress(
                    GetModuleHandleW(module_names[i]), "SetThreadDescription");
            } while (pSetThreadDescription == NULL &&
                     ++i < Py_ARRAY_LENGTH(module_names));
        }
    
        int
        PyThread_set_thread_name(const char *name)
        {
            HRESULT hr = 0;
            wchar_t *wname = NULL;
    
            if (!initialized) {
                PyThread_init_thread();
            }
            if (name == NULL || *name == '\0') {
                hr = E_INVALIDARG;
                goto exit;
            }
            if (pSetThreadDescription == NULL) {
                hr = E_NOTIMPL;
                goto exit;
            }
    
            // cch includes the terminating null character.
            int cch = MultiByteToWideChar(CP_UTF8, 0, name, -1, NULL, 0);
            if (cch > 0) {
                wname = PyMem_RawMalloc(cch * sizeof(wchar_t));
                if (wname == NULL) {
                    hr = E_OUTOFMEMORY;
                    goto exit;
                }
                cch = MultiByteToWideChar(CP_UTF8, 0, name, -1, wname, cch);
            }
            if (cch == 0) {
                hr = HRESULT_FROM_WIN32(GetLastError());
                goto exit;
            }
    
            // Truncate the name to 64 characters, accounting for surrogate pairs.
            // The OS limit is 32766 wide characters, but long names aren't of
            // practical use.
            int i = 0;
            for (int len = 0; i < (cch - 1) && len < 64; i++, len++) {
                if (i < (cch - 2) && IS_SURROGATE_PAIR(wname[i], wname[i + 1])) {
                    i++; // Skip the trailing surrogate.
                }
            }
            wname[i] = L'\0';
            hr = pSetThreadDescription(GetCurrentThread(), wname);
        exit:
            if (wname != NULL) {
                PyMem_RawFree(wname);
            }
            if (FAILED(hr)) {
                return (int)hr;
            }
            return 0;
        }

    Modules/_threadmodule.c:

        static PyObject *
        _thread__set_thread_name_impl(PyObject *module, PyObject *name)
        {
            int error = PyThread_set_thread_name(PyBytes_AS_STRING(name));
        #ifdef MS_WINDOWS
            // For Python code, ignore a not-implemented error, which means
            // it's a Windows 8.1 system or older Windows 10 system.
            if (error == (int)E_NOTIMPL) {
                error = 0;
            }
        #endif
            if (error) {
                Py_DECREF(name);
                PyErr_SetString(ThreadError, "setting the thread name failed");
                return NULL;
            }
            Py_DECREF(name);
            Py_RETURN_NONE;
        }
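To experiment with the same approach without rebuilding CPython, the draft can be approximated from Python with ctypes. This is a sketch, not the proposed implementation: the function names are invented here, it looks the symbol up in kernel32 only, and on anything other than Windows 10 1607 or later it simply returns `False`. The 64-character truncation counts code points, so a surrogate pair is never split.

```python
import ctypes
import sys


def truncate_code_points(name, limit=64):
    """Cut `name` to `limit` code points; return it with its UTF-16 length.

    Python's str indexes by code point, so a surrogate pair is never split.
    """
    cut = name[:limit]
    units = len(cut.encode("utf-16-le", "surrogatepass")) // 2
    return cut, units


def set_current_thread_description(name):
    """Best-effort wrapper around SetThreadDescription; False if unavailable."""
    if sys.platform != "win32" or not name:
        return False
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    try:
        set_desc = kernel32.SetThreadDescription  # Windows 10 1607 or later
    except AttributeError:
        return False
    set_desc.argtypes = [ctypes.c_void_p, ctypes.c_wchar_p]
    set_desc.restype = ctypes.c_long  # HRESULT; negative means failure
    kernel32.GetCurrentThread.restype = ctypes.c_void_p
    wname, _ = truncate_code_points(name)
    return set_desc(kernel32.GetCurrentThread(), wname) >= 0
```

For example, `truncate_code_points("a" * 100)` returns the first 64 characters and a UTF-16 length of 64, while 64 non-BMP characters occupy 128 UTF-16 code units.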

    arhadthedev commented 2 years ago

    @r.david.murray

    "It is indeed the compatibility that is the worse issue. The problem is what people have gotten used to and may have coded their applications to expect/deal with. I agree with you that most people would *not* find it surprising to see the name reflected in the OS, but I don't think the convenience of that is worth introducing a potential backward incompatibility."

    For now, the OS-level names of Python threads are always empty (as in many other programs). So there cannot be Python-oriented tools that check these names and expect some other outcome: with no non-empty names to observe, there is no alternative behavior they could depend on.

    pitrou commented 2 years ago

    Two things:

    1) I agree this is an extremely valuable addition for any package or application that makes non-trivial use of threads in Python.

    2) It should at least be exposed as a standalone function in the threading module, and ideally also be called automatically by the Thread._bootstrap method.

    lunixbochs commented 1 year ago

    I'm strongly in favor of passing thread names to the OS. I just had to patch this in myself to debug a threading issue, and I was quite surprised Python Thread(name=...) didn't already do it.

    Rust has code in their stdlib for doing this on quite a few platforms: https://github.com/rust-lang/rust/blob/0b35f44/library/std/src/sys/windows/thread.rs#L62-L70 https://github.com/rust-lang/rust/blob/0b35f44/library/std/src/sys/unix/thread.rs#L119-L209

    kurtqq commented 1 year ago

    big +1

    encukou commented 7 months ago

    "It should at least be exposed as a standalone function in the threading module, and ideally also be called automatically by the Thread._bootstrap method."

    IMO, setting a thread's name attribute should write the OS name too. With that, I don't think a separate function for the OS name is necessary.
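One way to picture this suggestion is a `Thread` subclass whose `name` setter forwards to the OS. The sketch below is purely illustrative: `OSNamedThread` and `os_name_hook` are hypothetical, and the hook stands in for a platform-specific call that can rename a thread given its identifier.

```python
import threading


class OSNamedThread(threading.Thread):
    """Thread whose `name` setter also forwards to a hypothetical OS hook."""

    # Hypothetical callable(ident, name) that performs the OS-level rename.
    os_name_hook = None

    @property
    def name(self):
        return threading.Thread.name.fget(self)

    @name.setter
    def name(self, value):
        threading.Thread.name.fset(self, value)
        hook = type(self).os_name_hook
        # Only a running thread has an OS-level identity to rename.
        if hook is not None and self.is_alive():
            hook(self.ident, self.name)
```

A complete version would also need to export the initial name when the thread starts, since the constructor stores the name without going through the setter.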