python / cpython

The Python programming language
https://www.python.org
Other
61.8k stars 29.71k forks source link

ctypes callback with structure crashes in Python 3.8 on Windows x86 #85193

Open 18fbd16e-da2d-4aa3-8e72-b50f790f5cc3 opened 4 years ago

18fbd16e-da2d-4aa3-8e72-b50f790f5cc3 commented 4 years ago
BPO 41021
Nosy @pfmoore, @vsajip, @amauryfa, @abalkin, @tjguk, @meadori, @zware, @eryksun, @zooba, @itsgk92
Files
  • callback.zip: source zip
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.8', 'type-crash', '3.10', 'ctypes', 'OS-windows', '3.9'] title = 'ctypes callback with structure crashes in Python 3.8 on Windows x86' updated_at = user = 'https://github.com/itsgk92' ``` bugs.python.org fields: ```python activity = actor = 'steve.dower' assignee = 'none' closed = False closed_date = None closer = None components = ['Windows', 'ctypes'] creation = creator = 'itsgk92' dependencies = [] files = ['49253'] hgrepos = [] issue_num = 41021 keywords = [] message_count = 8.0 messages = ['371796', '371844', '371851', '371920', '402695', '403035', '412729', '412771'] nosy_count = 11.0 nosy_names = ['paul.moore', 'vinay.sajip', 'amaury.forgeotdarc', 'belopolsky', 'tim.golden', 'meador.inge', 'zach.ware', 'eryksun', 'steve.dower', 'itsgk92', 'xs_iceman'] pr_nums = [] priority = 'normal' resolution = None stage = 'test needed' status = 'open' superseder = None type = 'crash' url = 'https://bugs.python.org/issue41021' versions = ['Python 3.8', 'Python 3.9', 'Python 3.10'] ```

    18fbd16e-da2d-4aa3-8e72-b50f790f5cc3 commented 4 years ago

    Python Process crashes with unauthorised memory access when the C++ DLL callback has arguments as structures. Happens in python 3.8. python 3.7 works fine with same source code. Initial investigations revels that the structure is called back as a pointer instead of plain python object. Callbacks with Primitive types are successful. This might be something to do with vector calling in python 3.8?

    >>Replicate:

    >>A c++ DLL with "subscribe_cb" function and a callback "listen_cb"; where FILETIME is windows FILETIME structure (it can be any structure for that matter)

    extern "C" CBFUNC(void, listen_cb)(int value, FILETIME ts ); extern "C" EXFUNC(void) subscribe_cb(listen_cb);

    EXFUNC(void) subscribe_cb(listen_cb cb)
    {
    
        int i = 0;
        while (true) {
            FILETIME systime;
            GetSystemTimeAsFileTime(&systime);
            i++;
            Sleep(1000);
            cb(i, systime);
       }
    }

    >>Python client for the dll

    class FT(Structure):
        _fields_ = [("dwLowDateTime", c_ulong, 32),
                    ("dwHighDateTime", c_ulong, 32)]
    
    @WINFUNCTYPE(c_void_p, c_int, FT)
    def cb(val, ft):
        print(f"callback {val} {ft.dwLowDateTime} {ft.dwHighDateTime}")
    
    lib = WinDLL(r"C:\Temp\CBsimulate\CbClient\CbClient\Debug\DummyCb.dll")
    
    lib.subscribe_cb(cb)
    
    while 1:
        sleep(5)
    eryksun commented 4 years ago

    Please provide complete code that can be compiled as is, and the required build environment. I wrote a similar example in C, compiled for x64 with MSVC, and it worked fine for me. Also, please provide more details about the error -- a traceback in a native debugger, exception analysis, or a dump file.

    One detail that stands out to me is that you set the ctypes callback return type to c_void_p -- instead of void (None). But that shouldn't cause a crash. Another concern is your use of WINFUNCTYPE (stdcall) and WinDLL instead of CFUNCTYPE (cdecl) and CDLL. In x64 they're the same, but it's still important to use the function's declared calling convention, for the sake of x86 compatibility. Make sure you're really using stdcall.

    Here's the code that worked for me:

    c/test.c:

        #include <windows.h>
    
        typedef void listen_fn(int, FILETIME);
    
        static void
        subscribe_thread(listen_fn *cb)
        {
            int i = 0;
            FILETIME systime;
            while (1) {
                GetSystemTimeAsFileTime(&systime);
                cb(i++, systime);
                Sleep(1000);
           }
        }
    
        void __declspec(dllexport)
        subscribe(listen_fn *cb)
        {
            CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)subscribe_thread, 
                cb, 0, NULL);
        }

    test.py:

        import ctypes
        from ctypes import wintypes
    
        lib = ctypes.CDLL('c/test.dll')
        listen_fn = ctypes.CFUNCTYPE(None, ctypes.c_int, wintypes.FILETIME)
    
        @listen_fn
        def cb(val, ft):
            print(f"callback {val:03d} {ft.dwHighDateTime} {ft.dwLowDateTime}")
    
        print('press enter to quit')
        lib.subscribe(cb)
        input()
    18fbd16e-da2d-4aa3-8e72-b50f790f5cc3 commented 4 years ago

    Hi, Thanks for response @eryksun

    Sorry that I did not mention it is compiled for x86 and the crash happens in 32bit python. Yes, the dll is declared with stdcall and built for x86 with MSVC. It is tested with python 3.7.4 32bit which is successful and fails on python 3.8.2 32bit. I did not look into on x64 though. Below is the source code. I have also attached them as a zip.

    >> c/DummyCb.h

    #pragma once
    #include <Windows.h>
    #include <iostream>
    
    #define CALLBACK    __stdcall
    #define WINAPI      __stdcall
    
    #define EXFUNC(type) _declspec(dllexport) void WINAPI // export function
    #define CBFUNC(type,func) typedef void (CALLBACK* func) // callback function

    extern "C" CBFUNC(void, listen_cb)(int value, FILETIME ts ); extern "C" EXFUNC(void) subscribe_cb(listen_cb);

    >> c/DummyCb.cpp

    #include "pch.h"
    #include "DummyCb.h"
    #include <stdio.h>
    
    EXFUNC(void) subscribe_cb(listen_cb cb)
    {
    
        int i = 0;
        while (true) {
            FILETIME systime;
            GetSystemTimeAsFileTime(&systime);
            i++;
            Sleep(1000);
            cb(i, systime);
       }
    }

    >> c/CbClient.cpp

    #include <iostream>
    #include "DummyCb.h"
    #include <stdio.h>
    #include <Windows.h>

    void __stdcall lis(int a, FILETIME ts) {

        printf("calling back %d, %d, %d \n", a,ts.dwHighDateTime, ts.dwLowDateTime);
    }
    
    int main()
    {
        subscribe_cb(&lis);
    }

    >> CbClient.py

    from ctypes import *
    from time import sleep
    
    class FT(Structure):
        _fields_ = [("dwLowDateTime", c_ulong, 32),
                    ("dwHighDateTime", c_ulong, 32)]
    
    @WINFUNCTYPE(c_void_p, c_int, FT)
    def cb(val, ft):
        print(f"callback {val} {ft.dwLowDateTime} {ft.dwHighDateTime}")
    
    lib = WinDLL(r"c/DummyCb.dll")
    
    lib.subscribe_cb(cb)
    
    while 1:
        sleep(5)

    >> Traceback on python

        lib.subscribe_cb(cb)
    OSError: exception: access violation writing 0xCA35789B

    >> Native Debug traceback

    Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.

    >> Further analysis

    change the callback declaration to a pointer of structure and it has same crash on python 3.7 32 bit.

    @WINFUNCTYPE(c_void_p, c_int, POINTER(FT))
    def cb(val, ft):
        print(f"callback {val} {ft.contents}")

    Which makes me correlate is it something to do with vector calling on python 3.8 which makes callabck as *cb (a pointer) instead of a plain object call (just a thought, did not investigate change logs). Though its interesting to note that it did not fail on x64 with cdecl calling convention. That explains why our larger code base din't crash on Linux x86 linked as a .SO .

    cheers.

    eryksun commented 4 years ago

    I can reproduce a crash with an x86 build that uses a stdcall callback with a 32-bit argument followed by any ffi_type_sint64 argument (FILETIME, long long, etc). Apparently this is a bug in libffi's ffi_prep_cif. Given the 2nd argument has a size of 8 bytes and an alignment of 8 bytes, it sets the CIF (call interface) to 16 bytes instead of the correct value of 12 bytes.

    source:

        import ctypes
    
        @ctypes.WINFUNCTYPE(None, ctypes.c_int, ctypes.c_int64)
        def test(i, j):
            print(f"callback {i} {j}")

    debugger:

    0:000:x86> k 2
    ChildEBP RetAddr
    00bef6a0 6f1873bd _ctypes_d!_ctypes_alloc_callback+0x24c
    00bef6dc 6e7ad81b _ctypes_d!PyCFuncPtr_new+0x32d

    The bytes field should be 12, not 16:

    0:000:x86> ?? p->cif
    struct ffi_cif
       +0x000 abi              : 2 ( FFI_STDCALL )
       +0x004 nargs            : 2
       +0x008 arg_types        : 0x02dc43bc  -> 0x00742e5c _ffi_type
       +0x00c rtype            : 0x6f1accec _ffi_type
       +0x010 bytes            : 0x10
       +0x014 flags            : 8
    
    0:000:x86> ?? p->cif.arg_types[0]
    struct _ffi_type * 0x00742e5c
       +0x000 size             : 4
       +0x004 alignment        : 4
       +0x006 type             : 0xa
       +0x008 elements         : (null)
    
    0:000:x86> ?? p->cif.arg_types[1]
    struct _ffi_type * 0x02dd635c
       +0x000 size             : 8
       +0x004 alignment        : 8
       +0x006 type             : 0xc
       +0x008 elements         : (null)

    When the callback returns, it cleans 16 bytes from stack instead of the expected 12. With stack-allocated systime in the sample code, eventually GetSystemTimeAsFileTime(&systime) ends up overwriting the return address, and it tries to return to an arbitrary address. If you're lucky, this address isn't executable memory, and it immediately fails on an access violation.

    The callback in Aravindhan's sample code works fine if I modify the CIF to use 12 bytes instead of 16:

    _ctypes_d!_ctypes_alloc_callback+0x24c:
    6e9f36ec 83c414          add     esp,14h
    0:000:x86> ?? p->cif.bytes = 12
    unsigned int 0xc
    
    0:000:x86> bd 0; g
    callback 1 477180670 30820036
    callback 2 487502671 30820036
    callback 3 497655140 30820036
    callback 4 507801994 30820036
    callback 5 517818512 30820036
    callback 6 527961670 30820036

    I suppose this issue should be closed as third party, with a bug report pushed upstream -- unless for now it would be better if ctypes implements a workaround.

    80220989-12ba-400f-b0df-0714f408684f commented 2 years ago

    What is the next steps on this bug?

    Have you created a bug with libffi if this is not working correctly eryksun? Sounds like you understand what the bug report need to contain.

    On implementing a workaround in ctypes. Is this possible?

    zooba commented 2 years ago

    Judging from https://github.com/libffi/libffi, they could do with more contributors/support over there.

    I see a bunch of reports along these lines from a few years back, apparently someone has a test suite that covers it, but hasn't merged it into the libffi test suite which is where it really belongs.

    If it turns out to be a straightforward fix, we're able to patch our own build rather than waiting for a libffi release (and since we're unlikely to get new 3.3 releases for 3.10 and earlier, that's what we'll need to release it in updates). But we don't run their test suite on our build, so we'd want it to be upstream as well.

    80220989-12ba-400f-b0df-0714f408684f commented 2 years ago

    Steve Dower do you have any links to the issues you mention? Are these now solved?

    I am not able to see if the fix is straightforward, but from the information from Eryk Sun it sounds like it should be?

    zooba commented 2 years ago

    Only by following the link I posted and searching for issues that sound like this one.

    Which I just did for you: https://github.com/libffi/libffi/issues/367

    There may be more, though. I just grabbed the first one that looked like a match.