scoder / lupa

Lua in Python
http://pypi.python.org/pypi/lupa

SEGFAULT calling lua #84

Open mickeprag opened 7 years ago

mickeprag commented 7 years ago

I am getting segfaults from time to time. They seem to happen when Python objects are wrapped and sent to the Lua runtime. Lupa version: 1.4. Lua versions tested: 5.1.5, 5.2.3, and LuaJIT 2.0.4.

Backtraces of the crash:

Lua:

#0  0x00007f639189c261 in lua_type () from /usr/lib64/liblua5.2.so.0
#1  0x00007f6391ae1b21 in __pyx_f_4lupa_5_lupa_lua_object_repr ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#2  0x00007f6391adf4ac in __pyx_pf_4lupa_5_lupa_10_LuaObject_14__str__ ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#3  0x00007f6391ade33d in __pyx_pw_4lupa_5_lupa_10_LuaObject_15__str__ ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#4  0x00007f63994c974a in _PyObject_Str () from /usr/lib64/libpython2.7.so.1.0
#5  0x00007f63994dad98 in PyString_Format () from /usr/lib64/libpython2.7.so.1.0
#6  0x00007f639952735f in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#7  0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#8  0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#9  0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#10 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#11 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#12 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#13 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#14 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#15 0x00007f63994b30cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#16 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#17 0x00007f639952365e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#18 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#19 0x00007f63994b30cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#20 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#21 0x00007f639952365e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#22 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#23 0x00007f63995264ae in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#24 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#25 0x00007f63994b30cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#26 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#27 0x00007f639952365e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#28 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#29 0x00007f63995265cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#30 0x00007f6399529ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#31 0x00007f63994b2fec in ?? () from /usr/lib64/libpython2.7.so.1.0
#32 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#33 0x00007f639949c275 in ?? () from /usr/lib64/libpython2.7.so.1.0
#34 0x00007f639948d333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#35 0x00007f639951fd77 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.7.so.1.0
#36 0x00007f6399558ab2 in ?? () from /usr/lib64/libpython2.7.so.1.0
#37 0x00007f639922c444 in start_thread () from /lib64/libpthread.so.0
#38 0x00007f6398f735ed in clone () from /lib64/libc.so.6

Luajit:

#0  0x00007f555d8a3c78 in lua_rawgeti () from /usr/lib64/libluajit-5.1.so.2
#1  0x00007f555db13b3e in __pyx_f_4lupa_5_lupa_10_LuaObject_push_lua_object ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#2  0x00007f555db15736 in __pyx_pf_4lupa_5_lupa_10_LuaObject_14__str__ ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#3  0x00007f555db152f7 in __pyx_pw_4lupa_5_lupa_10_LuaObject_15__str__ ()
   from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#4  0x00007f556550074a in _PyObject_Str () from /usr/lib64/libpython2.7.so.1.0
#5  0x00007f5565511d98 in PyString_Format () from /usr/lib64/libpython2.7.so.1.0
#6  0x00007f556555e35f in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#7  0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#8  0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#9  0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#10 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#11 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#12 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#13 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#14 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#15 0x00007f55654ea0cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#16 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#17 0x00007f556555a65e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#18 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#19 0x00007f55654ea0cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#20 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#21 0x00007f556555a65e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#22 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#23 0x00007f556555d4ae in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#24 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#25 0x00007f55654ea0cd in ?? () from /usr/lib64/libpython2.7.so.1.0
#26 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#27 0x00007f556555a65e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#28 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#29 0x00007f556555d5cb in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#30 0x00007f5565560ad0 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#31 0x00007f55654e9fec in ?? () from /usr/lib64/libpython2.7.so.1.0
#32 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#33 0x00007f55654d3275 in ?? () from /usr/lib64/libpython2.7.so.1.0
#34 0x00007f55654c4333 in PyObject_Call () from /usr/lib64/libpython2.7.so.1.0
#35 0x00007f5565556d77 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.7.so.1.0
#36 0x00007f556558fab2 in ?? () from /usr/lib64/libpython2.7.so.1.0
#37 0x00007f5565263444 in start_thread () from /lib64/libpthread.so.0
#38 0x00007f5564faa5ed in clone () from /lib64/libc.so.6

kmike commented 7 years ago

I know it can be hard, but could you try creating a small-ish reproducible example?

mickeprag commented 7 years ago

I have really tried to boil this down to a simple example that reproduces the issue. I am not sure I have succeeded, but in the meantime I have been able to reproduce a similar segfault. I am not sure whether it is the same issue. This code segfaults roughly once every 500 iterations on my computer.

from lupa import LuaRuntime, unpacks_lua_table
import threading, time

script = """
function callDone(response)
    local json = response:json()
end

function run(arg)
    local http = r()
    local request = http:get('https://httpbin.org/ip', callDone)
end

"""

class DummyResponse():
    def json(self):
        return {}

def dummyCall(**kwargs):
    time.sleep(0.1)

class Request(object):
    @unpacks_lua_table
    def get(self, url, success=None, **kwargs):
        r = PendingRequest(dummyCall, success, {'url': url})
        r.start()
        return r

class PendingRequest(threading.Thread):
    def __init__(self, fn, callback, kwargs):
        super(PendingRequest,self).__init__(name='HTTP request')
        #self.daemon = True
        self.fn = fn
        self.callback = callback
        self.kwargs = kwargs

    def run(self):
        try:
            r = self.fn(**self.kwargs)
        except Exception as e:
            print("Could not execute http request %s" % e)
            return
        if self.callback is not None:
            thread = self.callback.coroutine(DummyResponse())
            try:
                thread.send(None)
            except StopIteration:
                pass
            self.callback = None

request = Request()
def r():
    return request

lua = LuaRuntime(
    unpack_returned_tuples=True,
    register_eval=False,
)
lua.globals().r = r
lua.execute(script)

for i in range(10000):
    fn = getattr(lua.globals(), 'run')
    print("Start call", i)
    thread = fn.coroutine()
    try:
        thread.send(None)
    except StopIteration:
        pass

print("Wait for shutdown")
#time.sleep(2)

mickeprag commented 7 years ago

I think I have some more information. I believe the cause is two different threads trying to access the Lua runtime. The example above does this intentionally, but in my software it is a side effect. Let me try to explain.

I have a Lua runtime running in its own isolated thread. The Lua thread tries to call some Python code, and I send this over to the Python main thread to avoid concurrency issues. The Python code gets a reference to the Lua runtime (but never accesses it directly) in a wrapped object. When Python garbage-collects this object, it does so in the main thread. The reference to the Lua runtime is cleared (by the Python interpreter, not by my code), and that is where I get a segfault.

I have not managed to create an isolated example of this but it is fairly reproducible in my project. Here is an example of the wrapper object:

class LuaFunctionWrapper(object):
    def __init__(self, cb):
        self.cb = cb  # A pointer to a lua function

    def __del__(self):
        self.cb = None  # This is where the segfault occurs

The segfault happens even without the destructor. I added the destructor, which explicitly "frees" cb, so that the backtrace would confirm that the segfault happens there.
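If the cross-thread release described above really is the trigger (a hypothesis at this point in the thread, not a confirmed diagnosis), one possible workaround is to hand the reference back to the thread that owns the Lua runtime instead of dropping it wherever the wrapper happens to die. The sketch below uses Python 3 and a stand-in class, FakeLuaFunction, in place of a real lupa object; the release() and drain() names are hypothetical and not part of lupa. It relies on CPython's reference counting, where an object is finalized in the thread that drops the last reference.

```python
import queue
import threading

dealloc_threads = []  # thread ids in which the stand-in objects were finalized


class FakeLuaFunction:
    """Stand-in for a lupa function object (hypothetical, not lupa's API)."""

    def __del__(self):
        dealloc_threads.append(threading.get_ident())


class LuaFunctionWrapper:
    def __init__(self, cb, release_queue):
        self.cb = cb
        self._release_queue = release_queue

    def release(self):
        # Hand the reference back instead of dropping it in this thread.
        cb, self.cb = self.cb, None
        if cb is not None:
            self._release_queue.put(cb)


def drain(release_queue):
    """Call this from the thread that owns the Lua runtime."""
    released = 0
    while True:
        try:
            release_queue.get_nowait()  # the last reference is dropped here
        except queue.Empty:
            return released
        released += 1
```

The owning thread would call drain() periodically, for example between script invocations. This only moves the deallocation to a chosen thread. It does not address any underlying Lua stack corruption.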

mickeprag commented 7 years ago

Backtrace of the above observations:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff72ffd700 (LWP 32133)]
0x00007fffef133d82 in lua_rawgeti (L=L@entry=0x7fff60004ef0, idx=idx@entry=-1001000, n=n@entry=0) at lapi.c:654
654       setobj2s(L, L->top, luaH_getint(hvalue(t), n));
(gdb) py-bt
Traceback (most recent call first):
  File "/home/micke/Documents/dev/telldus/tellstick-server/lua/src/lua/LuaScript.py", line 99, in __del__
    self.cb = None
  File "/home/micke/Documents/dev/telldus/tellstick-server-plugins/http/http/http.py", line 61, in run
    self.failure = None
  File "/usr/lib64/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()

(gdb) bt
#0  0x00007fffef133d82 in lua_rawgeti (L=L@entry=0x7fff60004ef0, idx=idx@entry=-1001000, n=n@entry=0) at lapi.c:654
#1  0x00007fffef146291 in luaL_unref (L=0x7fff60004ef0, t=-1001000, ref=7) at lauxlib.c:546
#2  0x00007fffef3739f0 in __pyx_pf_4lupa_5_lupa_10_LuaObject_2__dealloc__ () from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#3  0x00007fffef37361d in __pyx_pw_4lupa_5_lupa_10_LuaObject_3__dealloc__ () from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#4  0x00007fffef390ea7 in __pyx_tp_dealloc_4lupa_5_lupa__LuaObject () from /home/micke/Documents/dev/telldus/tellstick-server/build/env/lib/python2.7/site-packages/lupa/_lupa.so
#5  0x00007ffff7a7374f in insertdict_by_entry (mp=0x7fffd40b77f8, key='cb', hash=<optimized out>, ep=<optimized out>, value=<optimized out>)
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/dictobject.c:519
#6  0x00007ffff7a751b0 in dict_set_item_by_hash_or_entry (
op=op@entry={'destructionHandlers': [(<instancemethod at remote 0x7fffd407f050>, (), {})], 'cb': None, 'script': <LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitiveDict(_store=<OrderedDict(_OrderedDict__root=[[[[[[[[[[[[[...], [...], 'access-control-allow-origin'], [...], 'access-control-allow-methods'], [....(truncated), key=<optimized out>, hash=<optimized out>, ep=ep@entry=0x0, 
value=value@entry=None) at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/dictobject.c:795
#7  0x00007ffff7a76164 in PyDict_SetItem (
op=op@entry={'destructionHandlers': [(<instancemethod at remote 0x7fffd407f050>, (), {})], 'cb': None, 'script': <LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitiveDict(_store=<OrderedDict(_OrderedDict__root=[[[[[[[[[[[[[...], [...], 'access-control-allow-origin'], [...], 'access-control-allow-methods'], [....(truncated), key=key@entry='cb', value=value@entry=None)
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/dictobject.c:848
#8  0x00007ffff7a7b638 in _PyObject_GenericSetAttrWithDict (obj=<optimized out>, name='cb', value=None,
dict={'destructionHandlers': [(<instancemethod at remote 0x7fffd407f050>, (), {})], 'cb': None, 'script': <LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitiveDict(_store=<OrderedDict(_OrderedDict__root=[[[[[[[[[[[[[...], [...], 'access-control-allow-origin'], [...], 'access-control-allow-methods'], [....(truncated))
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/object.c:1529
#9  0x00007ffff7a7b03f in PyObject_SetAttr (
v=v@entry=<LuaFunctionWrapper(destructionHandlers=[(<instancemethod at remote 0x7fffd407f050>, (), {})], cb=None, script=<LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitiveDict(_store=<OrderedDict(_OrderedDict__root=[[[[[[[[[[[[[...], [...], 'access-control-allow-origin'], [...], 'access-control-allow-met...(truncated), name=<optimized out>, name@entry='cb', value=value@entry=None)
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Objects/object.c:1252
#10 0x00007ffff7ad3bec in PyEval_EvalFrameEx (
f=f@entry=Frame 0x7fffd407a578, for file /home/micke/Documents/dev/telldus/tellstick-server/lua/src/lua/LuaScript.py, line 99, in __del__ (self=<LuaFunctionWrapper(destructionHandlers=[(<instancemethod at remote 0x7fffd407f050>, (), {})], cb=None, script=<LuaScript(_LuaScript__queue=[(<lupa._lupa._LuaFunction at remote 0x7fffd4074fa0>, (<Response(cookies=<RequestsCookieJar(_now=1495024665, _policy=<DefaultCookiePolicy(strict_rfc2965_unverifiable=True, strict_ns_domain=0, _allowed_domains=None, rfc2109_as_netscape=None, rfc2965=False, strict_domain=False, _now=1495024665, strict_ns_set_path=False, strict_ns_unverifiable=False, strict_ns_set_initial_dollar=False, hide_cookie2=False, _blocked_domains=(...), netscape=True) at remote 0x7fffd41093b0>, _cookies={}, _cookies_lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7fffd409b7d0>, _RLock__count=0) at remote 0x7fffd4180750>) at remote 0x7fffd4180950>, _content='{"success":true,"message":"success"}', headers=<CaseInsensitive...(truncated), throwflag=throwflag@entry=0)
at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Python/ceval.c:2253
#11 0x00007ffff7ada7d0 in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x7fffd4168e28, argcount=1, 
kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at /usr/src/debug/dev-lang/python-2.7.12/Python-2.7.12/Python/ceval.c:3582
#12 0x00007ffff7a63ccc in function_call (func=<function at remote 0x7fffef5b8938>,

mickeprag commented 7 years ago

Finally, a small-ish reproducible example! No threads or anything special... ;) https://gist.github.com/mickeprag/75a0fbf04cfd06c3fe48b759da22f5ef

xeor commented 6 years ago

Did you do any more work on this? I'm new to Lua, but have done my share of Python. Can I help?

mickeprag commented 6 years ago

I can only answer for myself. I have tried to see where the crash happens, but I could not fully understand why. Whether @kmike has done anything more, I do not know. If you want to help, start by checking whether you can reproduce the crash on your computer using my test script.

scoder commented 6 years ago

I can reproduce the crash, but the stack trace changes on each run. That suggests that there might be some kind of Lua stack corruption that only shows at a later point. Meaning, the crash is almost certainly not where the problem is.

xeor commented 6 years ago

Able to reproduce using Python 3.7.0 and lupa 1.7.

Going to test a couple of versions now

scoder commented 6 years ago

Well, it's certainly worth git bisect-ing it, although my intuition wants me to assume that it's been there forever, or at least as long as the features that the reproducer script uses are around...

xeor commented 6 years ago

Tried a couple of random versions (Python version, then lupa versions):

3.7: 1.6, 1.7
3.4.9: 1.0 (without unpacks_lua_table)
2.7.15: 1.7, 1.5

All of them crash at a random point between 15 and 400.

xeor commented 6 years ago

Running the test and just watching it fail dumps a lot of different errors on the console. Mostly segfaults, but also Python malloc errors (python: malloc.c:3760: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.) and other Python errors.

This might be a stupid question, but if I move thread = fn.coroutine() (https://gist.github.com/mickeprag/75a0fbf04cfd06c3fe48b759da22f5ef#file-crashtest-py-L39) outside the loop, it never fails. Will it still work?
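A possible reason the moved version never fails is that it may stop exercising the code after the first iteration. The sketch below is an analogy using a plain Python generator, not lupa. It assumes, without having verified it against lupa, that a finished Lua coroutine behaves like an exhausted generator and simply raises StopIteration on every later send(). The names run and count_executions are made up for illustration.

```python
def run(log):
    """Generator standing in for the Lua function run(); it never yields."""
    log.append("body executed")
    if False:
        yield  # unreachable; only makes this function a generator


def count_executions(iterations, reuse):
    """Count how often the body runs when the coroutine is reused or recreated."""
    log = []
    thread = run(log) if reuse else None
    for _ in range(iterations):
        if not reuse:
            thread = run(log)  # fresh coroutine per iteration, as in the gist
        try:
            thread.send(None)
        except StopIteration:
            pass
    return len(log)
```

Under that assumption, count_executions(5, reuse=True) runs the body once, while count_executions(5, reuse=False) runs it five times, so the reused variant would hide the crash rather than avoid it.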

scoder commented 6 years ago

Maybe you could try to strip down the test case? It's very complex and uses lots of features: Lua coroutines, the @unpacks_lua_table decorator, runtime options … Any feature that can be avoided will make it easier to find the place where things go wrong.

xeor commented 6 years ago

I'm very new to lua, so I'm not sure where to begin... I'll continue playing with it a little tho..

xeor commented 6 years ago

Another strange find is that if I set anything under self to the callback object passed into the PendingRequest, it crashes..

Example..

class PendingRequest(object):
    def __init__(self, callback):
        super(PendingRequest,self).__init__()
        self.callback = callback
        thread = self.callback.coroutine()
        try:
            thread.send(None)
        except StopIteration:
            pass

is the original..

class PendingRequest(object):
    def __init__(self, callback):
        super(PendingRequest,self).__init__()
        # self.callback = callback
        thread = callback.coroutine()
        try:
            thread.send(None)
        except StopIteration:
            pass

does not crash...

But

class PendingRequest(object):
    def __init__(self, callback):
        super(PendingRequest,self).__init__()
        self.xx = callback
        thread = callback.coroutine()
        try:
            thread.send(None)
        except StopIteration:
            pass

does crash.

mickeprag commented 6 years ago

"Maybe you could try to strip down the test case? It's very complex and uses lots of features: Lua coroutines, the @unpacks_lua_table decorator, runtime options"

I have simplified the test case. Actually, removing the runtime options makes the script crash sooner on my machine. I cannot reproduce the crash without using coroutines, so my guess is that the issue is somewhere in there.

Two observations:
1) If I do not return the PendingRequest object in Request.get(), it does not crash.
2) If the callback variable is not stored in self (in PendingRequest.__init__), it does not crash. Same observation as @xeor.

Maybe this has something to do with when the PendingRequest object is cleaned up by the Python garbage collector and it tries to release the reference to the Lua function? Just my speculation...
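The two observations are consistent with the timing of the callback's deallocation being what matters. The following pure-Python sketch (no lupa involved, CPython reference counting assumed) shows how storing the callback on self changes when it is freed. The trace_lifetime helper and its event strings are illustrative only.

```python
def trace_lifetime(keep):
    """Record when a callback is freed relative to the request holding it."""
    events = []

    class Callback:
        def __del__(self):
            events.append("callback freed")

    class PendingRequest:
        def __init__(self, callback):
            if keep:
                self.callback = callback  # extends the callback's lifetime

    request = PendingRequest(Callback())
    events.append("constructor returned")
    del request  # frees the request, and with it any stored callback
    events.append("request dropped")
    return events
```

Without the attribute, the callback is freed as soon as the constructor call finishes. With it, the callback is freed only when the request itself is dropped, which in the reproducer would happen at whatever later point the last reference to the returned PendingRequest goes away.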

xeor commented 6 years ago

I tried turning off the garbage collector with import gc; gc.disable(). It made no difference.
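That result does not rule out deallocation as the trigger. In CPython, gc.disable() only switches off the cyclic garbage collector, and objects that are not part of a reference cycle are still freed immediately by reference counting. A small sketch, with the Tracked class and helper name made up for illustration:

```python
import gc


class Tracked:
    finalized = 0

    def __del__(self):
        Tracked.finalized += 1


def finalized_while_gc_disabled():
    """Return how many Tracked objects are finalized while the GC is off."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        before = Tracked.finalized
        obj = Tracked()
        del obj  # refcount drops to zero, so __del__ runs right here
        return Tracked.finalized - before
    finally:
        if was_enabled:
            gc.enable()
```

The function returns 1 on CPython, so disabling the collector would not be expected to change when a wrapped Lua object such as the _LuaObject seen in the backtrace is deallocated.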

noahcgreen commented 4 years ago

Has there been any work done on this? I'm running into the same issue with coroutines unpredictably segfaulting.

mickeprag commented 4 years ago

From my side, no, unfortunately not.

scoder commented 4 years ago

There is a reproducing script (https://gist.github.com/mickeprag/75a0fbf04cfd06c3fe48b759da22f5ef). It's probably still not minimal and requires more investigation to find the point where things go wrong in the code. Help with that is welcome.

noahcgreen commented 4 years ago

Here's a slightly more minimal reproducing script:

from lupa import LuaRuntime

class PendingRequest:

    def __init__(self, callback):
        self.callback = callback

def make_request(callback):
    return PendingRequest(callback)

lua = LuaRuntime()
lua.globals().make_request = make_request
run = lua.eval("""
function()
    make_request(function() end)
end
""")

for i in range(10000):
    print("Start call", i)
    thread = run.coroutine()
    try:
        thread.send(None)
    except StopIteration:
        pass

print("Finished successfully")

Almost every time I run this I get an error similar to this:

Python(83285,0x10f966dc0) malloc: Incorrect checksum for freed object 0x7f8de7f2bd78: probably modified after being freed.
Corrupt value: 0x0
Python(83285,0x10f966dc0) malloc: *** set a breakpoint in malloc_error_break to debug
zsh: abort      python3 crashtest.py

So I do think it's likely there is some error with garbage collection/deallocation. I'm still not so comfortable debugging Cython but I'll try to look at this more over the weekend.
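For anyone picking this up without a gdb setup, the standard library's faulthandler module can at least show which Python line was executing when the process dies. A minimal sketch, assuming Python 3; the helper name and the log path are arbitrary choices, and the output only covers the Python side, not the Cython or Lua frames:

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Dump Python tracebacks of all threads to log_path on a fatal signal."""
    log_file = open(log_path, "w")
    # The file must stay open for as long as faulthandler is enabled.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling enable_crash_tracebacks("crash.log") at the top of the reproducing script should leave a traceback in that file after a segfault or abort. Given the earlier remark that the stack trace changes on each run, the reported line may only be where the corruption surfaces, not where it originates.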