lunarmodules / busted

Elegant Lua unit testing.
https://lunarmodules.github.io/busted/
MIT License
1.43k stars 186 forks source link

random segfault when running with luajit #496

Closed houqp closed 4 years ago

houqp commented 8 years ago

Hi,

We have been having this issue for awhile. 90% of the time, it segfaults at the end of the test, inside luajit's lua_close function (most of the time it dies at lj_gc_freeall). It kind smells like a race condition. Does busted do multi-threading at all?

mpeterv commented 8 years ago

Reduced testsuite reproducing the issue: put this into spec/1_spec.lua and spec/2_spec.lua, run busted.

for i = 1, 1000 do
  loadstring([[
    for j = 1, 20 do
      local x = {0LL}
    end
  ]])()
end

But this looks like a bug in LuaJIT, not busted...

houqp commented 8 years ago

Interesting, It could be a bug in LuaJIT. But we have been running the same code on many real devices for a long time any never ran into this issue. It only happens quiet frequently when running with busted.

But yeah, if it works for lua5.1 and assuming busted is not using any special magic to run those tests, then I would agree it's more likely a bug in LuaJIT.

Tieske commented 8 years ago

we have seen this as well on Travis and CircleCI, where tests crashed at the end (busted with LuaJit). Probably warrants a bug report...

houqp commented 8 years ago

Sometimes the tests even crash during the execution. But this happens way less frequent.

vsergeev commented 8 years ago

:+1: I am experiencing this issue as well. In my case it occurs maybe 20% of the time.

Tieske commented 8 years ago

did someone report this on the LuaJit mailing list?

houqp commented 8 years ago

I can report it to the list. @Tieske , just to confirm, busted is not doing multi-threading right?

Tieske commented 8 years ago

No, it isn't. Please do report, otherwise I will, just wanted to make sure we don't report it multiple times.

houqp commented 8 years ago

I have sent the report to the mailing list.

Tieske commented 8 years ago

:+1:

For the record, the reported issue; http://www.freelists.org/post/luajit/random-segfaults-when-running-tests-with-busted

mpeterv commented 8 years ago

Actually, given that I can reproduce the issue with busted 2.0.rc7 but not 2.0.rc6 this could be an issue in lua-term, a C dependency added in rc7. I'll bisect for exact commit introducing the issue.

mpeterv commented 8 years ago

Nevermind, commenting out lua-term import doesn't fix the problem, bisect shows that segfault is introduced in @b3993e6.

houqp commented 8 years ago

@mpeterv what version of luajit are you using? EDIT: nvm, I was able to reproduce with the latest LuaJIT.

mpeterv commented 8 years ago

@houqp I reproduced the issue using busted with testcase I posted above with luajit 2.0.0, 2.0.4, head of master branch, head of v2.1 branch. I've also reduced the testcase (the whole blob with busted, luarocks, penlight and other dependencies) to one file. I'll finish some more testing with different compilation flags and post it to luajit list.

mpeterv commented 8 years ago

So, as expected, this is a duplicate of #369. Some workarounds are posted there.

houqp commented 8 years ago

Hm... I think the problem now is the discussion in #369 is not documented. @Tieske is it possible to get this documented on the official documentation page?

For us, the workaround does not work. Most of our modules requires ffi, and it's not easier to put all of them into a helper.lua script. It's probably easier for neovim since their application code does not depend on luajit ffi. I guess a cleaner way to fix it is to run each test in its own process...

mpeterv commented 8 years ago

@houqp it should be enough to require just the FFI module in helper.lua, ignoring any other modules depending on it.

houqp commented 8 years ago

@mpeterv I tried that, but got a bunch of redefined errors in the cdef call (link), which is imported by some of the tests.

Error message:

Error → ffi/util.lua @ 54
suite ./spec/front/unit/nickel_conf_spec.lua
ffi/util.lua:54: attempt to redefine '_FILETIME' at line 11

stack traceback:
        ffi/util.lua:54: in main chunk
mpeterv commented 8 years ago

@houqp so busted drops and reloads that module, too. So, yes, you'll need to add all modules that can't be safely reloaded in helper.lua.

houqp commented 8 years ago

That's why I said this workaround is not practical for us ;) Basically all our modules depends on FFI, that means we will have to put all of them in helper.lua...

vsergeev commented 8 years ago

As a workaround, I've had good results running busted with the option --no-auto-insulate, e.g.

busted --lua=luajit --no-auto-insulate tests

This does sacrifice the test insulation feature of busted, but it seems to prevent the crash caused by the ffi module being unanchored between tests.

houqp commented 8 years ago

@vsergeev thanks! I can also confirm adding --no-auto-insulate fixes all our random segfaults :)

houqp commented 8 years ago

I suggest we add this to the documentation.

DorianGray commented 8 years ago

Alongside the documentation, I suggest we add a list of default preloaded libraries. We add common things that don't like to be reloaded, like ffi, by default, and allow users to add modules either in helper.lua or at the command line.

houqp commented 8 years ago

yep, that will work too, as long as the behavior is documented :)

Tieske commented 7 years ago

for reference, a possible fix: https://github.com/Olivine-Labs/luassert/pull/150

Tieske commented 4 years ago

closing this, reloading ffi has been resolved