love2d / love

LÖVE is an awesome 2D game framework for Lua.
https://love2d.org
Other
4.84k stars 388 forks source link

Segmentation fault when calling the first-declared FFI callback #1776

Closed apicici closed 2 years ago

apicici commented 2 years ago
local ffi = require "ffi"

local test = function() end
local cb1 = ffi.cast("void (*)()", test)
local cb2 = ffi.cast("void (*)()", test)

Calling the first-declared FFI callback causes segmentation faults on Linux with LÖVE 11.4. I tested with 11.3 and the problem does not occur there; it also does not seem to happen on Windows or macOS (tested with virtual machines).

Some more details:

I put together an extended main.lua for testing purposes:

local ffi = require "ffi"

local test = function(i)
    print(string.format("Callback %d was run", i))
end

local cb = {
    ffi.cast("void (*)(int)", test),
    ffi.cast("void (*)(int)", test),
    ffi.cast("void (*)(int)", test),
}

local function test1()
    print "Try running the 1st callback twice:"
    print "------------------"
    cb[1](1); cb[1](1)
end
local function test2()
    print "Try running the 1st callback followed by the 2nd, then the 1st again:"
    print "------------------"
    cb[1](1); cb[2](2); cb[1](1)
end
local function test3()
    print "Try running the 2nd and 3rd callbacks multiple times:"
    print "------------------"
    cb[2](2); cb[3](3); cb[2](2); cb[3](3)
end
local function test4()
    print "Try running the 1st callback after the 2nd:"
    print "------------------"
    cb[2](2); cb[1](1)
end

-- Comment out one of the following lines to test the various combinations

test1()
-- test2()
-- test3()
-- test4()
MikuAuahDark commented 2 years ago

Can you try if the same happends on LuaJIT interpreter, i.e. without using LOVE? https://github.com/LuaJIT/LuaJIT

thegrb93 commented 2 years ago

I tried with luajit-2.1.0 and didn't have any issue. Looks like he also tried it with 2.1

slime73 commented 2 years ago

Which 2.1 commit? There aren't labelled releases of 2.1 anymore.

thegrb93 commented 2 years ago

The readme says LuaJIT 2.1.0-beta3

thegrb93 commented 2 years ago

Ah I found it in refs/heads. https://github.com/LuaJIT/LuaJIT/commit/20f556e53190ab9a735b932f5d868d45ec536a70

slime73 commented 2 years ago

The readme says LuaJIT 2.1.0-beta3

Yeah, that beta3 name is sort of meaningless now unfortunately. The first "beta3" release was a long time ago and there have been a lot of changes to LuaJIT 2.1's codebase since then. love 11.4 uses a very recent git commit of 2.1 in most situations.

thegrb93 commented 2 years ago

I see, yeah my local version is 130 commits behind.

apicici commented 2 years ago

Can you try if the same happends on LuaJIT interpreter, i.e. without using LOVE? https://github.com/LuaJIT/LuaJIT

I have tried with the version of LuaJIT in the archlinux repositories (2.1.0.beta3.r384.g1d7b5029) as well as LÖVE 11.4 from archlinux, which uses the same LuaJIT version. The segmentation faults only occur when running it through LÖVE.

I have also tried to run the file directly with LuaJIT and loading LÖVE by requiring the shared library and calling love.boot(). This also leads to the segmentation fault.

apicici commented 2 years ago

I just confirmed that the archlinux LuaJIT is built from the current HEAD of https://github.com/LuaJIT/LuaJIT, commit 1d7b5029c5ba36870d25c67524034d452b761d27

slime73 commented 2 years ago

If these two lines (plus the sizemcode line at the bottom) are run in a standalone interpreter does it affect whether it's reproducible there? https://github.com/love2d/love/blob/main/src/modules/love/jitsetup.lua#L29-L34

apicici commented 2 years ago

If these two lines (plus the sizemcode line at the bottom) are run in a standalone interpreter does it affect whether it's reproducible there? https://github.com/love2d/love/blob/main/src/modules/love/jitsetup.lua#L29-L34

It still works without problems if I add those three lines. In fact I don't think it is related to something in LÖVE that is only run if JIT is on, since I tried running

-- script.lua
require "love"
require "love.boot"()

with luajit -j off script.lua <path to folder containing main.lua> and I still get the segmentation faults.

apicici commented 2 years ago

I did some more testing by modifying the love.init function in boot.lua, and I found out that the segmentation faults are related to love.audio. If I disable the audio module from love.conf everything works without problems.

slime73 commented 2 years ago

Maybe it's memory corruption caused by either OpenAL-Soft or by a platform audio backend that alsoft is using on your system?

MikuAuahDark commented 2 years ago

If it's has to do with love.audio, it could be related to https://github.com/love2d/love/issues/1763

Pada tanggal Sel, 15 Feb 2022 05.55, Alex Szpakowski < @.***> menulis:

Maybe it's memory corruption in either OpenAL-Soft or in a platform audio backend that alsoft is using on your system?

— Reply to this email directly, view it on GitHub https://github.com/love2d/love/issues/1776#issuecomment-1039607802, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABZHFFT45R3AA3LWG6SEMMLU3F25ZANCNFSM5OLVD5LA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

apicici commented 2 years ago

I did some more digging, and:

MikuAuahDark commented 2 years ago

Based from investigation in LuaJIT issue tracker, a workaround is to prevent PipeWire from setting " zero denormals" with cpu.zero.denormals = false. However setting it to false may cause problems as mentioned in this PipeWire issue tracker https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/1681

I don't know if it should be LuaJIT fixes their JIT compiler, OpenAL-soft workarounds it in their code, or reported to PipeWire upstream. I'll keep this open until I have more information.

slime73 commented 2 years ago

I think this is almost definitely pipewire's bug - they shouldn't be disturbing global state like that.

thegrb93 commented 2 years ago

Interesting that a denormal setting is affecting luajit's ffi function calling. Seems like a luajit regression if it wasn't an issue before. I'd git bisect their commits and find what introduced the problem out of curiosity if I had a linux setup with pipewire.

slime73 commented 2 years ago

IMO it's not a bug for code to rely on standard floating point behaviour, even if the reliance is new. Pipewire's code changes floats to behave in a non-standard way.

thegrb93 commented 2 years ago

True, but the segmentation fault is luajit's fault

slime73 commented 2 years ago

It's in the same category as memory corruption in my opinion. Some code may crash due to other code writing to memory it doesn't own (and some code may crash due to other code changing the float mode for code it doesn't own), in which case it's not the crashing code's responsibility to prevent other code from messing with it.

For example I can crash love 100% of the time by creating C code which writes to certain memory addresses that have love-owned memory. It's not love's fault that it's crashing in that situation, it's mine.

thegrb93 commented 2 years ago

I don't think turning off denormals is as heinous. Imo, I would expect less accurate floats to not trigger a segfault.

thegrb93 commented 2 years ago

Considering it worked fine before, they must've overlooked something with a recent commit.

slime73 commented 2 years ago

I would expect any outcome including crashes, since the behaviour of floats is supposed to be well-defined, and it's valid for code to rely on that behaviour in arbitrary ways.

There's actually a history of this sort of issue (outside of pipewire). Direct3D 9 and older used to mess with float modes too, they added an API to turn that off since it ended up blowing up tons of codebases in subtle ways, and in D3D10 they abandoned it completely.

slime73 commented 2 years ago

Some updates from the pipewire issue report:

Since there are now fixes in pipewire for this, I think this issue can be closed.

thegrb93 commented 2 years ago

Also looking into the luajit issue, I learned it wasn't their fault since it was due to changing the flag while luajit was in mid operation. That I can understand causing problems.

MikuAuahDark commented 2 years ago

PipeWire 0.3.48 has been released to fix this issue. If you're on PipeWire version between 0.3.39 and 0.3.47 then please update your PipeWire when possible or force PulseAudio backend with ALSOFT_DRIVERS=pulse to workaround the issue.