Open pspacek opened 4 years ago
Oh, I forgot to copy&paste the error itself!
/usr/bin/luajit: bad light userdata pointer
stack traceback:
[C]: at 0xffffb6342ad0
[C]: in function 'require'
/usr/share/lua/5.1/cqueues.lua:2: in function </usr/share/lua/5.1/cqueues.lua:1>
[C]: at 0xaaaae1757d08
[C]: at 0xaaaae170a4c0
Can you confirm that CQS_USE_47BIT_LIGHTUSERDATA_HACK
is being set?
Could you provide a C stack trace?
Debian doesn't provide -dbgsym for this platform (no idea why). When I build it myself from master on the same machine, the error disappears :-/
Looks like your debian build could be using an old release:
Get:110 https://deb.debian.org/debian unstable/main arm64 lua-cqueues arm64 20190813-1 [179 kB]
No, that cqueues version had this fixed already. And my testing was with this anyway:
# apt show lua-cqueues
Package: lua-cqueues
Version: 20200726-1
[...]
I'm not close to this project, but have some familiarity with the LuaJIT story on AArch64. MoonJIT has an active AArch64 port - and may be an alternative?
This is the same with moonjit 2.1.2.
I guess this isn't enough "proof"?
(gdb) bt
#0 lj_err_msg (L=0x18255adee378, em=em@entry=LJ_ERR_BADLU) at lj_err.c:645
#1 0x0000aaaaaab00180 in lua_pushlightuserdata (L=<optimized out>, p=<optimized out>) at lj_state.c:121
#2 0x0000fffff7d29378 in luaopen.cqueues () from /usr/local/lib/lua/5.1/_cqueues.so
@vcunat recompile cqueues with -O0
; I want to see the exact line in _cqueues.so
that is causing the failure.
As I wrote, I can't reproduce the failure when recompiling by hand (so far). Maybe the address layout is sensitive to something that I don't know about.
Line isn't surprising: https://salsa.debian.org/lua-team/lua-cqueues/-/blob/2eb74cf0/src/cqueues.c#L2940
It's the defined(LUA_JITLIBNAME)
in the condition. Debian naturally assumes that packages built against vanilla lua 5.1 will be usable with luajit, moonjit, etc.
For reference, luaossl suffers from the very same (except its version is ancient on Debian anyway).
Debian naturally assumes that packages built against vanilla lua 5.1 will be usable with luajit, moonjit, etc.
sadly that isn't true for 64bit platforms. if they want to share the same build for both lua 5.1 and LuaJIT then they'll need to compile with CQS_USE_47BIT_LIGHTUSERDATA_HACK
defined.
@yac Please have a look once you get to Debian packaging.
@daurnimator Do you see any reason why not enable CQS_USE_47BIT_LIGHTUSERDATA_HACK
on all platforms? Or at least on all 64 bit platforms? Does it have any adverse effects?
Then it might be worth fixing debian/rules in this repo as well, though I don't know if anyone uses it.
@daurnimator Do you see any reason why not enable
CQS_USE_47BIT_LIGHTUSERDATA_HACK
on all platforms? Or at least on all 64 bit platforms? Does it have any adverse effects?
Yes: it could result in mysterious errors/failures/undefined behaviour if you manage to map memory with the same trailing 47 bits.
Hmm, that sounds like a good reason not to enable it everywhere :-(
If the collision risk is notable and assuming we're OK with focusing on ==48 bits, I'd suggest using right shift by one bit instead of that mask. In most use cases you know for certain that you won't use the hack on two addresses one byte apart.
If the collision risk is notable and assuming we're OK with focusing on ==48 bits, I'd suggest using right shift by one bit instead of that mask. In most use cases you know for certain that you won't use the hack on two addresses one byte apart.
One idea I had was to make the contents of the pointer aligned to e.g. 32bytes, so that we can take the pointer of the end of it, and shift right by 5 (and then use the same 47-bit mask). Does e.g. aarch64 on debian ever set the top 12 bits of pointers?
Well... to get more confidence, I'd look at ARMv8 manual – page 2083 describes what the virtual addresses can look like. If I understand it right, "negative" values are allowed, and thus the total possible space is at most 2x 48 bits or 2x 52 bits.
Say we want to focus on 2x 48 only (for now). Right shift by two bits should give us unique values even after masking to 47 bits (assuming no pointers get less than four bytes apart). It's even possible to check whether the passed pointer fits into the 2x48 range (say, in assert
mode at least).
I'm not sure if 64-byte alignment would be realistic in cqueues case, to allow using the same approach even for 2x52. As for current practice, I don't expect the 52 variant is common yet. I haven't seen a "negative" pointer yet but I haven't really looked so far (maybe they're not useful in user-space).
cqueues on Debian on aarch64 explodes with error:
Full build log: https://buildd.debian.org/status/fetch.php?pkg=knot-resolver&arch=arm64&ver=5.1.2-1&stamp=1596037546&raw=0
This is probably related to some LuaJIT limitations on aarch64 but it effectivelly makes cqueues unusable for applications where performance matters.