Closed vcunat closed 5 years ago
What sort of variables can't be used? Should we be avoiding global vars? pointers to the stack? pointers as returned by malloc
?
Judging by the fact that it failed when requiring cqueues, it's probably pushing CQUEUE__POLL
in luaopen_cqueues
that is failing for you (though I'm sure once we get past this first one there will be more).
As I don't have access to an arm64 platform, could you verify the above by adding prints on either side of
lua_pushlightuserdata(L, CQUEUE__POLL);
Well, generally there's probably no guarantee which pointers get allocated too high. From aarch64 linux experience, stack is bad and static variables (inside C functions) are OK.
I believe you're right that's the first place to violate the condition:
0 lua_pushlightuserdata (L=0x252904278378, p=0xffffbf492038
) at lj_api.c:699
I think the memory layout ends up the same as on x86_64, only using (usually) 48 bits instead of 47. Notably I see that so-libs are placed on the high end – so the static
trick likely shouldn't work there, and I wouldn't expect malloc/mmap getting this high normally (especially when we're just one bit over the limit).
I'm not sure what to replace that code with.... a lightuserdata of cqueues_poll
is part of the public ABI of the library.
How else can you share a unique value across dlopen
d libraries?
Maybe for LuaJIT we should mask off the high bits? Would be a bit risky that a collision occurs.
Or maybe cqueues_poll
could be a struct with some luajit compat value in it? But who would initialise it and when?
None of the solutions I can think of seem particularly great :(
If it's just about collisions in the most common case when one bit is missing, I expect it's possible to hack around this without too much risk:
These hacks might even work luajit-wide... it's just about choosing a better part of the space to reject.
@vcunat thoughts on https://github.com/Kong/openresty-patches/pull/47/files to (mostly) fix things from the luajit side?
That sounds nice in principle. Still, any solution not accepted into luajit upstream might have a hard time getting into distributions, etc. I can't immediately see if this this couple lines fix everything; I don't know luajit internals.
@vcunat could you test out #225 for me?
Chiming in here and happy to help the best I can (at least from the point of view of having seen this general problem before, and having seen it solved a couple of times).
Closed via #225
Unfortunately, this error is back. Upstream luajit note: https://github.com/LuaJIT/LuaJIT/issues/49#issuecomment-668017303
For example, current Debian Sid:
# apt show lua-cqueues
Package: lua-cqueues
Version: 20200726-1
[...]
# luajit -l cqueues -e 'os.exit(0)'
luajit: bad light userdata pointer
stack traceback:
[C]: at 0xffff8c086120
[C]: in function 'require'
/usr/share/lua/5.1/cqueues.lua:2: in function </usr/share/lua/5.1/cqueues.lua:1>
[C]: at 0xaaaad6b21d08
[C]: at 0xaaaad6ad44c0
@pspacek ^
Yes, let's continue on the newly open ticket.
Typical test case: aarch64 running Linux configured with
ARM64_VA_BITS >= 48
Other platforms don't commonly run into this AFAIK, but on aarch64 this kernel configuration now seems to be typical. Note that upstream luajit seems very unlikely to fix this incompatibility (and it would probably be hard without caveats).
I don't know code internals of cqueues, so I can't guess how hard this will be, but I recently did this transformation for Knot Resolver codebase, so I might be able to help, though I'm typically overloaded with higher-priority work.