wahern / cqueues

Continuation Queues: Embeddable asynchronous networking, threading, and notification framework for Lua on Unix.
http://25thandclement.com/~william/projects/cqueues.html
MIT License
251 stars 37 forks source link

luajit light userdata incompatibility #223

Closed vcunat closed 5 years ago

vcunat commented 5 years ago

Typical test case: aarch64 running Linux configured with ARM64_VA_BITS >= 48

$ luajit -l cqueues -e 'os.exit(0)'
luajit: bad light userdata pointer

Other platforms don't commonly run into this AFAIK, but on aarch64 this kernel configuration now seems to be typical. Note that upstream luajit seems very unlikely to fix this incompatibility (and it would probably be hard without caveats).

I don't know code internals of cqueues, so I can't guess how hard this will be, but I recently did this transformation for Knot Resolver codebase, so I might be able to help, though I'm typically overloaded with higher-priority work.

daurnimator commented 5 years ago

What sort of variables can't be used? Should we be avoiding global vars? pointers to the stack? pointers as returned by malloc?

daurnimator commented 5 years ago

Judging by the fact that it failed when requiring cqueues, it's probably pushing CQUEUE__POLL in luaopen_cqueues that is failing for you (though I'm sure once we get past this first one there will be more).

As I don't have access to an arm64 platform, could you verify the above by adding prints on either side of

lua_pushlightuserdata(L, CQUEUE__POLL);
vcunat commented 5 years ago

Well, generally there's probably no guarantee which pointers get allocated too high. From aarch64 linux experience, stack is bad and static variables (inside C functions) are OK.

I believe you're right that's the first place to violate the condition:

0 lua_pushlightuserdata (L=0x252904278378, p=0xffffbf492038 ) at lj_api.c:699

vcunat commented 5 years ago

I think the memory layout ends up the same as on x86_64, only using (usually) 48 bits instead of 47. Notably I see that so-libs are placed on the high end – so the static trick likely shouldn't work there, and I wouldn't expect malloc/mmap getting this high normally (especially when we're just one bit over the limit).

daurnimator commented 5 years ago

I'm not sure what to replace that code with.... a lightuserdata of cqueues_poll is part of the public ABI of the library. How else can you share a unique value across dlopend libraries?

Maybe for LuaJIT we should mask off the high bits? Would be a bit risky that a collision occurs. Or maybe cqueues_poll could be a struct with some luajit compat value in it? But who would initialise it and when? None of the solutions I can think of seem particularly great :(

vcunat commented 5 years ago

If it's just about collisions in the most common case when one bit is missing, I expect it's possible to hack around this without too much risk:

  1. sacrifice the least significant bit instead the bit 47 (i.e. 1 << 47), as existence of valid distinct addresses one byte off is practically impossible (and almost all allocators would guarantee the bit to be zero anyway), or
  2. assume sign-extension, i.e. that bit 47 and 46 have the same value... as allocations tend to keep by either of the extremes.

These hacks might even work luajit-wide... it's just about choosing a better part of the space to reject.

daurnimator commented 5 years ago

@vcunat thoughts on https://github.com/Kong/openresty-patches/pull/47/files to (mostly) fix things from the luajit side?

vcunat commented 5 years ago

That sounds nice in principle. Still, any solution not accepted into luajit upstream might have a hard time getting into distributions, etc. I can't immediately see if this this couple lines fix everything; I don't know luajit internals.

daurnimator commented 5 years ago

@vcunat could you test out #225 for me?

vielmetti commented 5 years ago

Chiming in here and happy to help the best I can (at least from the point of view of having seen this general problem before, and having seen it solved a couple of times).

daurnimator commented 5 years ago

Closed via #225

vcunat commented 4 years ago

Unfortunately, this error is back. Upstream luajit note: https://github.com/LuaJIT/LuaJIT/issues/49#issuecomment-668017303

For example, current Debian Sid:

# apt show lua-cqueues
Package: lua-cqueues
Version: 20200726-1
[...]

# luajit -l cqueues -e 'os.exit(0)'
luajit: bad light userdata pointer
stack traceback:
        [C]: at 0xffff8c086120
        [C]: in function 'require'
        /usr/share/lua/5.1/cqueues.lua:2: in function </usr/share/lua/5.1/cqueues.lua:1>
        [C]: at 0xaaaad6b21d08
        [C]: at 0xaaaad6ad44c0
daurnimator commented 4 years ago

@pspacek ^

@vcunat https://github.com/wahern/cqueues/issues/241

vcunat commented 4 years ago

Yes, let's continue on the newly open ticket.