Closed s-cerevisiae closed 15 hours ago
That definitely is a quandary. I've cross-referenced our build script for Neovim against the ones used by Arch^1, Alpine^2, OpenBSD^3 and Void^4.
We aren't doing anything out of the ordinary for the build options.
Our Neovim is compiled with `luajit` as the Lua interpreter, but so are all of the other four.
The OpenBSD makefile even mentions cross compatibility with Lua 5.1 modules explicitly.
We do not package `mlua`, so this issue slipped through the cracks.
If you could provide any additional help in identifying the root cause of the issue, for example a minimal reproducible example, that would be greatly appreciated.
@TomJo2000 Thanks for your quick reply! I've set up a minimal reproduction here, feel free to try it out.
This problem was initially discovered by trying to build native modules for a neovim plugin and use it on Termux (corresponding issue). It was later reported to the mlua repo but similarly the author didn't know what to do with it.
I've tried a few different combinations of runtimes and platforms, and the problem seems to occur only on Neovim + Termux. I don't know enough about Lua to write a module in C as proposed by the mlua author, but personally I don't think mlua is the faulty one (since it runs normally with `luajit`).
My best guess is this is some sort of linker issue.
`nm` shows a bunch of undefined symbols, most of them Lua related.
I'm guessing `luajit` provides these in the way the module is expecting, while `nvim` does not.
// nm -uC ./target/debug/libreprod.so
U __cxa_atexit
U __cxa_finalize
U __errno
U __register_atfork
U __sF
U abort
U calloc
U clock_gettime
U close
U dl_iterate_phdr
U fflush
U fprintf
U free
U fstat
U fwrite
U getcwd
U getenv
U getpid
U lseek64
U luaL_callmeta
U luaL_error
U luaL_getmetafield
U luaL_ref
U lua_checkstack
U lua_close
U lua_concat
U lua_createtable
U lua_error
U lua_gc
U lua_getallocf
U lua_getinfo
U lua_getmetatable
U lua_getstack
// There's the symbol from the initial bug report
U lua_gettop
U lua_insert
U lua_isnumber
U lua_isstring
U lua_newthread
U lua_newuserdata
U lua_next
U lua_pcall
U lua_pushboolean
U lua_pushcclosure
U lua_pushfstring
U lua_pushinteger
U lua_pushlightuserdata
U lua_pushlstring
U lua_pushnil
U lua_pushnumber
U lua_pushthread
U lua_pushvalue
U lua_rawequal
U lua_rawget
U lua_rawset
U lua_remove
U lua_replace
U lua_setmetatable
U lua_settable
U lua_settop
U lua_toboolean
U lua_tolstring
U lua_tonumber
U lua_topointer
U lua_tothread
U lua_touserdata
U lua_type
U lua_typename
U lua_xmove
U malloc
U memcmp
U memcpy
U memmove
U memset
U mmap
U munmap
U open
U posix_memalign
U pthread_getspecific
U pthread_key_create
U pthread_key_delete
U pthread_rwlock_rdlock
U pthread_rwlock_unlock
U pthread_rwlock_wrlock
U pthread_setspecific
U read
U readlink
U realloc
U realpath
U sched_yield
U stat
U strerror_r
U strlen
U syscall
U write
U writev
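To separate the Lua C API symbols from the libc ones in a listing like the one above, the `nm` output can be filtered. This is only a minimal sketch, assuming an `nm` that prints `U name` lines as shown; the helper name `lua_undefined` is made up for illustration.

```shell
# Hypothetical helper: read `nm -u` output on stdin and print only the
# undefined symbols that belong to the Lua C API (lua_* and luaL_*).
lua_undefined() {
    awk '$1 == "U" && $2 ~ /^luaL?_/ { print $2 }'
}

# Usage (not run here):
# nm -uC ./target/debug/libreprod.so | lua_undefined
```

Every name it prints has to be provided by something already loaded in the process when the module is opened.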
I did also validate the module against Lua 5.1, which did work as expected.
I'll see if I can get any useful information out of `strace`ing the `./with_neovim` example on my PC, which does work.
PC on the left. Termux on the right.
Here's the full text for both if you feel like dissecting them.
```styl
wait4(-1, Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 52818
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7ca0a897e1d0}, {sa_handler=0x57d13c1d51e0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7ca0a897e1d0}, 8) = 0
ioctl(2, TIOCGWINSZ, {ws_row=58, ws_col=240, ws_xpixel=1920, ws_ypixel=1044}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=52818, si_uid=1000, si_status=0, si_utime=1 /* 0.01 s */, si_stime=1 /* 0.01 s */} ---
wait4(-1, 0x7ffc875f2290, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn({mask=[]}) = 0
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0755, st_size=240, ...}, 0) = 0
newfstatat(AT_FDCWD, "/usr/local/sbin/nvim", 0x7ffc875f2d40, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/nvim", 0x7ffc875f2d40, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/nvim", {st_mode=S_IFREG|0755, st_size=4200472, ...}, 0) = 0
newfstatat(AT_FDCWD, "/usr/bin/nvim", {st_mode=S_IFREG|0755, st_size=4200472, ...}, 0) = 0
geteuid() = 1000
getegid() = 1000
getuid() = 1000
getgid() = 1000
access("/usr/bin/nvim", X_OK) = 0
newfstatat(AT_FDCWD, "/usr/bin/nvim", {st_mode=S_IFREG|0755, st_size=4200472, ...}, 0) = 0
geteuid() = 1000
getegid() = 1000
getuid() = 1000
getgid() = 1000
access("/usr/bin/nvim", R_OK) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT TERM CHLD], [], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ca0a88d0e50) = 52820
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x57d13c1d51e0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7ca0a897e1d0}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7ca0a897e1d0}, 8) = 0
wait4(-1, hello, world!
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 52820
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7ca0a897e1d0}, {sa_handler=0x57d13c1d51e0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7ca0a897e1d0}, 8) = 0
ioctl(2, TIOCGWINSZ, {ws_row=58, ws_col=240, ws_xpixel=1920, ws_ypixel=1044}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=52820, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffc875f2290, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn({mask=[]}) = 0
read(255, "", 52) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(0) = ?
+++ exited with 0 +++
```
```styl
wait4(-1, Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 23667
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, {sa_handler=0x58f2994ecc, sa_mask=[], sa_flags=0}, 8) = 0
ioctl(2, TIOCGWINSZ, {ws_row=58, ws_col=240, ws_xpixel=1920, ws_ypixel=1044}) = 0
rt_sigprocmask(SIG_SETMASK, [RTMIN], [CHLD RTMIN], 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23667, si_uid=10215, si_status=0, si_utime=3 /* 0.03 s */, si_stime=1 /* 0.01 s */} ---
wait4(-1, 0x7fcf9a6524, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn({mask=[RTMIN]}) = 0
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0700, st_size=3452, ...}, 0) = 0
newfstatat(AT_FDCWD, "/data/data/com.termux/files/usr/bin/nvim", {st_mode=S_IFREG|0700, st_size=4709192, ...}, 0) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT TERM CHLD RTMIN], [RTMIN], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7005443508) = 23671
rt_sigprocmask(SIG_SETMASK, [RTMIN], [INT TERM CHLD RTMIN], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD RTMIN], [RTMIN], 8) = 0
rt_sigprocmask(SIG_SETMASK, [RTMIN], [CHLD RTMIN], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD RTMIN], [RTMIN], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x58f2994ecc, sa_mask=[], sa_flags=0}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
wait4(-1, E5113: Error while calling lua chunk: error loading module 'reprod' from file './target/debug/libreprod.so': dlopen failed: cannot locate symbol "lua_type" referenced by "/data/data/com.termux/files/home/git/nvim-mlua-reprod/target/debug/libreprod.so"...
stack traceback:
[C]: at 0x7619b27dd4
[C]: in function 'require'
load.lua:3: in main chunk
[{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 23671
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, {sa_handler=0x58f2994ecc, sa_mask=[], sa_flags=0}, 8) = 0
ioctl(2, TIOCGWINSZ, {ws_row=58, ws_col=240, ws_xpixel=1920, ws_ypixel=1044}) = 0
rt_sigprocmask(SIG_SETMASK, [RTMIN], [CHLD RTMIN], 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23671, si_uid=10215, si_status=1, si_utime=1 /* 0.01 s */, si_stime=1 /* 0.01 s */} ---
wait4(-1, 0x7fcf9a6524, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn({mask=[RTMIN]}) = 0
read(255, "", 60) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD RTMIN], [RTMIN], 8) = 0
rt_sigprocmask(SIG_SETMASK, [RTMIN], [CHLD RTMIN], 8) = 0
mprotect(0x70052e8000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x70052e8000, 4096, PROT_READ) = 0
exit_group(1) = ?
+++ exited with 1 +++
```
To state the obvious, Termux isn't making an `access("/data/data/com.termux/files/usr/bin/nvim", X_OK)` call, but that still doesn't explain why it isn't.
Does `patchelf --add-needed /data/data/com.termux/files/usr/lib/libluajit.so ./target/debug/libreprod.so` make things work for you?
As expected, yes it does, since it adds `libluajit.so` as a dependency of the module itself, so those symbols can be resolved.
I think this is a difference between Android and proper Linux linker/loaders.
`LD_PRELOAD=/data/data/com.termux/files/usr/lib/libluajit.so nvim -l load.lua` also works here. I vaguely recall that the Android linker misbehaves when trying to resolve symbols exported by the executable itself, but I don't recall the details, so maybe that is what's going on here.
I think there are two workarounds:
* A `DT_NEEDED` ELF entry for `libluajit.so`. This would be traditional and I don't understand why it isn't being done, so there's probably a good reason. Worst case, we can use `patchelf`

...I think this is really in the category of "lua expects Linux behavior, Android deviates from it".
Okay, good to know that I'm not just missing something very obvious here.
I guess the questions that remain are;
* Is this issue specific to SO modules produced by `mlua`, in which case this is a bug in `mlua`.
* Does this affect linking against "off the shelf" shared libraries not directly linked against by `nvim`, in which case this is a bug in our Neovim package.

These two aren't mutually exclusive.
It's probably not even fair to call this a bug in `mlua` if it works fine in the typical desktop context.
> Okay, good to know that I'm not just missing something very obvious here.
> I guess the questions that remain are;
> * Is this issue specific to SO modules produced by `mlua`, in which case this is a bug in `mlua`.

I'm afraid I don't know which other packages include shared libraries for use with lua, but it would be a good idea to find one and check whether there's a `DT_NEEDED` entry in the ELF header.
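The check suggested above can be sketched with `readelf -d`, which lists the `NEEDED` entries of a shared object's dynamic section. This is a rough sketch, assuming binutils-style `readelf` output; the helper name `needs_lua_lib` and the library-name pattern are illustrative assumptions, not anything taken from the packages discussed here.

```shell
# Hypothetical helper: read `readelf -d` output on stdin and succeed if any
# NEEDED entry names a Lua library (libluajit.so, liblua5.1.so, ...).
needs_lua_lib() {
    grep -Eq '\(NEEDED\).*\[liblua[^]]*\.so'
}

# Usage (not run here):
# if readelf -d ./target/debug/libreprod.so | needs_lua_lib; then
#     echo "module links against a Lua library"
# else
#     echo "module expects the host process to provide the Lua symbols"
# fi
```

A module that takes the second branch relies on the loader exposing the host's Lua symbols to it.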
> * Does this affect linking against "off the shelf" shared libraries not directly linked against by `nvim`, in which case this is a bug in our Neovim package.

I don't think it's a bug in the neovim package.

> These two aren't mutually exclusive.
> It's probably not even fair to call this a bug in `mlua` if it works fine in the typical desktop context.

I think the issue can be worked around in `mlua`, but I don't think it's fair to call it a bug, at this point.
This issue originates from https://github.com/android-ndk/ndk/issues/201, and is also described in https://github.com/termux/termux-packages/wiki/Common-porting-problems. It is not really specific to Termux but applies to Android in general.

> Symbol visibility when opening shared libraries using dlopen() works differently. On a normal linker, when an executable linking against a shared library libA dlopen():s another shared library libB, the symbols of libA are exposed to libB without libB needing to link against libA explicitly. This does not work with the Android linker, which can break plug-in systems where the main executable dlopen():s a plug-in which does not explicitly link against some shared libraries already linked to by the executable.
So it's a bug in neither package and the symbols are still missing. Thanks Android.
> This issue originates from android/ndk#201, also stated in https://github.com/termux/termux-packages/wiki/Common-porting-problems. This is not really specific to Termux but to Android OS in general.

Is there anything we can do to manually "expose" the symbols?
The solution is simple: force-link it against the library providing the necessary symbols.
This issue affects native extensions for all scripting languages (that's why, for Python, it is often suggested to specify `LDFLAGS="-lpython3.12"` before the pip command).
Welp, guess you'll need to link the library or use @pipcet's `patchelf` workaround.
Is there any specific rationale for why the Android linker limits symbol visibility like this, or is it just vague "security reasons"?
> Solution is simple: force link it with library providing necessary symbols.
> This issue affects native extensions for all scripting languages (that's why for python often suggested to specify `LDFLAGS="-lpython3.12"` before pip command).

A Rust/Lua module for Android can be forcibly linked with LuaJIT by compiling with:
`RUSTFLAGS="-C link-args=-L/path/to/lib -C link-args=-lluajit" cargo build`
or by setting the corresponding options in `.cargo/config.toml`.
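For the `.cargo/config.toml` route, a per-target section keeps the extra link arguments out of builds for other platforms. The helper below is only a sketch: the function name `write_android_link_config` is hypothetical, the `aarch64-linux-android` target triple is an assumption (use whatever triple you actually build for), and the library directory is whatever you pass in.

```shell
# Hypothetical helper: write a Cargo config that links LuaJIT for one target.
#   $1 = project directory, $2 = directory containing libluajit.so
# Note: this overwrites any existing .cargo/config.toml in the project.
write_android_link_config() {
    project_dir="$1"
    lib_dir="$2"
    mkdir -p "$project_dir/.cargo"
    cat > "$project_dir/.cargo/config.toml" <<EOF
[target.aarch64-linux-android]
rustflags = ["-C", "link-args=-L$lib_dir", "-C", "link-args=-lluajit"]
EOF
}

# Usage (not run here), with the Termux library path seen in this thread:
# write_android_link_config . /data/data/com.termux/files/usr/lib
```

Because the flags sit under a `[target.<triple>]` table, desktop builds of the same crate are left unchanged.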
Ok fine, even this exact problem has been reported before (#6383) and I've only found it through the ndk issue.
Thanks for your effort, maybe this can be closed as duplicate now. I'm glad that there are several workarounds to this problem and I'll try to find which one is feasible for cross compiling in a CI (as requested in https://github.com/Saghen/blink.cmp/issues/145)
I'd like to ask if it's a good idea (or even permitted) to download the libluajit.so artifact from Termux in CI? If not, what should be the best practice to link the library with it?
https://github.com/mlua-rs/mlua/blob/4891a6ac10e152625073335ad0703a6e68aa36fc/mlua-sys/build/main_inner.rs#L33-L34
It seems that enabling the `module` feature stops `libluajit.so` from being added as `DT_NEEDED`.
Removing the `#[cfg...` line adds it as `DT_NEEDED` successfully.
> by enabling feature `module` this stops adding `libluajit.so` as `DT_NEEDED`.

Yes, that's the intended behavior of the `module` feature. It ships lua headers in that mode and assumes related symbols are available at runtime, so that the resulting module doesn't depend on a particular lua runtime.
So I'd like to know: if I choose to link the module to luajit instead (in any of the ways suggested above), is there a good way to build it in a CI environment? Do I need to get a copy of `libluajit.so` from the Termux repository?
You can build it by cloning this repo and running `./scripts/run-docker.sh ./build-package.sh libluajit`.
A `.deb` file will be generated in the output folder.
But you are basically tying yourself to Termux.
Yes, that's why I'm generally against distributing binaries. Maybe it makes most sense to set rustflags in the cargo config and not provide CI, so that people can build on their own devices? It still needs `RUSTC_BOOTSTRAP`, since the project is using unstable features and Termux does not provide nightly toolchains though...
> Termux does not provide nightly toolchains though...

There is https://github.com/termux-user-repository/tur/tree/master/tur/rustc-nightly
Install via `pkg install tur-repo`, `pkg install rustc-nightly`
So I think the best available workaround is to make `nvim` a wrapper which appends "libluajit.so" to `LD_PRELOAD`, then calls the real `nvim`. Calling `dlopen("libluajit.so", RTLD_GLOBAL|RTLD_NOW)` from neovim doesn't appear to fix the problem, probably due to deliberate Android breakage.
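The wrapper idea above could look roughly like the following. This is a sketch under stated assumptions, not what the Termux package actually ships: the helper names are invented, the default paths follow the Termux prefix seen earlier in this thread, and the location of the "real" binary in the usage comment is purely hypothetical.

```shell
#!/bin/sh
# Assumed defaults; both can be overridden from the environment.
PREFIX="${PREFIX:-/data/data/com.termux/files/usr}"
LUAJIT_LIB="${LUAJIT_LIB:-$PREFIX/lib/libluajit.so}"

# Print the LD_PRELOAD value obtained by appending library $2 to the
# existing value $1, so anything already preloaded is kept.
append_preload() {
    if [ -n "$1" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$2"
    fi
}

# Replace this shell with the real binary ($1), passing on the remaining
# arguments, with libluajit.so added to LD_PRELOAD.
run_real_nvim() {
    real="$1"
    shift
    LD_PRELOAD="$(append_preload "${LD_PRELOAD:-}" "$LUAJIT_LIB")"
    export LD_PRELOAD
    exec "$real" "$@"
}

# An installed wrapper would end with something like (hypothetical path):
# run_real_nvim "$PREFIX/libexec/nvim/nvim" "$@"
```

Appending instead of overwriting matters because other tools may already rely on `LD_PRELOAD` being set.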
Problem description
Neovim is unable to load lua modules created with mlua if the module is not specifically linked with LuaJIT.
`require`-ing the module throws an error saying that dlopen cannot locate `lua_gettop` (or some other lua symbols).
This doesn't happen when the same code is run with `luajit` instead of `nvim`.
What steps will reproduce the bug?
* ... `"luajit"`)
* `cargo build`
* `cd` into `./target/debug`
* `ln -s lib*.so my_module.so`
* `nvim --clean --cmd "lua require('my_module')"`
What is the expected behavior?
The module is loaded and the function is accessible.
If the library is instead built with `RUSTFLAGS="-C link-args=-lluajit" cargo build`, it works as expected, but this is not required on any other platform and it makes cross compiling very difficult.
System information