I then attach gdb to the pid to capture a backtrace:
Program received signal SIGBUS, Bus error.
__GI_getenv (name=0x1004d9a "P_PATH") at getenv.c:90
90 getenv.c: No such file or directory.
(gdb) bt
#0 __GI_getenv (name=0x1004d9a "P_PATH") at getenv.c:90
#1 0x00007f9c09dda4ee in lfcgi_getenv (L=0xf9bce0) at src/fastcgi/lfcgi.c:550
#2 0x0000000000408164 in ?? ()
#3 0x0000000000411558 in ?? ()
#4 0x00000000004085bd in ?? ()
#5 0x000000000040fd46 in ?? ()
#6 0x0000000000411139 in ?? ()
#7 0x00000000004085bd in ?? ()
#8 0x000000000040780a in ?? ()
#9 0x000000000040874f in ?? ()
#10 0x0000000000405e0f in lua_pcall ()
#11 0x00000000004168dc in ?? ()
#12 0x0000000000408164 in ?? ()
#13 0x0000000000411453 in ?? ()
#14 0x00000000004085bd in ?? ()
#15 0x0000000000405d76 in lua_call ()
#16 0x0000000000417367 in ?? ()
#17 0x0000000000408164 in ?? ()
#18 0x0000000000411558 in ?? ()
#19 0x00000000004085bd in ?? ()
#20 0x000000000040780a in ?? ()
#21 0x000000000040874f in ?? ()
#22 0x0000000000405e0f in lua_pcall ()
#23 0x00007f9c09fde540 in dostring () from /usr/local/lib/lua/5.1/rings.so
#24 0x0000000000408164 in ?? ()
#25 0x0000000000411558 in ?? ()
#26 0x00000000004085bd in ?? ()
#27 0x000000000040780a in ?? ()
#28 0x000000000040874f in ?? ()
#29 0x0000000000405e0f in lua_pcall ()
#30 0x00000000004168dc in ?? ()
#31 0x0000000000408164 in ?? ()
#32 0x0000000000411453 in ?? ()
#33 0x00000000004085bd in ?? ()
#34 0x000000000040780a in ?? ()
#35 0x000000000040874f in ?? ()
#36 0x0000000000405e0f in lua_pcall ()
#37 0x0000000000403f36 in _start ()
(gdb) f 1
#1 0x00007f9c09dda4ee in lfcgi_getenv (L=0xf9bce0) at src/fastcgi/lfcgi.c:550
550 val = getenv(envVar);
(gdb) print old_env
$1 = (char **) 0xf98930
(gdb) print environ
$2 = (char **) 0xf98930
(gdb) print env
$3 = (char **) 0x11822c0
(gdb) print envVar
$4 = 0x1004d98 "APP_PATH"
What I'm seeing is that the issue is related to lfcgi_getenv switching environ to the old environment and then calling getenv. It looks like the memory that old_env points to is being modified externally, which leaves old_env pointing to an invalid location.
I believe this is the issue because, in the gdb output above, envVar points to the string "APP_PATH" at 0x1004d98, but the name argument in the getenv frame shows "P_PATH" at 0x1004d9a, two bytes further on.
I was able to avoid the crash by removing the env swapping in lfcgi_getenv. After making that change I can no longer reproduce it. However, I don't know whether that is an adequate solution, because I'm not sure whether storing and reusing the old env is required.
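For illustration, here is a minimal sketch of a lookup that never touches the global environ and instead scans a NULL-terminated "NAME=value" vector passed in by the caller. The function name env_lookup is hypothetical and this is not the code in lfcgi.c; it only shows that a variable can be resolved without swapping environ first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find NAME in a NULL-terminated "NAME=value" vector.
 * Returns a pointer to the value inside the vector, or NULL if not found.
 * The global environ is neither read nor modified. */
const char *env_lookup(char *const *envp, const char *name)
{
    size_t len;

    if (envp == NULL || name == NULL || *name == '\0')
        return NULL;

    len = strlen(name);
    for (; *envp != NULL; envp++) {
        /* Match the whole name, followed immediately by '='. */
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=')
            return *envp + len + 1;
    }
    return NULL;
}
```

The returned pointer is only valid as long as the vector itself is, so this does not help if the vector's memory is freed or overwritten elsewhere.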
A few possible solutions I can see:
Remove the old env swapping entirely.
Duplicate the old env instead of relying on the old env pointer staying valid.
Use FCGX so the script environment is passed explicitly instead of relying on getenv. This should allow getenv to read old_env in whatever state it is in at the time of the call.
So the question is: what's the best solution for this issue?
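The second option above (duplicating the old env) could look roughly like the sketch below. It deep-copies a NULL-terminated environment vector so that later changes to the original strings or pointer array cannot invalidate the copy. The names env_dup and env_free are hypothetical and are not part of wsapi or libfcgi.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free a vector returned by env_dup(). */
void env_free(char **env)
{
    char **p;

    if (env == NULL)
        return;
    for (p = env; *p != NULL; p++)
        free(*p);
    free(env);
}

/* Hypothetical helper: deep-copy a NULL-terminated "NAME=value" vector.
 * Returns NULL if src is NULL or if an allocation fails. */
char **env_dup(char *const *src)
{
    size_t n = 0, i;
    char **copy;

    if (src == NULL)
        return NULL;
    while (src[n] != NULL)
        n++;

    copy = malloc((n + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;
    /* Keep the array NULL-terminated at every step so that env_free()
     * is safe on a partially filled copy. */
    for (i = 0; i <= n; i++)
        copy[i] = NULL;

    for (i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1;
        copy[i] = malloc(len);
        if (copy[i] == NULL) {
            env_free(copy);
            return NULL;
        }
        memcpy(copy[i], src[i], len);
    }
    return copy;
}
```

Assuming old_env is captured once at start-up, it could be set from env_dup(environ) rather than from environ itself. Whether that matches what lfcgi.c needs is exactly the open question here.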
I'm using Lua 5.1 but the issue also happens with Lua 5.2. I'm also using version 1.6 of wsapi.
dmesg shows the following:
[861184.481468] lua[10912] trap stack segment ip:7f3c8b1cbf5c sp:7fff9fa26d00 error:0
I'm starting the fastcgi server using:
spawn-fcgi -p 9000 -g www-data -u www-data -- /usr/local/bin/wsapi.fcgi