Open AtlasLinden opened 1 month ago
I was able to repro without even switching to other apps, just poking around in the viewer for a few minutes.
I added code to include a stack trace in the message when any atexit() function fails with an error, producing:
2024-08-09T19:24:40Z WARNING #Lua# llcommon/lua_function.cpp(529) ~LuaState : atexit() function error: [string "/Users/nat/linden/viewer-lua-main/build-darwin-x86_64/newview/RelWithDebInfo/Second Life Test.app/Contents/Resources/scripts/lua/require/fiber.lua"]:328: stack overflow ((null))
[string "/Users/nat/linden/viewer-lua-main/build-darwin-x86_64/newview/RelWithDebInfo/Second Life Test.app/Contents/Resources/scripts/lua/require/fiber.lua"]:253 function scheduler
[string "/Users/nat/linden/viewer-lua-main/build-darwin-x86_64/newview/RelWithDebInfo/Second Life Test.app/Contents/Resources/scripts/lua/require/fiber.lua"]:328 function run
which is perplexing. I assumed the stack in question was the function-call stack, but with only those two entries, that seems highly unlikely.
fwiw fiber.lua line 253 (in function scheduler()) is the line:
others = next(ready)
which simply tests whether the ready
list is or is not empty. next()
is a Lua builtin that returns the first (key, value) pair if ready
isn't empty, or nil
if it is.
I'm perplexed as to how that could result in a data stack overflow.
Google searches for the "stack overflow ((null))" message have not yet been fruitful.
Poking around in the Luau source, I find the message here:
void luaL_checkstack(lua_State* L, int space, const char* mes)
{
if (!lua_checkstack(L, space))
luaL_error(L, "stack overflow (%s)", mes);
}
where the passed mes
is (as usual) nullptr
. lua_checkstack()
is here:
int lua_checkstack(lua_State* L, int size)
{
int res = 1;
if (size > LUAI_MAXCSTACK || (L->top - L->base + size) > LUAI_MAXCSTACK)
res = 0; // stack overflow
else if (size > 0)
{
luaD_checkstack(L, size);
expandstacklimit(L, L->top + size);
}
return res;
}
In a local build of 3p-luau, I added debugging output to lua_checkstack()
, producing this when the error occurs:
==================== Stack overflow
LUAI_MAXCSTACK = 8000
requested size = 1
stack span = 13686, stacksize = 13691
function base = 8, top = 7996
nested C calls = 1
==================== end stack overflow
I'm not sure how to diagnose where the problem is. Apparently some function is pushing 7988 stack items, such that requesting a single item more hits the overflow condition. (In my local edit, I subtracted 4 from the max in case it might be useful to call another Lua function (print()
?) to examine stack entries ... but I don't want to review almost 8000 of them.)
Passing __FUNCTION__
to luaL_checkstack()
actually identified at least the specific call that hits the error: it's in our helper function lua_rawgetfield()
. To simplify future debugging, I added a lluau_checkstack(L, n)
macro that implicitly passes __FUNCTION__
to the underlying luaL_checkstack()
function, and replaced all our luaL_checkstack(L, n, nullptr)
calls with lluau_checkstack(L, n)
.
As lua_rawgetfield()
is pretty recent, the only existing caller is lluau::check_interrupts_counter()
.
Knowing the location let me wrap the lua_rawgetfield()
call in try
/ catch
, with a statement in the catch
block on which I could set a breakpoint. When I hit that breakpoint, I was able to examine the C++ call stack.
To my surprise, the C++ call stack looked perfectly ordinary. I was expecting infinite recursion on somebody's function call stack, but neither Lua nor C++ report such.
Of course hitting this error in check_interrupts_counter()
, which is called every Lua engine "interrupt," makes one wonder if the problem might be that check_interrupts_counter()
is itself performing too many Lua operations, and it's being reentered by another interrupt. I tried setting and checking a recursion flag, such that check_interrupts_counter()
returns immediately if a previous instance hasn't yet returned. But there are relatively few ways to associate such a flag with a specific lua_State
; I chose to make another Registry entry. That only led to Lua data stack overflow trying to access the recursion flag.
Please test with this viewer build.
I didn't see an issue status meaning "ready for QA," but this is. It's not yet "integrated" because I still don't have approval for the bug-fix PR.
Environment
Second Life Release 7.1.9.10286089860 (64bit) Release Notes
You are at 121.0, 110.0, 23.1 in Pasta Cake located at simhost-0a7de8a08a5d18b0e.agni SLURL: http://maps.secondlife.com/secondlife/Pasta%20Cake/121/110/23 (global coordinates 332,153.0, 305,518.0, 23.1) Second Life Server 2024-06-11.9458617693 Release Notes
CPU: 12th Gen Intel(R) Core(TM) i9-12900HK (2918.4 MHz) Memory: 32440 MB OS Version: Microsoft Windows 10 64-bit (Build 19044.3086) Graphics Card Vendor: NVIDIA Corporation Graphics Card: NVIDIA GeForce RTX 3050 Ti Laptop GPU/PCIe/SSE2
Windows Graphics Driver Version: 31.0.15.4659 OpenGL Version: 4.6.0 NVIDIA 546.59
Window size: 1278x1368 Font Size Adjustment: 96pt UI Scaling: 1 Draw distance: 128m Bandwidth: 10000kbit/s LOD factor: 1.75 Render quality: 5 Texture memory: 4095MB Disk cache: Max size 1638.4 MB (36.6% used)
J2C Decoder Version: KDU v7.10.4 Audio Driver Version: FMOD Studio 2.02.20 Dullahan: 1.14.0.202310131404 CEF: 118.4.1+g3dd6078+chromium-118.0.5993.54 Chromium: 118.0.5993.54 LibVLC Version: 3.0.16 Voice Server Version: Vivox 4.10.0000.32327.5fc3fe7c.5942f08
Packets Lost: 0/1,148 (0.0%) August 09 2024 06:35:38
Description
This error was noticed in the logs after noticing a lua scripted floater stop updating.
WARNING #Lua# llcommon/lua_function.cpp(517) LuaState::~LuaState : atexit() function error: [string "C:\\Program Files\\SecondLifeLua\\scripts/lua/require\\fiber.lua"]:326: stack overflow ((null))
Reproduction steps
Not the most ideal steps but it does reproduce fairly consistently with the following: