otland / forgottenserver

A free and open-source MMORPG server emulator written in C++
https://otland.net
GNU General Public License v2.0
1.59k stars 1.06k forks source link

NPC's have a (Lua) memory leak & crash potential #3552

Open Erza opened 3 years ago

Erza commented 3 years ago

Before creating an issue, please ensure:

Steps to reproduce (include any configuration/script required to reproduce)

  1. Stick this code in a talkaction
    
    local tests = {
    npc = 1,
    monster = 2,
    item = 3,
    }

local activeTest = tests.npc

local monsterName = "Enslaved Dwarf" local npcName = "Banker" local itemId = 2148

local function runTest(playerPos) for i = 1, 5000 do if activeTest == tests.npc then local npc = Game.createNpc(npcName, playerPos) npc:remove() elseif activeTest == tests.monster then local monster = Game.createMonster(monsterName, playerPos) monster:remove() elseif activeTest == tests.item then local item = Game.createItem(itemId, 100, playerPos) item:remove() end end end

function onSay(player, words, param) local playerPos = player:getPosition()

local start = os.clock()
runTest(playerPos)
local reloadStart = os.clock()
Game.reload(RELOAD_TYPE_TALKACTIONS)
local testEnd = os.clock()
print("Test #1: Test: "..reloadStart - start.." - Reload: "..testEnd - reloadStart.." - Total: "..testEnd - start)

start = os.clock()
runTest(playerPos)
reloadStart = os.clock()
Game.reload(RELOAD_TYPE_TALKACTIONS)
testEnd = os.clock()
print("Test #2: Test: "..reloadStart - start.." - Reload: "..testEnd - reloadStart.." - Total: "..testEnd - start)

start = os.clock()
runTest(playerPos)
reloadStart = os.clock()
Game.reload(RELOAD_TYPE_TALKACTIONS)
testEnd = os.clock()
print("Test #3: Test: "..reloadStart - start.." - Reload: "..testEnd - reloadStart.." - Total: "..testEnd - start)

playerPos:sendMagicEffect(CONST_ME_BLOCKHIT)
print(("-"):rep(50))
return false

end

2. Run the talkaction a few times and keep an eye on the memory usage of the process

### Expected behaviour
NPC's cleanly deallocate their memory upon being removed

### Actual behaviour
NPC's do not deallocate their memory upon being removed

### Observations
1. After commenting out this call to `lua_gc`, reload times stay consistent, which gives evidence of the Lua environment getting trashed https://github.com/otland/forgottenserver/blob/e1e1bc21a72034cdc57cfbc95701182943307471/src/luascript.cpp#L4604
2. Using `local activeTest = tests.monster` or `local activeTest = tests.item` does not produce the same memory leak
3. It doesn't matter which reload type we use, whether it's `RELOAD_TYPE_TALKACTIONS`, `RELOAD_TYPE_CREATURESCRIPTS` or `RELOAD_TYPE_ACTIONS` etc, the entire Lua environment is affected

### Test Outputs
##### With the call to `lua_gc` enabled (increasing timings)

Test #1: Test: 3.841 - Reload: 0.059000000000001 - Total: 3.9 Test #2: Test: 3.913 - Reload: 0.11 - Total: 4.023 Test #3: Test: 3.833 - Reload: 0.167 - Total: 4

Test #1: Test: 3.832 - Reload: 0.221 - Total: 4.053 Test #2: Test: 3.841 - Reload: 0.274 - Total: 4.115 Test #3: Test: 3.896 - Reload: 0.33 - Total: 4.226

Test #1: Test: 3.897 - Reload: 0.383 - Total: 4.28 Test #2: Test: 3.916 - Reload: 0.439 - Total: 4.355 Test #3: Test: 3.934 - Reload: 0.492 - Total: 4.426

Test #1: Test: 3.954 - Reload: 0.547 - Total: 4.501 Test #2: Test: 3.971 - Reload: 0.594 - Total: 4.565 Test #3: Test: 3.988 - Reload: 0.657 - Total: 4.645

Test #1: Test: 4.006 - Reload: 0.706 - Total: 4.712 Test #2: Test: 3.993 - Reload: 0.76400000000001 - Total: 4.757 Test #3: Test: 4.019 - Reload: 0.815 - Total: 4.834

Test #1: Test: 4.017 - Reload: 0.87 - Total: 4.887 Test #2: Test: 4.017 - Reload: 0.929 - Total: 4.946 Test #3: Test: 4.061 - Reload: 0.982 - Total: 5.043

##### With the call to `lua_gc` disabled (consistent timings)

Test #1: Test: 3.875 - Reload: 0.0040000000000013 - Total: 3.879 Test #2: Test: 3.921 - Reload: 0.0040000000000013 - Total: 3.925 Test #3: Test: 3.911 - Reload: 0.0040000000000013 - Total: 3.915

Test #1: Test: 4.046 - Reload: 0.0030000000000001 - Total: 4.049 Test #2: Test: 3.844 - Reload: 0.0040000000000013 - Total: 3.848 Test #3: Test: 3.967 - Reload: 0.0039999999999978 - Total: 3.971

Test #1: Test: 4.112 - Reload: 0.0049999999999955 - Total: 4.117 Test #2: Test: 4.124 - Reload: 0.0039999999999978 - Total: 4.128 Test #3: Test: 3.94 - Reload: 0.0039999999999978 - Total: 3.944

Test #1: Test: 3.974 - Reload: 0.0040000000000049 - Total: 3.978 Test #2: Test: 4.007 - Reload: 0.0039999999999978 - Total: 4.011 Test #3: Test: 4.217 - Reload: 0.0049999999999955 - Total: 4.222

Test #1: Test: 4.528 - Reload: 0.0049999999999955 - Total: 4.533 Test #2: Test: 4.186 - Reload: 0.0039999999999907 - Total: 4.19 Test #3: Test: 4.046 - Reload: 0.0030000000000001 - Total: 4.049

Test #1: Test: 4.108 - Reload: 0.0039999999999907 - Total: 4.112 Test #2: Test: 4.33 - Reload: 0.0040000000000049 - Total: 4.334 [Warning - NpcScript::NpcScript] Can not load script: bank.lua not enough memory


### Memory Graphs
#### NPC Test
![NPC Test](https://i.imgur.com/IOZ26hd.png)

#### Monster Test
![Monster Test](https://i.imgur.com/ILZpyml.png)

#### Item Test
![Item Test](https://i.imgur.com/LSZxEQx.png)

### Impact on the server
Eventually, the Lua environment will run out of available memory to allocate and you will get the following error, after which the server will shortly crash

[Warning - NpcScript::NpcScript] Can not load script: bank.lua not enough memory

nekiro commented 3 years ago

Probably the npc system is at fault (I think it's known from having memory leaks), if you empty the lua files to bare minimum is it still leaking? Or if you use another npc.

Erza commented 3 years ago

For all of these tests I was using 33333 iterations per test, to get to a total of 100k spawns

local function runTest(playerPos)
    for i = 1, 33333 do
        if activeTest == tests.npc then
            local npc = Game.createNpc(npcName, playerPos)
            npc:remove()
        elseif activeTest == tests.monster then
            local monster = Game.createMonster(monsterName, playerPos)
            monster:remove()
        elseif activeTest == tests.item then
            local item = Game.createItem(itemId, 100, playerPos)
            item:remove()
        end
    end
end

Results

The first dip is always after the talkaction finished executing, the 2nd dip is always after manually running /reload npc after execution finishes

Test 1 (Removing the script="bank.lua" tag from the Banker.xml file entirely

Looks like we got perfect memory deallocation in this case

Test 2 (Bare minimum NPC script file)

It's seemingly less with an "empty" NPC script file, but I can still observe a leak coming from data/npc/lib/npc.lua (more specifically, data/npc/lib/npcsystem.lua)

Test 3 (Adding KeywordHandler)

Test 4 (Adding NpcHandler)

Test 5 (Parsing parameters & enabling NpcHandler callbacks)

Test 6 (Adding FocusModule)

Test 7 (Adding a few keywords to listen for)

Zbizu commented 1 year ago

There's some exception with code 0xE24C4A02 spamming the log when running a debugger it's being launched from this line (and has npc onThink in stacktrace): https://github.com/otland/forgottenserver/blob/0e20d54eebc6be8ea7ed50d0fd3d31d5d43df1f9/src/luascript.cpp#L300

could someone confirm this with unmodded tfs?

EPuncker commented 1 year ago

https://github.com/otland/forgottenserver/pull/4483