kernelsauce / turbo

Turbo is a framework built for LuaJIT 2 to simplify the task of building fast and scalable network applications. It uses an event-driven, non-blocking, thread-free design to deliver excellent performance and a minimal footprint for high-load applications, while also providing excellent support for embedded use.
http://turbo.readthedocs.io/
Apache License 2.0

VmRSS increases with the number of requests issued to the Turbo Lua server, causing a memory leak. #350

Open GINUGEORGE opened 4 years ago

GINUGEORGE commented 4 years ago

@kernelsauce Could you please explain whether the behaviour described below is expected?

I have a simple application; its source code is shown below:

local turbo = require("turbo")
local HelloNameHandler = class("HelloNameHandler", turbo.web.RequestHandler)

function HelloNameHandler:get()
     -- Get the 'name' argument, or use 'Santa Claus' if it does not exist
     local name = self:get_argument("name", "Santa Claus")
     self:write("Hello " .. name .. "!")
     self:finish()
end

function HelloNameHandler:post()
    -- Get the 'name' argument, or use 'Easter Bunny' if it does not exist
    local name = self:get_argument("name", "Easter Bunny")
    self:write("Hello " .. name .. "!")
end

local app = turbo.web.Application:new({
    {"/hello", HelloNameHandler}
})
function HelloNameHandler:on_finish()
    collectgarbage("collect")
end

app:listen(9080)
turbo.ioloop.instance():start()

I start the application:

# luajit server.lua &
[1] 11322

After starting the application, I check the VmRSS as shown below:

# cat /proc/11322/status | grep -i vm*
VmPeak:    26228 kB
VmSize:    26228 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      4140 kB
VmRSS:      4140 kB
VmData:     1180 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      5784 kB
VmPTE:       100 kB
VmPMD:        16 kB
VmSwap:        0 kB
Speculation_Store_Bypass:       thread vulnerable
voluntary_ctxt_switches:        48
nonvoluntary_ctxt_switches:     1

Now I trigger 2000 requests to the server with a concurrency level of 1:

# ab -k -n 2000 -c 1 http://<<Machine-IP>>:9080/hello

After the requests have finished, I check the VmRSS again:

# cat /proc/11322/status | grep -i vm*
VmPeak:    26292 kB
VmSize:    26292 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      4612 kB
VmRSS:      4612 kB
VmData:     1244 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      5784 kB
VmPTE:       100 kB
VmPMD:        16 kB
VmSwap:        0 kB
Speculation_Store_Bypass:       thread vulnerable
voluntary_ctxt_switches:        1984
nonvoluntary_ctxt_switches:     8

We can see that VmRSS has increased from 4140 kB to 4612 kB. If we keep issuing requests to the server in a similar manner, VmRSS keeps increasing.

Where does the memory leak come from? Why does VmRSS keep increasing?

hippi777 commented 4 years ago

hi there! :)

you could try testing it with the JIT off, or just flushing it: functions that use upvalues and are JIT-compiled won't let the function and its upvalues be collected for as long as the compiled code exists, but localizing the upvalues would solve it. So first of all we should make sure it's not just the JIT doing something nasty; ruling that out narrows down the possibilities a lot...

(btw this was just a wild guess :D )

bests! :)

GINUGEORGE commented 4 years ago

@szbnwer I am finding it a bit difficult to understand what you are trying to convey.

Could you please explain these two points:

  1. test it with jit off or just flush it
  2. localization of the upvalues

hippi777 commented 4 years ago

hi! :)

sorry, I'm known to be an alien, so it's my fault... :D

I tried to write it out in more detail, but that could have been even more confusing. I originally wrote my previous message in a hard-to-understand way because my own understanding of this is not the best and I didn't want to state actual nonsense... so I deleted my draft right after I found the original material again :D (and now I will re-read that topic as well :D )

here you go, the professional version of the same information: https://www.freelists.org/post/luajit/LuaJIT-GC-Problem

the code in the mail above is malformed, so I made a copy from my inbox on pastebin for better readability: https://pastebin.com/Jy7ELdiN
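
In plain code, the two points roughly look like this (only a generic sketch of the idea, nothing Turbo-specific; jit.off() and jit.flush() are part of stock LuaJIT, while big_config and greet are names made up for the illustration):

-- 1. "test it with jit off or just flush it"
local jit = require("jit")
jit.off()    -- disable the JIT compiler entirely (run interpreted only)
jit.flush()  -- or keep the JIT enabled and just discard all compiled traces

-- 2. "localization of the upvalues": copy the upvalue into a local at the top of
-- the function, so the body works on the local copy instead of the captured upvalue.
local big_config = { greeting = "Hello" }   -- upvalue captured by the closure below

local function greet(name)
    local cfg = big_config                  -- "localized" copy of the upvalue
    return cfg.greeting .. " " .. name .. "!"
end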

good luck, i hope it will be something useful! :)

GINUGEORGE commented 4 years ago

@kernelsauce @hippi777 As suggested, I have tried adding jit.off() and jit.flush() calls as shown below:

local turbo = require("turbo")
local HelloNameHandler = class("HelloNameHandler", turbo.web.RequestHandler)
local jit_compiler_disable_flag = false

function HelloNameHandler:get()
     -- Get the 'name' argument, or use 'Santa Claus' if it does not exist
     local name = self:get_argument("name", "Santa Claus")
     self:write("Hello " .. name .. "!")
     self:finish()
end

function HelloNameHandler:post()
    -- Get the 'name' argument, or use 'Easter Bunny' if it does not exist
    local name = self:get_argument("name", "Easter Bunny")
    self:write("Hello " .. name .. "!")
end

local app = turbo.web.Application:new({
    {"/hello", HelloNameHandler}
})
function HelloNameHandler:on_finish()
    if jit_compiler_disable_flag ~= true then
        print("Disabled JIT Compiler")
        jit_compiler_disable_flag = true
        jit.off() -- Should be called just one time
    end
    jit.flush() -- Can be called at end of every request
    collectgarbage("collect")
end

app:listen(9080)
turbo.ioloop.instance():start()

I have benchmarked this and found that, although the memory growth has reduced a bit after adding the jit.off() and jit.flush() calls, there is still a considerable amount of it, and it is likely to grow further in large-scale applications.

Here are the benchmarking results:

I start the application:

# luajit server.lua &
[1] 30495

After starting the application, I check the VmRSS as shown below:

# cat /proc/30495/status | grep -i vm*
VmPeak:    26228 kB
VmSize:    26228 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      4064 kB
VmRSS:      4064 kB
VmData:     1212 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      5856 kB
VmPTE:       136 kB
VmSwap:        0 kB
NoNewPrivs:     0

Now I trigger 2000 requests to the server with a concurrency level of 1:

# ab -k -n 2000 -c 1 http://<<Machine-IP>>:9080/hello

After the requests have finished, I check the VmRSS again; it has increased:

# cat /proc/30495/status | grep -i vm*
VmPeak:    26228 kB
VmSize:    26164 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      4380 kB
VmRSS:      4380 kB
VmData:     1212 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      5792 kB
VmPTE:       132 kB
VmSwap:        0 kB
NoNewPrivs:     0
Speculation_Store_Bypass:       thread vulnerable
voluntary_ctxt_switches:        89
nonvoluntary_ctxt_switches:     2

Now I trigger 2000 requests to the server with a concurrency level of 100:

# ab -k -n 2000 -c 100 http://<<Machine-IP>>:9080/hello

I check the VmRSS again; it has increased further:

root@redfish-host-d3h:~# cat /proc/30495/status | grep -i vm*
VmPeak:    26812 kB
VmSize:    26812 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      4904 kB
VmRSS:      4904 kB
VmData:     1860 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      5792 kB
VmPTE:       136 kB
VmSwap:        0 kB
NoNewPrivs:     0
Speculation_Store_Bypass:       thread vulnerable
voluntary_ctxt_switches:        141
nonvoluntary_ctxt_switches:     5

Now I trigger 2000 requests to the server with a concurrency level of 500:

# ab -k -n 2000 -c 500 http://<<Machine-IP>>:9080/hello

I check the VmRSS again; it has increased by a huge margin:

# cat /proc/30495/status | grep -i vm*
VmPeak:    31076 kB
VmSize:    31076 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      9268 kB
VmRSS:      9268 kB
VmData:     6124 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      5792 kB
VmPTE:       164 kB
VmSwap:        0 kB
NoNewPrivs:     0
Speculation_Store_Bypass:       thread vulnerable
voluntary_ctxt_switches:        238
nonvoluntary_ctxt_switches:     10

Why is VmRSS still increasing even after adding the jit.flush() and collectgarbage() calls, and why does it increase so much when issuing concurrent requests?

Please let me know in case you have any suggestions or inputs.

hippi777 commented 4 years ago

hi there! :)

I think you may well be right about this, as I couldn't even think of a better test...

my second guess was that tables allocate the array part and/or the hash part to hold the next 2^n items when they fill up. Maybe it's still just a false positive: that behaviour leaves some garbage behind, leaves more under heavier load, and probably still tops out eventually. But I think you have run more tests than that, so this is only a really weak guess :D
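
If you want to poke at that guess, a rough probe (just a sketch; the exact growth points depend on the Lua/LuaJIT version) could watch collectgarbage("count") while the array part of a table fills up:

-- keep the probe interpreted, so trace compilation doesn't blur the numbers
require("jit").off()

local t = {}
collectgarbage("collect")
local prev = collectgarbage("count")        -- Lua heap size in kilobytes
for i = 1, 40 do
    t[i] = true                             -- booleans don't allocate on their own
    local now = collectgarbage("count")
    if now > prev then
        print(string.format("heap grew at i = %d (+%.2f kB)", i, now - prev))
        prev = now
    end
end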

what about querying the GC after the ioloop exits, once everything has been cleaned up just before that?

if that test fails the same way, you can even destroy the turbo reference and your own functions, so we can see whether it's about LuaJIT or about Turbo... if Turbo stores anything, it will store it under its own namespace (unless there is a missing local somewhere, a misconception, or whatever), but you can still check whether the global namespace is identical after sacrificing turbo to the GC :D

also, are you using git HEAD of LuaJIT? It still has some recent issues, but many of the older ones are resolved, and Mike Pall won't care about already-resolved problems if this turns out to be about LuaJIT...

if you can't find anything wrong with LuaJIT, then maybe you could still ask on its mailing list for advice about where to look to rule out possible errors... also, in that case, I don't know whether any kind of magic-wand tool should or could be used; if you are familiar with anything good, go ahead, but external tools and LuaJIT's built-in inspectors are hard to use, and I would start digging into the sources instead: probably check anything that isn't plain Lua, and track down the called functions right from requiring turbo... it's not a happy task, but Turbo has a sane and digestible source :D Otherwise, I'm in the deepest depths of my todo list and still haven't reached the point of actually using Turbo, so I'm not really into it as of now, nor do I have enough free time... I can't even reach the end of my incoming mail these days :(

-- nothing important below --

I have 370 unread mail threads, but most are small and usually only the first message of each is interesting; the Lua-related ones get some priority and are the bigger ones, while the rest is mostly from void-linux/void-packages, checking out things I don't know plus a few I have special interest in, and those can mostly be digested in bigger batches... not to mention the aftermath of a data loss from some months ago that gave me a lot of headaches, and that I'm currently hacking on Void instead of my Lua-related projects, just to build a new empire on the ashes of my previous system :D

other than my own issues, I do care about Turbo; maybe I will try to reshape it one day according to my taste, and I probably won't start with anything else, so I'm around, I really hope I can be a bit useful, and I hope I will get to actually using Turbo sooner rather than later...

good luck, and have fun! :)

GINUGEORGE commented 4 years ago

@hippi777 If you could please elaborate on the lines mentioned below, it would be very helpful.

@kernelsauce @hippi777 I ran a different experiment this time to observe the difference in VmRSS growth between issuing requests with concurrency 1 and with concurrency greater than 1.

The observations are as follows:

Experiment 1: Issuing Parallel Requests

Initial Stage of Process Start:

~ # ps -ef|grep server.lua
sysadmin  7055     1  0 00:58 ttyS4    00:00:00 luajit server.lua
sysadmin  7078  4160  0 00:58 ttyS4    00:00:00 grep server.lua
~ # cat /proc/7055/status | grep -i vm*
VmPeak:    14632 kB
VmSize:    14632 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      9000 kB
VmRSS:      9000 kB
VmData:     8688 kB
VmStk:       132 kB
VmExe:       364 kB
VmLib:      4288 kB
VmPTE:        22 kB
VmSwap:        0 kB
voluntary_ctxt_switches:        11
nonvoluntary_ctxt_switches:     8

Now I trigger 2000 requests to the server with a concurrency level of 50:

# ab -k -n 2000 -c 50 https://<<Machine-IP>>:<<Port>>/<<API URI>>

I check the VmRSS again:

~ # cat /proc/7055/status | grep -i vm*
VmPeak:    16932 kB
VmSize:    16852 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:     11636 kB
VmRSS:     11528 kB
VmData:    10908 kB
VmStk:       132 kB
VmExe:       364 kB
VmLib:      4288 kB
VmPTE:        22 kB
VmSwap:        0 kB
voluntary_ctxt_switches:        2410
nonvoluntary_ctxt_switches:     86119

Result: VmRSS has increased by around 2528 kB

Experiment 2: Issuing One-by-One Requests for the Same URI

Initial Stage of Process Start:

~ # ps -ef|grep server.lua
sysadmin  8660     1  0 01:15 ttyS4    00:00:00 luajit server.lua
sysadmin  8686  4160  0 01:15 ttyS4    00:00:00 grep server.lua
~ # cat /proc/8660/status | grep -i vm*
VmPeak:    14632 kB
VmSize:    14632 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      9000 kB
VmRSS:      9000 kB
VmData:     8688 kB
VmStk:       132 kB
VmExe:       364 kB
VmLib:      4288 kB
VmPTE:        20 kB
VmSwap:        0 kB
voluntary_ctxt_switches:        5
nonvoluntary_ctxt_switches:     8

Now I trigger 2000 requests to the server with a concurrency level of 1:

# ab -k -n 2000 -c 1 https://<<Machine-IP>>:<<Port>>/<<API URI>>

I check the VmRSS again:

~ # cat /proc/8660/status | grep -i vm*
VmPeak:    14636 kB
VmSize:    14520 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      9176 kB
VmRSS:      9124 kB
VmData:     8576 kB
VmStk:       132 kB
VmExe:       364 kB
VmLib:      4288 kB
VmPTE:        20 kB
VmSwap:        0 kB
voluntary_ctxt_switches:        5709
nonvoluntary_ctxt_switches:     85796

Result: VmRSS has increased by only around 124 kB

Does anyone have an idea why VmRSS increases by around 2528 kB when issuing parallel requests, but only by around 124 kB when issuing requests one at a time?

Does it have anything to do with the Lua garbage collector? Is it unable to do proper garbage collection when parallel requests are issued?

Any suggestions/ideas on how to fix this would be welcome.

hippi777 commented 4 years ago

hi again! :)

what I said is as simple as this:

app=nil
HelloNameHandler=nil
turbo=nil

if, after this, plus some garbage collection (more than one cycle?) and JIT flushing (even with the JIT initially disabled? :D ), the memory usage still increases across different tests, then it's probably not about Turbo...
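
Spelled out, the test I mean would be something like this (only a sketch; it assumes the ioloop gets stopped somehow so that start() actually returns, which the plain example from earlier in the thread doesn't do on its own):

-- ... the server code from earlier in the thread goes here ...

app:listen(9080)
turbo.ioloop.instance():start()

-- everything below only runs once the ioloop has exited
app = nil
HelloNameHandler = nil
turbo = nil

jit.flush()
collectgarbage("collect")
collectgarbage("collect")   -- a second full cycle frees anything a finalizer kept alive

print(string.format("Lua heap after teardown: %.1f kB", collectgarbage("count")))
-- compare this number with VmRSS from /proc/<pid>/status taken at the same moment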

what I can think of is memory fragmentation; I believe there are tools to observe that, but I have never played with any of them

the other thing that comes to mind is that cdata is a strange beast (I don't know Turbo well enough to say whether this can be suspected at all, but the strong reliance on LuaJIT rather than on various Lua engines suggests heavy use of the FFI, and therefore cdata): declared C types cannot be deleted, and things like anonymous C structures keep consuming their "pool" (an array, if I'm right), so maybe there is a mistake of this kind somewhere around Turbo... in that case, running your test multiple times in the same engine while deleting all Turbo-related things, as in the snippet above, could maybe reveal something, but I'm not sure; it could even be misleading...
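
For the cdata part, the classic pitfall (general LuaJIT FFI behaviour, not something I have verified in Turbo; bad/good/point_t are made-up names) is declaring anonymous C types repeatedly:

local ffi = require("ffi")

-- Each use of an *anonymous* struct declaration registers a brand-new C type,
-- and declared C types are never garbage collected, so doing this per request
-- slowly eats the (finite) ctype pool.
local function bad()
    return ffi.new("struct { int x; }")
end

-- Declaring a named type once and reusing its constructor avoids that.
ffi.cdef("typedef struct { int x; } point_t")
local point_t = ffi.typeof("point_t")
local function good()
    return point_t()
end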

also, I would check the memory usage with collectgarbage("count") too, as that is what the engine itself can see, and maybe what you saw is something the OS does with the memory in the background :D By the way, I think it only covers GC-managed things, so for example memory owned through cdata may not be counted there... I don't know...
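
For example, something like this in the handler from earlier in the thread would print both numbers at the same moment, so the engine's view and the OS's view can be compared directly (a sketch; the vmrss_kb helper is made up here and is Linux-only):

-- Linux-only helper: read the VmRSS line for the current process.
local function vmrss_kb()
    local f = io.open("/proc/self/status", "r")
    if not f then return nil end
    local rss
    for line in f:lines() do
        rss = line:match("^VmRSS:%s*(%d+)%s*kB") or rss
    end
    f:close()
    return tonumber(rss)
end

function HelloNameHandler:on_finish()
    collectgarbage("collect")
    -- collectgarbage("count") returns the size of the GC-managed Lua heap in kilobytes;
    -- memory allocated outside the GC (e.g. via malloc through the FFI) won't show up here.
    print(string.format("Lua heap: %.1f kB | VmRSS: %s kB",
        collectgarbage("count"), tostring(vmrss_kb())))
end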

another idea is to try RaptorJIT, LuaVela or other more heavily modified LuaJIT forks to see whether there is any difference; that way it might turn out to be about the engine rather than about Turbo...

maybe you could also try other tools for the analysis to get a new perspective, or ask about this on the LuaJIT mailing list, just to get some good hints if nothing more, but nice surprises may come as well :)

I'm slowly running out of my own ideas, and while I'm somewhat sad to see you fighting this bravely without being able to help more significantly, I'm still around and I hope I can be somewhat useful :) But seriously, I think the LuaJIT mailing list can give you the most help, and reading the sources can go quickly if you skip the obviously harmless parts where no magic can happen...

stay strong, i wish all the bests to u! :)

hippi777 commented 3 years ago

hi again! :)

I just came across this tool while cleaning up some of my open tabs. Maybe it's not for LuaJIT, maybe it needs a lot of effort to make it work with it, maybe it works out of the box; I have no idea, so I'm just leaving a potentially useful pointer here: https://github.com/pmusa/lmprof

bests! :)

kernelsauce commented 3 years ago

No, this is not expected. What version of LuaJIT are you running? I have had pretty long tests running to check for any leaks, and they always came back clean.

plmanik commented 3 years ago

The LuaJIT version is 2.0.5. We are seeing high CPU usage while a request is being handled, and the CPU usage goes down after the request has been served. But if we issue continuous requests, the CPU usage stays high and other programs are affected because this one process takes more than 50-60% CPU. Is there any way of reducing the CPU usage and VmRSS in this case?

plmanik commented 3 years ago

Hi, I investigated the memory usage further by taking memory snapshots; details are below. First I tried lua-snapshot (https://github.com/cloudwu/lua-snapshot). Program used to check memory usage:

local turbo = require("turbo")
local HelloNameHandler = class("HelloNameHandler", turbo.web.RequestHandler)
local jit_compiler_disable_flag = false
local snapshot = require "snapshot"
local snapshot_utils = require "snapshot_utils"
local construct_indentation = snapshot_utils.construct_indentation
local print_r = require "print_r"
local S1, S2
turbo.log.disable_all()

function HelloNameHandler:get()
    -- Get the 'name' argument, or use 'Santa Claus' if it does not exist
    local name = self:get_argument("name", "Santa Claus")
    self:write("Hello " .. name .. "!")
    self:finish()
end

function HelloNameHandler:post()
    -- Get the 'name' argument, or use 'Easter Bunny' if it does not exist
    local name = self:get_argument("name", "Easter Bunny")
    self:write("Hello " .. name .. "!")
end

local app = turbo.web.Application:new({
    {"/hello", HelloNameHandler}
})

function HelloNameHandler:on_finish()
    S2 = snapshot()

    local diff = {}
    for k, v in pairs(S2) do
        if not S1[k] then
            diff[k] = v
        end
    end

    print_r(diff)
    print("===========================")

    local result = construct_indentation(diff)
    print_r(result)
end

app:listen(9080)
S1 = snapshot()
turbo.ioloop.instance():start()

After running the above code and issuing one request, the memory that remains after the response can be seen in Memory_details_luasnapshot.txt.

Before giving the request:

# cat /proc/26575/status | grep -i vm*
VmPeak:    30392 kB
VmSize:    29880 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      5504 kB
VmRSS:      5120 kB
VmData:     1288 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      7288 kB
VmPTE:       136 kB
VmSwap:        0 kB
NoNewPrivs:     0
Speculation_Store_Bypass:       thread vulnerable
voluntary_ctxt_switches:        8
nonvoluntary_ctxt_switches:     0

After getting the response (one request from a browser to http://ipaddress:9080/hello):

# cat /proc/26575/status | grep -i vm*
VmPeak:    30520 kB
VmSize:    30136 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      5644 kB
VmRSS:      5236 kB
VmData:     1544 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      7288 kB
VmPTE:       144 kB
VmSwap:        0 kB
NoNewPrivs:     0
Speculation_Store_Bypass:       thread vulnerable
voluntary_ctxt_switches:        164
nonvoluntary_ctxt_switches:     1

Then I tried LuaMemorySnapshotDump (https://github.com/yaukeywang/LuaMemorySnapshotDump). Program used to check memory usage:

local turbo = require("turbo")
local mri = require("MemoryReferenceInfo")
local HelloNameHandler = class("HelloNameHandler", turbo.web.RequestHandler)
mri.m_cConfig.m_bAllMemoryRefFileAddTime = false

function HelloNameHandler:get()
    -- Get the 'name' argument, or use 'Santa Claus' if it does not exist
    local name = self:get_argument("name", "Santa Claus")
    self:write("Hello " .. name .. "!")
    self:finish()
end

function HelloNameHandler:post()
    -- Get the 'name' argument, or use 'Easter Bunny' if it does not exist
    local name = self:get_argument("name", "Easter Bunny")
    self:write("Hello " .. name .. "!")
end

local app = turbo.web.Application:new({
    {"/hello", HelloNameHandler}
})

function HelloNameHandler:on_finish()
    mri.m_cMethods.DumpMemorySnapshot("./", "2-After", -1)
    mri.m_cMethods.DumpMemorySnapshotComparedFile("./", "Compared", -1,
        "./LuaMemRefInfo-All-[1-Before].txt", "./LuaMemRefInfo-All-[2-After].txt")
end

app:listen(9080)
mri.m_cMethods.DumpMemorySnapshot("./", "1-Before", -1)
turbo.ioloop.instance():start()

Before giving the request:

# cat /proc/26797/status | grep -i vm*
VmPeak:    28584 kB
VmSize:    28584 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      5492 kB
VmRSS:      5492 kB
VmData:     2052 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      7276 kB
VmPTE:       160 kB
VmSwap:        0 kB
NoNewPrivs:     0
Speculation_Store_Bypass:       thread vulnerable
voluntary_ctxt_switches:        9
nonvoluntary_ctxt_switches:     0

After getting the response (one request from a browser to http://ipaddress:9080/hello):

# cat /proc/26797/status | grep -i vm*
VmPeak:    28968 kB
VmSize:    28968 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      6016 kB
VmRSS:      6016 kB
VmData:     2436 kB
VmStk:       132 kB
VmExe:       440 kB
VmLib:      7276 kB
VmPTE:       164 kB
VmSwap:        0 kB
NoNewPrivs:     0
Speculation_Store_Bypass:       thread vulnerable
voluntary_ctxt_switches:        55
nonvoluntary_ctxt_switches:     0

After running the above code and issuing one request, the memory that remains after the response can be seen in LuaMemRefInfo-All-[20201003-011657]-[Compared].txt.

It seems the header, cookie and response buffer were not kept; I tried clearing them in on_finish, but it still shows the same result. The reference count keeps increasing as requests are issued continuously. I also used collectgarbage, but the memory is not reclaimed.

Memory_details_luasnapshot.txt LuaMemRefInfo-All-[20201003-011657]-[Compared].txt

Thanks