luvit / luvit

Lua + libUV + jIT = pure awesomesauce
https://luvit.io/
Apache License 2.0

net module high cpu usage under stress #1215

Open trab0cch3tt0 opened 1 month ago

trab0cch3tt0 commented 1 month ago

I have a completely normal net server on Linux. It works fine until it gets flooded with small messages while a data callback is registered. For reference, the other side just streams 'y' constantly to the file descriptor, and the server's CPU usage goes to 100% for any work done inside the callback. I don't even know if this can be fixed, since it might be LuaJIT simply not being able to handle that many operations, but there should be some workaround, even if that means not receiving each message directly/instantly.

Bilal2453 commented 1 month ago

Yes, DoS-ing yourself will result in high CPU usage, especially if this test was done locally; I don't know what else to expect here. I know that the default Luvit net server is somewhat inefficient, which probably doesn't help. Try the coro-net server for reference.

Besides, you will always hit this eventually on any platform; the only difference is how much each platform handles before reaching the bottleneck. If you want a meaningful number, you need to see how many streams of Ys it handles as it reaches 100% usage and compare that with something else like nodejs. Luvit should usually win as it has less overhead (iirc coro-net comes first, then nodejs, then Luvit/net).
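
To put a number on "how much it handles", one option is to count what the data callback receives and turn that into a rate. Below is a minimal sketch in plain Lua; `newMeter`, `add` and `sample` are hypothetical names, not part of Luvit, and the loop only stands in for real traffic.

```lua
-- Hypothetical helper (not part of Luvit): counts bytes and chunks, and
-- reports per-second rates for a sampling window.
local function newMeter()
  local bytes, chunks = 0, 0
  local meter = {}

  -- Call with each received chunk (for example from a "data" callback).
  function meter.add(data)
    bytes = bytes + #data
    chunks = chunks + 1
  end

  -- Returns bytes/sec and chunks/sec for a window of `seconds`, then resets.
  function meter.sample(seconds)
    local byteRate, chunkRate = bytes / seconds, chunks / seconds
    bytes, chunks = 0, 0
    return byteRate, chunkRate
  end

  return meter
end

-- Stand-in for one second of traffic: 1000 chunks of 64 KiB each.
local meter = newMeter()
local chunk = string.rep("y\n", 32768)
for _ = 1, 1000 do meter.add(chunk) end

local byteRate, chunkRate = meter.sample(1)
print(string.format("%.1f MiB/s in %d chunks/s", byteRate / 2^20, chunkRate))
```

In a real server, `meter.add(data)` would presumably go inside the "data" callback and `meter.sample(1)` inside a one-second interval timer. Comparing the rate at which each runtime saturates a core says more than the CPU percentage alone.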

Bilal2453 commented 1 month ago

There is no way around it if you are exposing this directly to the internet. You could pin Luvit to a single core, or even use the sandboxing features of Linux to limit its allowed CPU time. When a DDoS happens that will still overload your server, just not the whole machine, and the server will also be able to handle fewer connections. The real solution here is to not expose this directly to the internet and instead have a middleware such as Cloudflare protect your service.

It would also be useful if you gave a code example, in case someone wants to benchmark this properly.

trab0cch3tt0 commented 1 month ago

local net = require'net'
local server = net.createServer(function(c)
    print("got connection")
    c:on("error", function(err)
        print("error:", err)
        c:destroy()
    end)
    c:on("data", function(data)
        local d = ""
        d = d .. data
        d = ''
    end)
end)
server:on('error', function(err)
    if err then print("server error:", err) end
end)

server:listen(8888, "127.0.0.1")

This is a minimal example. It gets to about 60% CPU usage with something like yes | nc 127.0.0.1 8888, and in larger code bases this ends up using the entire CPU, idk why but other languages like nodejs can deal with it better
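
One possible workaround for expensive per-message work is to batch: let the data callback only queue each chunk, and process the queued bytes together once enough have accumulated. The sketch below is plain Lua with stand-in data; `newBatcher`, `push` and `flush` are hypothetical names, not part of Luvit, and the 8-byte threshold is chosen only to keep the demonstration small.

```lua
-- Hypothetical helper (not part of Luvit): collects chunks and hands them to
-- onFlush as one concatenated string once at least flushSize bytes are queued.
local function newBatcher(flushSize, onFlush)
  local chunks, count, size = {}, 0, 0

  local function flush()
    if count == 0 then return end
    -- One concat per batch instead of one string operation per chunk.
    local batch = table.concat(chunks, "", 1, count)
    chunks, count, size = {}, 0, 0
    onFlush(batch)
  end

  local function push(data)
    count = count + 1
    chunks[count] = data
    size = size + #data
    if size >= flushSize then flush() end
  end

  return { push = push, flush = flush }
end

-- Stand-in traffic: sixteen 2-byte chunks, delivered in 8-byte batches.
local flushed = {}
local batcher = newBatcher(8, function(batch)
  flushed[#flushed + 1] = batch
end)
for _ = 1, 16 do batcher.push("y\n") end
batcher.flush() -- deliver whatever is left over (nothing in this run)
print(#flushed, #flushed[1])
```

In the server above, the "data" handler would then shrink to something like `batcher.push(data)`, with `batcher.flush()` called when the connection ends. This trades latency for fewer expensive operations. It does not remove the cost of receiving the data in the first place.
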
Bilal2453 commented 1 month ago

idk why but other languages like nodejs can deal with it better

Because it doesn't; I don't know where you got that from. Here is your example running in both Luvit and Nodejs.

Luvit: image

local net = require'net'
local server = net.createServer(function(c)
    print("got connection")
    c:on("error", function(err)
        print("error:", err)
        c:destroy()
    end)
    c:on("data", function(data)
        local d = ""
        d = d .. data
        d = ''
    end)
end)
server:on('error', function(err)
    if err then print("server error:", err) end
end)

server:listen(8888, "127.0.0.1")

Nodejs: image

const net = require("net")

const server = net.createServer(c => {
    console.log("got connection")
    c.on("error", err => {
        console.error(err)
        c.destroy()
    })
    c.on("data", data => {
        let d = ""
        d = d + data
        d = ""
    })
})

server.on("error", err => {
  console.error(err);
})

server.listen(8080, "127.0.0.1")

Again, those numbers are meaningless. You are DoSing yourself, and when you are getting DoSed the CPU will obviously be stressed trying to manage as many connections as it can accept...

If you want meaningful numbers, use something like wrk, which will tell you how many connections the CPU is really handling. Here is a fun benchmark:

nodejs http (keep-alive on):
Requests/sec:  10708.38
Transfer/sec:      1.64MB

luvit http (keep-alive on):
Requests/sec:  13926.38
Transfer/sec:      2.14MB

coro-http (keep-alive on):
Requests/sec:    104.72
Transfer/sec:     12.68KB

nodejs http (keep-alive off):
Requests/sec:   5134.01
Transfer/sec:    666.82KB

luvit http (keep-alive off):
Requests/sec:   8735.01
Transfer/sec:      1.11MB

coro-http (keep-alive off):
Requests/sec:  15382.50
Transfer/sec:      1.41MB

(For context, this is HTTP, not raw TCP, i.e. it uses net as its TCP implementation.) (For more context, coro-http used to handle keep-alive very poorly because it wasn't implemented; that has since changed.)

Here is another attempt at benchmarking HTTP servers: https://github.com/luvit/luvit/issues/1197