pawelgaczynski / gain

Gain is a high-performance io_uring networking framework written entirely in Go.
Apache License 2.0
509 stars 22 forks source link

add read() request error: error getting SQE: SQE overflow #17

Open mammenj opened 1 year ago

mammenj commented 1 year ago

I am getting the following error when the number of clients go over 10k. I understand this is alpha version, so if this is a known issue you may close this issue with approriate response. Thanks for your work.

{"level":"error","component":"consumer","worker index":3,"ring fd":11,"error":"add read() request error: error getting SQE: SQE overflow: 1025","time":1685341320,"message":"add request error"}

You may access the test code at https://github.com/mammenj/ioring_test I am not an expert at golang but I am interested in the performance angle of io_ring compared to others, hence this this test. Versions used:

go 1.20 require ( github.com/pawelgaczynski/gain v0.3.1-alpha github.com/rs/zerolog v1.29.1 )

log.txt

pawelgaczynski commented 1 year ago

@mammenj could you check if the fix will work for you: https://github.com/pawelgaczynski/gain/tree/17-add-read-request-error-error-getting-sqe-sqe-overflow ?

mammenj commented 1 year ago

Thanks @pawelgaczynski for the update, which branch should I pull to test this? I tried to upgrade but its still on v0.3.1-alpha

pawelgaczynski commented 1 year ago

@mammenj this branch: https://github.com/pawelgaczynski/gain/tree/17-add-read-request-error-error-getting-sqe-sqe-overflow Clone gain repo to your machine. Checkout to branch "17-add-read-request-error-error-getting-sqe-sqe-overflow". And use 'replace' directive in your go.mod file: https://thewebivore.com/using-replace-in-go-mod-to-point-to-your-local-module/

mammenj commented 1 year ago

Hi @pawelgaczynski I have run the test for 10k and it works fine. So I pushed it a little further with 30k, it worked fine but beyond 30k it gives the below error, It may have hit the memory limit from the below errors. You could assess if this is a bug or a limitations of the system. Again thanks for looking into it. ` "level":"info","active connections":16354,"remote address":"127.0.0.1:46658","message":"New connection accepted"} {"level":"info","active connections":16355,"remote address":"127.0.0.1:46696","message":"New connection accepted"} 2023/06/10 19:03:55 second internal mmap failed: internalMMap error: cannot allocate memory {"level":"info","bytes":11,"remote address":"127.0.0.1:51514","message":"Bytes received from remote peer"} panic: second internal mmap failed: internalMMap error: cannot allocate memory

goroutine 34 [running]: log.Panic({0xc000053b78?, 0x1f?, 0xc000053b68?}) /usr/local/go/src/log/log.go:384 +0x65 github.com/pawelgaczynski/gain/pkg/pool/virtualmem.allocateBuffer(0x1000) /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/virtualmem/virtualmem.go:96 +0x1fa github.com/pawelgaczynski/gain/pkg/pool/virtualmem.NewVirtualMem(0x1000) /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/virtualmem/virtualmem.go:50 +0x25 github.com/pawelgaczynski/gain/pkg/pool/virtualmem.(Pool).Get(0xc000053c30?, 0x519f89?) /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/virtualmem/virtualmem_pool.go:53 +0xd9 github.com/pawelgaczynski/gain/pkg/pool/virtualmem.Get(...) /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/virtualmem/virtualmem_pool.go:30 github.com/pawelgaczynski/gain/pkg/buffer/magicring.NewMagicBuffer(0xc000053c58?) /home/mammen/Documents/workspace/go-wkspace/gain/pkg/buffer/magicring/ringbuffer.go:400 +0x87 github.com/pawelgaczynski/gain/pkg/pool/ringbuffer.(Pool).Get(0x68c5c0) /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/ringbuffer/ringbuffer.go:76 +0x50 github.com/pawelgaczynski/gain/pkg/pool/ringbuffer.Get(...) /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/ringbuffer/ringbuffer.go:64 github.com/pawelgaczynski/gain.newConnection() /home/mammen/Documents/workspace/go-wkspace/gain/conn.go:361 +0x25 github.com/pawelgaczynski/gain.(connectionPool).Get(0x544a60?) /home/mammen/Documents/workspace/go-wkspace/gain/conn_pool.go:63 +0xae github.com/pawelgaczynski/gain.getConnection(...) /home/mammen/Documents/workspace/go-wkspace/gain/conn_pool.go:25 github.com/pawelgaczynski/gain.(connectionManager).get(0xc0000d20a0, 0x2978, 0x2978) /home/mammen/Documents/workspace/go-wkspace/gain/conn_manager.go:48 +0x53 github.com/pawelgaczynski/gain.(connectionManager).getFd(...) /home/mammen/Documents/workspace/go-wkspace/gain/conn_manager.go:39 github.com/pawelgaczynski/gain.(consumerWorker).handleNewConn(0xc0000aa360, 0x2978) /home/mammen/Documents/workspace/go-wkspace/gain/consumer_worker.go:126 +0x39 github.com/pawelgaczynski/gain.(consumerWorker).getConnsFromQueue(0xc0000aa360) /home/mammen/Documents/workspace/go-wkspace/gain/consumer_worker.go:151 +0x59 github.com/pawelgaczynski/gain.(consumerWorker).handleJobsInQueues(0xc0000aa360) /home/mammen/Documents/workspace/go-wkspace/gain/consumer_worker.go:160 +0x25 github.com/pawelgaczynski/gain.(looper).innerLoop(0xc0000aa300, 0xc000053e70) /home/mammen/Documents/workspace/go-wkspace/gain/looper.go:81 +0x150 github.com/pawelgaczynski/gain.(looper).startLoop(0xc0000aa300, 0x0?, 0x0?) /home/mammen/Documents/workspace/go-wkspace/gain/looper.go:121 +0x128 github.com/pawelgaczynski/gain.(consumerWorker).loop(0xc0000aa360, 0x0?) /home/mammen/Documents/workspace/go-wkspace/gain/consumer_worker.go:192 +0x1fb github.com/pawelgaczynski/gain.(engine).startConsumers.func1.1(0xc0000aa360) /home/mammen/Documents/workspace/go-wkspace/gain/server.go:95 +0x74 created by github.com/pawelgaczynski/gain.(*engine).startConsumers.func1 /home/mammen/Documents/workspace/go-wkspace/gain/server.go:94 +0x18a exit status 2 `

pawelgaczynski commented 1 year ago

Hi @mammenj. This is something that is missing in the documentation. Gain uses mmap to allocate data buffers. Linux has a limit on such allocations per process. In my Ubuntu 22.04 the limit is 65530. You can check it with command:

cat /proc/sys/vm/max_map_count

You can increase that limit with the command (root privileges required):

sysctl -w vm.max_map_count=262144

In addition, panic is not the best way to handle the exceeding of such a limit. I will correct this soon. Thank you for the tests! I appreciate it very much.

mammenj commented 1 year ago

I am on

Distributor ID: Zorin
Description:    Zorin OS 16.2
Release:    16
Codename:   focal

And the max_map_count :

$ cat /proc/sys/vm/max_map_count
65530 

With the tests done above, I havent reached the max. It panics around 40k. Hope this helps in debugging. Let me know if you need more info.

mammenj commented 1 year ago

Below log is for: numberOfClients = 40000


{"level":"info","active connections":16350,"remote address":"127.0.0.1:47290","message":"New connection accepted"}
{"level":"info","active connections":16351,"remote address":"127.0.0.1:47386","message":"New connection accepted"}
{"level":"info","active connections":16352,"remote address":"127.0.0.1:47448","message":"New connection accepted"}
{"level":"info","active connections":16353,"remote address":"127.0.0.1:47512","message":"New connection accepted"}
{"level":"info","active connections":16354,"remote address":"127.0.0.1:47556","message":"New connection accepted"}
{"level":"info","active connections":16355,"remote address":"127.0.0.1:47626","message":"New connection accepted"}
2023/06/13 09:51:49 second internal mmap failed: internalMMap error: cannot allocate memory
panic: second internal mmap failed: internalMMap error: cannot allocate memory

goroutine 35 [running]:
log.Panic({0xc0001b3b78?, 0x1f?, 0xc0001b3b68?})
        /usr/local/go/src/log/log.go:384 +0x65
github.com/pawelgaczynski/gain/pkg/pool/virtualmem.allocateBuffer(0x1000)
        /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/virtualmem/virtualmem.go:96 +0x1fa
github.com/pawelgaczynski/gain/pkg/pool/virtualmem.NewVirtualMem(0x1000)
        /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/virtualmem/virtualmem.go:50 +0x25
github.com/pawelgaczynski/gain/pkg/pool/virtualmem.(*Pool).Get(0xc0001b3c30?, 0x519f89?)
        /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/virtualmem/virtualmem_pool.go:53 +0xd9
github.com/pawelgaczynski/gain/pkg/pool/virtualmem.Get(...)
        /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/virtualmem/virtualmem_pool.go:30
github.com/pawelgaczynski/gain/pkg/buffer/magicring.NewMagicBuffer(0xc0001b3c58?)
        /home/mammen/Documents/workspace/go-wkspace/gain/pkg/buffer/magicring/ringbuffer.go:400 +0x87
github.com/pawelgaczynski/gain/pkg/pool/ringbuffer.(*Pool).Get(0x68c640)
        /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/ringbuffer/ringbuffer.go:76 +0x50
github.com/pawelgaczynski/gain/pkg/pool/ringbuffer.Get(...)
        /home/mammen/Documents/workspace/go-wkspace/gain/pkg/pool/ringbuffer/ringbuffer.go:64
github.com/pawelgaczynski/gain.newConnection()
        /home/mammen/Documents/workspace/go-wkspace/gain/conn.go:362 +0x36
github.com/pawelgaczynski/gain.(*connectionPool).Get(0x544a60?)
        /home/mammen/Documents/workspace/go-wkspace/gain/conn_pool.go:63 +0xae
github.com/pawelgaczynski/gain.getConnection(...)
        /home/mammen/Documents/workspace/go-wkspace/gain/conn_pool.go:25
github.com/pawelgaczynski/gain.(*connectionManager).get(0xc00002e120, 0x495, 0x495)
        /home/mammen/Documents/workspace/go-wkspace/gain/conn_manager.go:48 +0x53
github.com/pawelgaczynski/gain.(*connectionManager).getFd(...)
        /home/mammen/Documents/workspace/go-wkspace/gain/conn_manager.go:39
github.com/pawelgaczynski/gain.(*consumerWorker).handleNewConn(0xc0000743c0, 0x495)
        /home/mammen/Documents/workspace/go-wkspace/gain/consumer_worker.go:126 +0x39
github.com/pawelgaczynski/gain.(*consumerWorker).getConnsFromQueue(0xc0000743c0)
        /home/mammen/Documents/workspace/go-wkspace/gain/consumer_worker.go:151 +0x59
github.com/pawelgaczynski/gain.(*consumerWorker).handleJobsInQueues(0xc0000743c0)
        /home/mammen/Documents/workspace/go-wkspace/gain/consumer_worker.go:160 +0x25
github.com/pawelgaczynski/gain.(*looper).innerLoop(0xc000074360, 0xc0001b3e70)
        /home/mammen/Documents/workspace/go-wkspace/gain/looper.go:81 +0x150
github.com/pawelgaczynski/gain.(*looper).startLoop(0xc000074360, 0x0?, 0x0?)
        /home/mammen/Documents/workspace/go-wkspace/gain/looper.go:121 +0x128
github.com/pawelgaczynski/gain.(*consumerWorker).loop(0xc0000743c0, 0x0?)
        /home/mammen/Documents/workspace/go-wkspace/gain/consumer_worker.go:192 +0x1fb
github.com/pawelgaczynski/gain.(*engine).startConsumers.func1.1(0xc0000743c0)
        /home/mammen/Documents/workspace/go-wkspace/gain/server.go:95 +0x74
created by github.com/pawelgaczynski/gain.(*engine).startConsumers.func1
        /home/mammen/Documents/workspace/go-wkspace/gain/server.go:94 +0x18a
mammen@mammen-IdeaPad-3-15IIL05:~/Documents/workspace/go-wkspace/ioring$ B```
pawelgaczynski commented 1 year ago

@mammenj did you try to increase the 'max_map_count' limit?

mammenj commented 1 year ago

Sorry No, I did not. I can, let me know what should I increase it to run 40k clients.

pawelgaczynski commented 1 year ago

@mammenj try: sysctl -w vm.max_map_count=262144

mammenj commented 1 year ago

I have increased the vm to 262144 and re ran the test. The server did not crash but some errors were logged, snippet below

{"level":"info","remote address":"127.0.0.1:48352","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:48336","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:48284","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:48262","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:48168","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:48132","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:48092","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:48046","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:48016","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47974","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47940","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47852","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47766","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47716","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47680","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47636","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47592","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47538","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47486","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47448","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47368","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47116","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47310","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47144","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47276","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:47230","error":"EOF","message":"Connection from remote peer closed"}
{"level":"info","remote address":"127.0.0.1:53097","error":"EOF","message":"Connection from remote peer closed"}

But the client crashed, log below

panic: dial tcp 127.0.0.1:8080: connect: cannot assign requested address

goroutine 34927 [running]:
log.Panic({0xc003e0cf90?, 0xc?, 0xc000cf8770?})
    /usr/local/go/src/log/log.go:384 +0x65
main.main.func1(0x0?)
    /home/mammen/Documents/workspace/go-wkspace/ioring/test/clients.go:41 +0xdd
created by main.main
    /home/mammen/Documents/workspace/go-wkspace/ioring/test/clients.go:28 +0x5d
exit status 2

Line 28 is where I call the go routine:

go func(j int) {

Line 41 is where the panic is called. The test source code is linked in the first post.

pawelgaczynski commented 1 year ago

"connection from remote peer closed" means that the client has closed the connection. The severity of this logger message is misleading. I will change it rather. How many clients did you run in the test?

mammenj commented 1 year ago

The last test was 40k clients.