ohler55 / agoo

A High Performance HTTP Server for Ruby
MIT License
905 stars 37 forks source link

How to use multiple threads correctly #132

Open GA9BR1 opened 1 month ago

GA9BR1 commented 1 month ago

I was looking at the docs and I found that Server.init has a parameter thread_count, I was using it default and then I tried to go multi-threaded. I started the server with 4, 3 and 2 for thread_count but when I do a request to the server it just doesn't respond me. I was thinking that my code is the problem but I remove everything to test and just leaved a simple handler and it still with the same behavior(when it was single threaded it was working fine). I don't know if I'm missing something, if it's needed to do something extra, or if it's just a bug. An important detail is that in this log I tried to send 4 requests to the server and in the fourth one the server crashes. If I do with 3 threads it will be 3 requests and if it's with 2 threads it will be 2.

api01       | I 2024/07/18 03:17:30.652318222 connect: Server with pid 1 accepted connection 1 on http://:6464 [16] from 172.20.0.1
api01       | I 2024/07/18 03:17:30.652337162 connect: Server with pid 1 accepted connection 2 on http://:6464 [17] from 172.20.0.1
api01       | I 2024/07/18 03:17:30.659341561 connect: Connection 1 closed.
api01       | I 2024/07/18 03:17:30.659358880 request: HTTP 2: GET /clientes/1/extrato HTTP/1.1
api01       | User-Agent: PostmanRuntime/7.37.3
api01       | Accept: */*
api01       | Cache-Control: no-cache
api01       | Postman-Token: 4bb3cd48-0daf-4fbc-9b53-79e129f17eab
api01       | Host: localhost:6464
api01       | Accept-Encoding: gzip, deflate, br
api01       | Connection: keep-alive
api01       | D 2024/07/18 03:17:30.659367884 DEBUG: HTTP request on 2:
api01       | I 2024/07/18 03:17:35.110406471 connect: Server with pid 1 accepted connection 3 on http://:6464 [16] from 172.20.0.1
api01       | I 2024/07/18 03:17:35.110623832 connect: Server with pid 1 accepted connection 4 on http://:6464 [18] from 172.20.0.1
api01       | I 2024/07/18 03:17:35.113683953 connect: Connection 3 closed.
api01       | I 2024/07/18 03:17:35.113698046 request: HTTP 4: GET /clientes/1/extrato HTTP/1.1
api01       | User-Agent: PostmanRuntime/7.37.3
api01       | Accept: */*
api01       | Cache-Control: no-cache
api01       | Postman-Token: 19007aad-7782-4e24-8d2b-069f5607ca5a
api01       | Host: localhost:6464
api01       | Accept-Encoding: gzip, deflate, br
api01       | Connection: keep-alive
api01       | D 2024/07/18 03:17:35.113704136 DEBUG: HTTP request on 4:
api01       | I 2024/07/18 03:17:37.172697808 connect: Server with pid 1 accepted connection 5 on http://:6464 [16] from 172.20.0.1
api01       | I 2024/07/18 03:17:37.172868148 connect: Server with pid 1 accepted connection 6 on http://:6464 [19] from 172.20.0.1
api01       | I 2024/07/18 03:17:37.176406498 connect: Connection 5 closed.
api01       | I 2024/07/18 03:17:37.176416158 request: HTTP 6: GET /clientes/1/extrato HTTP/1.1
api01       | User-Agent: PostmanRuntime/7.37.3
api01       | Accept: */*
api01       | Cache-Control: no-cache
api01       | Postman-Token: f6b74152-e1a2-4ef0-871b-64cb5d56d434
api01       | Host: localhost:6464
api01       | Accept-Encoding: gzip, deflate, br
api01       | Connection: keep-alive
api01       | D 2024/07/18 03:17:37.176421709 DEBUG: HTTP request on 6:
api01       | I 2024/07/18 03:17:38.512030849 connect: Server with pid 1 accepted connection 7 on http://:6464 [16] from 172.20.0.1
api01       | I 2024/07/18 03:17:38.512284988 connect: Server with pid 1 accepted connection 8 on http://:6464 [20] from 172.20.0.1
api01       | I 2024/07/18 03:17:38.518436733 connect: Connection 7 closed.
api01       | I 2024/07/18 03:17:38.518450545 request: HTTP 8: GET /clientes/1/extrato HTTP/1.1
api01       | User-Agent: PostmanRuntime/7.37.3
api01       | Accept: */*
api01       | Cache-Control: no-cache
api01       | Postman-Token: ad23057a-20e4-490b-a825-b02cf9abd69a
api01       | Host: localhost:6464
api01       | Accept-Encoding: gzip, deflate, br
api01       | Connection: keep-alive
api01       | D 2024/07/18 03:17:38.518457229 DEBUG: HTTP request on 8:
pg_bouncer  | 2024-07-18 03:17:38.519 UTC [1] LOG C-0x5c38564cac10: postgres/app1@172.20.0.4:48452 closing because: client close request (age=19s)
api01 exited with code 0
#server.rb
require 'agoo'
require 'json'
require 'pg'

Agoo::Log.configure(dir: '', console: true, states: { INFO: true, DEBUG: true, connect: true, ERROR: true, request: true, response: true, eval: true })
Agoo::Server.init(ENV['PORT'].to_i, './root', thread_count: 4)

#the missing part of the code was omitted
...

payments_handler = Payments.new

Agoo::Server.handle(:GET, "/clientes/*/extrato", payments_handler)
Agoo::Server.handle(:POST, "/clientes/*/transacoes", payments_handler)
Agoo::Server.start()

I startup the server with ruby server.rb Ruby 3.3.3 Ubuntu 22.04.4 LTS

ohler55 commented 1 month ago

It has worked in the past. Give me a few days to take a look at the configuration. On the road right now.

ohler55 commented 1 month ago

When using multiple threads the Agoo::Server.start() returns. Previously I thought the app just exited after that. That does not seem to be the case now. The intent was to allow the application to do other actions after starting but since your app doesn't seem to need do anything other than wait, a simple sleep loop like the following should be enough:

while true
  sleep(1)
end

I think what you were seeing was the main thread exiting and then each thread would block until the last which crashed. I'll look at detecting the main thread exiting and then killing the children. I am a bit surprised that is needed but it certainly seems like it is now.

ohler55 commented 1 month ago

Latest release now exits if no wait loop is present. An example was also added.

GA9BR1 commented 1 month ago

Awesome thank you :D

GA9BR1 commented 1 month ago

I tried to run one load test in my app, and i got this error now:

api01       | double free or corruption (top)
api01       | /app/payment.rb:75: [BUG] Segmentation fault at 0x0000000000000000
api01       | ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [x86_64-linux]
api01       |
api01       | -- Control frame information -----------------------------------------------
api01       | c:0006 p:---- s:0030 e:000029 CFUNC  :exec_params
api01       | c:0005 p:0018 s:0024 e:000023 METHOD /app/payment.rb:75
api01       | c:0004 p:0016 s:0019 e:000018 METHOD /app/payment.rb:18
api01       | c:0003 p:0014 s:0015 e:000014 METHOD server.rb:32
api01       | c:0002 p:0033 s:0008 e:000007 METHOD server.rb:22 [FINISH]
api01       | c:0001 p:---- s:0003 e:000002 DUMMY  [FINISH]
api01       |
api01       | -- Ruby level backtrace information ----------------------------------------
api01       | server.rb:22:in `call'
api01       | server.rb:32:in `handle_payment'
api01       | /app/payment.rb:18:in `call'
api01       | /app/payment.rb:75:in `debit_amount'
api01       | /app/payment.rb:75:in `exec_params'
api01       |
api01       | -- Threading information ---------------------------------------------------
api01       | Total ractor count: 1
api01       | Ruby thread count for this ractor: 5
api01       |
api01       | -- Machine register context ------------------------------------------------
api01       |  RIP: 0x00007cd2229b950f RBP: 0x00007cd204dcc6c0 RSP: 0x00007cd204dcb040
api01       |  RAX: 0x0000000000000000 RBX: 0x00007cd222b66e70 RCX: 0x00007cd222a1de2c
api01       |  RDX: 0x0000000000000000 RDI: 0x0000000000000001 RSI: 0x0000000000000011
api01       |   R8: 0x0000000000000000  R9: 0x0000000000000000 R10: 0x0000000000000008
api01       |  R11: 0x0000000000000246 R12: 0x00007cd204dcb120 R13: 0x00007cd222d31000
api01       |  R14: 0x0000000000000001 R15: 0x0000000000000001 EFL: 0x0000000000010246
api01       |
api01       | -- C level backtrace information -------------------------------------------
ohler55 commented 1 month ago

Can you give me a test to help narrow it down?

GA9BR1 commented 1 month ago

Sure. In this repository you will have the gattling test that I'm using: https://github.com/zanfranceschi/rinha-de-backend-2024-q1. And in this one it's mine app https://github.com/GA9BR1/rinha-de-backend-2024-q1 the only differences between this one and my local one are that the app in the repository is not using multi-thread, the agoo version is different on the gemfile and the limitations on the docker-compose that i have removed in my local to test if the problem could be it.

ohler55 commented 1 month ago

That will take a bit to get set up. If you are willing go into the Makefile and define MEM_DEBUG. That will turn on memory debugging. It should catch the double free and tell us where it is happening.

GA9BR1 commented 1 month ago

Where can i found this Makefile?

ohler55 commented 1 month ago

Sounds like you might need a few directions so I'll describe how to try out a new branch and also update the Makefile.