Open otrebmuh opened 7 years ago
Hi Humberto
I suspect the problem is too many open connections to the database. Could you share your web application log? Do you see any ActiveRecord::ConnectionTimeoutError errors?
By the way, errors like:
[pid 25740] open("/var/www/BeaconControl/lib/active_record/statement_cache.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
are related to the way Rails loads constants.
Best, Jan
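To illustrate the remark about constant loading, here is a minimal sketch in plain Ruby of how a relative path derived from a constant name can be probed across a list of directories, producing one failed lookup per directory. The helper names `constant_to_relative_path` and `candidate_paths` are hypothetical, the directory list is copied from the trace in this thread, and the real lookup in Ruby and Rails differs in detail.

```ruby
# Hypothetical helper: derive the relative file path a path-based loader
# would look for, e.g. "ActiveRecord::StatementCache" ->
# "active_record/statement_cache.rb".
def constant_to_relative_path(const_name)
  const_name
    .split("::")
    .map { |part| part.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase }
    .join("/") + ".rb"
end

# Hypothetical helper: build the full candidate path for every directory.
def candidate_paths(const_name, search_dirs)
  relative = constant_to_relative_path(const_name)
  search_dirs.map { |dir| File.join(dir, relative) }
end

# Directories taken from the open() calls in the trace in this thread.
SEARCH_DIRS = %w[
  /var/www/BeaconControl/lib
  /var/www/BeaconControl/vendor
  /var/www/BeaconControl/app/forms
  /var/www/BeaconControl/app/validators
  /var/www/BeaconControl/app/uploaders
].freeze

# Each directory that lacks the file corresponds to one ENOENT in the trace.
candidate_paths("ActiveRecord::StatementCache", SEARCH_DIRS).each do |path|
  puts "#{path} -> #{File.file?(path) ? 'found' : 'ENOENT'}"
end
```

The file is eventually found in a directory further down the list (the gem itself), so misses of this kind in application directories do not by themselves indicate a fault.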
Hello Jan,
Thank you very much for your response. I am indeed getting some ActiveRecord::ConnectionTimeoutError errors. Do you have a suggestion for how to address this problem?
Also, is this problem the result of having multiple instances of BeaconControl connected to the same DB? If so, does this mean that such a configuration is not supported by BeaconControl?
Sorry for asking so many questions and thank you very much in advance for your answers.
Best regards,
Humberto
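One way to reason about ActiveRecord::ConnectionTimeoutError in a multi-instance setup is to compare thread counts against pool sizes. The sketch below is plain Ruby, not BeaconControl code, and every number in it is an assumption for illustration: a pool of 5 per process (the Active Record default), 25 Sidekiq threads, one thin thread, 4 instances, and a database limit of 100 connections. The thread does not confirm that any of these is the actual cause.

```ruby
# Hypothetical description of one process type running on each VM.
ProcessSpec = Struct.new(:name, :threads, :pool, keyword_init: true)

# A process whose threads outnumber its pool can hit checkout timeouts,
# because threads have to wait for a free connection.
def undersized(processes)
  processes.select { |p| p.threads > p.pool }
end

# Worst-case connections opened against the shared database if every
# pool on every instance is fully used.
def total_connections(processes, instances)
  instances * processes.sum(&:pool)
end

# Assumed values, not taken from the thread.
processes = [
  ProcessSpec.new(name: "thin", threads: 1, pool: 5),
  ProcessSpec.new(name: "sidekiq", threads: 25, pool: 5)
]
instances = 4
max_connections = 100

undersized(processes).each do |p|
  puts "#{p.name}: #{p.threads} threads share a pool of #{p.pool}"
end

total = total_connections(processes, instances)
puts "worst-case connections: #{total} of #{max_connections}"
```

Under these assumed numbers the shared database limit is not reached, while the Sidekiq process has more threads than pooled connections. Checking the real values of `pool` in `config/database.yml` and the Sidekiq concurrency setting would show whether the same pattern applies here.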
Hello,
I am trying to set up BeaconControl in a load-balanced configuration. To do so, I "dockerized" the application into three containers: one for the app, one for Sidekiq and one for Redis. I deploy one set of these three containers per virtual machine in the cloud. I have changed the server from WEBrick to thin. The database is PostgreSQL, deployed in a separate Docker container on another virtual machine. I have also put nginx in front as the proxy / load balancer.
I have run initial tests with 4 VMs in this configuration with good results (1200 events per minute sent as requests to the API). However, I have a problem. After some time, the thin servers in the virtual machines seem to freeze and stop responding. After a while they unfreeze, but from that point on they freeze again very easily.
I ssh'd into one of the virtual machines where thin had frozen and then recovered, and ran `strace -s 99 -ffp (pid)` to capture a trace in case it froze again, which it did.
What I see is the following:
    [pid 25740] ppoll([{fd=12, events=POLLIN|POLLPRI}], 1, {0, 999940000}, NULL, 8) = 0 (Timeout)
    [pid 25740] clock_gettime(CLOCK_MONOTONIC_RAW, {117303, 100765800}) = 0
    [pid 25740] clock_gettime(CLOCK_MONOTONIC_RAW, {117303, 100850300}) = 0
    [pid 25740] clock_gettime(CLOCK_MONOTONIC_RAW, {117303, 100927600}) = 0
    [pid 25740] clock_gettime(CLOCK_MONOTONIC, {117303, 107380552}) = 0
    [pid 25740] ppoll([{fd=12, events=POLLIN|POLLPRI}], 1, {0, 999923000}, NULL, 8) = 1 ([{fd=12, revents=POLLIN}], left {0, 166821281})
    [pid 25740] epoll_wait(12, {{EPOLLIN, {u32=109458896, u64=94502274872784}}}, 4096, 0) = 1
    [pid 25740] accept4(13, {sa_family=AF_INET, sin_port=htons(40552), sin_addr=inet_addr("40.112.151.221")}, [16], SOCK_CLOEXEC) = 14
    [pid 25740] fcntl(14, F_GETFD) = 0x1 (flags FD_CLOEXEC)
    [pid 25740] fcntl(14, F_SETFD, FD_CLOEXEC) = 0
    [pid 25740] fcntl(14, F_GETFL) = 0x2 (flags O_RDWR)
    [pid 25740] fcntl(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    [pid 25740] setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    [pid 25740] accept4(13, 0x7ffe872f5280, [16], SOCK_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 25740] accept(13, 0x7ffe872f5280, [16]) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 25740] clock_gettime(CLOCK_MONOTONIC_RAW, {117303, 935105800}) = 0
    [pid 25740] epoll_ctl(12, EPOLL_CTL_ADD, 14, {EPOLLIN, {u32=96991248, u64=94502262405136}}) = 0
    [pid 25740] clock_gettime(CLOCK_MONOTONIC_RAW, {117303, 935203000}) = 0
    [pid 25740] clock_gettime(CLOCK_MONOTONIC, {117303, 941658869}) = 0
    [pid 25740] ppoll([{fd=12, events=POLLIN|POLLPRI}], 1, {0, 165647000}, NULL, 8) = 1 ([{fd=12, revents=POLLIN}], left {0, 165644900})
    [pid 25740] epoll_wait(12, {{EPOLLIN, {u32=96991248, u64=94502262405136}}}, 4096, 0) = 1
    [pid 25740] read(14, "GET / HTTP/1.0\r\nHost: backend\r\nConnection: close\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla"..., 16384) = 1395
    [pid 25740] getpeername(14, {sa_family=AF_INET, sin_port=htons(40552), sin_addr=inet_addr("xx.xx.xx.xx")}, [16]) = 0
    [pid 25740] stat("/var/www/BeaconControl/public/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    [pid 25740] stat("/var/www/BeaconControl/public/.html", 0x7ffe872ef370) = -1 ENOENT (No such file or directory)
    [pid 25740] stat("/var/www/BeaconControl/public/index.html", 0x7ffe872ef370) = -1 ENOENT (No such file or directory)
    [pid 25740] clock_gettime(CLOCK_MONOTONIC, {117303, 942849266}) = 0
    [pid 25740] clock_gettime(CLOCK_REALTIME, {1478135093, 863144723}) = 0
    [pid 25740] clock_gettime(CLOCK_REALTIME, {1478135093, 863271823}) = 0
    [pid 25740] open("/var/www/BeaconControl/lib/active_record/statement_cache.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    [pid 25740] open("/var/www/BeaconControl/vendor/active_record/statement_cache.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    [pid 25740] open("/var/www/BeaconControl/app/forms/active_record/statement_cache.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    [pid 25740] open("/var/www/BeaconControl/app/validators/active_record/statement_cache.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    [pid 25740] open("/var/www/BeaconControl/app/uploaders/active_record/statement_cache.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)