bbashy opened this issue 5 months ago
Hey @bbashy, how many messages are you sending across the connections and what sort of server are you running?
Also, are you running Reverb alongside another application or standalone?
Hi.
Using it as a presence channel for an online user list, which hits around 180 users at peak times. It's used on most pages across the site, so people leave/join on every page change, and visitors hit a few pages every few seconds.
Using a Hetzner cloud server with an AMD EPYC CPU (8 cores, 16 GB RAM).
No other applications; this server is just for this site. I was using Soketi (no CPU issues), but it doesn't work on Node v20, which is why I switched to Reverb after the Laravel upgrade.
Echo.join('online-users')
    .here((users) => {
        this.onlineUsers = users;
    })
    .joining((user) => {
        this.onlineUsers.push(user);
    })
    .leaving((user) => {
        this.onlineUsers = this.onlineUsers.filter(u => u.id !== user.id);
    });
In channels.php it's just a return of user data: username, country, and a link to their profile.
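For reference, a channel authorization callback like the one described might look roughly like this (a hypothetical sketch; the field names and route are assumptions based on the description above, not the actual code):

```php
<?php
// routes/channels.php — hypothetical sketch of the presence channel
// authorization described above; field names are assumptions.
use Illuminate\Support\Facades\Broadcast;

Broadcast::channel('online-users', function ($user) {
    // Returning an array (rather than a boolean) makes this a presence
    // channel payload, shared with other members on join/leave.
    return [
        'id' => $user->id,
        'username' => $user->username,
        'country' => $user->country,
        'profile_url' => route('profile.show', $user),
    ];
});
```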
It's been fine since I last deployed - 24/03/2024 22:00 UTC.
Thanks for the reply. Going to close this for now, but please mention me in this thread if you see the issue again and I will reopen.
@joedixon Started happening again. If there's any debugging I can do, let me know.
@bbashy Would be good to get a better picture of your environment and use case. Would you mind answering the following:
// Connections
app(\Illuminate\Broadcasting\BroadcastManager::class)
->getPusher()
->get('/connections');
// Channels
app(\Illuminate\Broadcasting\BroadcastManager::class)
->getPusher()
->get('/channels', ['info' => 'subscription_count']);
Psy Shell v0.12.2 (PHP 8.2.17 — cli) by Justin Hileman
> app(\Illuminate\Broadcasting\BroadcastManager::class)->getPusher()->get('/connections');
= {#5634
+"connections": 117,
}
> app(\Illuminate\Broadcasting\BroadcastManager::class)->getPusher()->get('/channels', ['info' => 'subscription_count']);
= {#5606
+"channels": {#5612
+"servers-booked": {#5610
+"subscription_count": 14,
},
+"stats-counter": {#5605
+"subscription_count": 14,
},
+"presence-public-chat": {#5607},
},
}
No need to install Pulse, just wanted to rule it out as a factor.
Would be interesting to run with debug mode enabled for a little while (which in itself will increase CPU) to see if anything untoward is happening when the CPU spikes.
Running with debug now, and I'm running a script to gather the CPU % every 2 seconds.
Will report back here with data.
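The sampling script isn't shown in the thread; a minimal sketch of the approach (hypothetical, not the author's actual script) could look like this:

```shell
# Hypothetical sketch: sample a process's CPU% with ps at a fixed
# interval and print timestamped lines like the excerpts in this thread.
# Usage: cpu_watch <pid> <samples> <interval-seconds>
cpu_watch() {
    pid="$1"; samples="$2"; interval="$3"
    i=0
    while [ "$i" -lt "$samples" ]; do
        # %cpu= suppresses the header; tr strips ps's padding
        cpu=$(ps -p "$pid" -o %cpu= | tr -d ' ')
        printf '%s :: pid %s CPU Usage: %s%%\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" "$pid" "$cpu"
        i=$((i + 1))
        [ "$i" -lt "$samples" ] && sleep "$interval"
    done
}

# Demonstration: sample the current shell once.
cpu_watch $$ 1 2
```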
Gone up to 100% again. Here's the CPU usage; the spike is pretty much instant, not gradual. Running in debug for 24 hours.
2024-04-01 20:33:46 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:33:48 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:50 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:33:52 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:54 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:56 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:33:58 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:00 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:02 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:04 :: reverb:start --debug [2282066] CPU Usage: 4%
2024-04-01 20:34:06 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:34:08 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:34:10 :: reverb:start --debug [2282066] CPU Usage: 4%
2024-04-01 20:34:12 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:14 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:34:16 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:18 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:34:20 :: reverb:start --debug [2282066] CPU Usage: 52%
2024-04-01 20:34:22 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:24 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:26 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:28 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:30 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:32 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:34 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:36 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:38 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:40 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:42 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:44 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:46 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:48 :: reverb:start --debug [2282066] CPU Usage: 100%
The Reverb logs don't show much, apart from maybe this? This pattern appears regularly throughout the log, even when usage was low.
Message Handled ........................................ 285448861.320304079
Connection Closed ...................................... 880659721.152382869
Pruning Stale Connections ..................................................
Pinging Inactive Connections ...............................................
Connection Pinged ...................................... 373144331.352179112
Connection Pinged ...................................... 770897394.824165909
Connection Pinged ...................................... 128326687.479625382
Connection Pinged ...................................... 238258499.280963052
Connection Pinged ...................................... 711357617.733355114
Connection Pinged ...................................... 913581555.461208734
Connection Pinged ...................................... 750335547.483737098
Connection Pinged ...................................... 824799629.748502904
Connection Pinged ...................................... 373757589.569113713
Connection Pinged ....................................... 88650420.659599424
Connection Pinged ...................................... 875588527.671792201
Connection Pinged ...................................... 645841583.993685622
Connection Pinged ....................................... 69443263.770122343
Connection Pinged ....................................... 969366386.39638130
Connection Pinged ....................................... 97541188.835314754
Connection Pinged ...................................... 336550205.253472927
Connection Pinged ...................................... 837108525.856830003
Connection Pinged ...................................... 843470688.187968972
Connection Pinged ...................................... 903186882.544248154
Connection Pinged ....................................... 70865328.772779885
then after ~170 lines of that, it does loads of:
Message Received ....................................... 913581555.461208734
{
    "event": "pusher:ping",
    "data": []
}
Message Handled ........................................ 913581555.461208734
Message Received ......................................... 746315846.6165177
{
    "event": "pusher:ping",
    "data": []
}
@bbashy the logs look normal, but I'm interested to see whether the issue is related to the pinging of connections, which Reverb defaults to running every 60 seconds. Can you try increasing the value of REVERB_APP_PING_INTERVAL to see whether that makes any difference? Perhaps start at 1 hour (3600) and increase from there.
I've set REVERB_APP_PING_INTERVAL=3600 and restarted Reverb. Back to 0-1% CPU, and I'll keep the monitoring running for the process.
Must be something to do with the volume of pings combined with connections subscribing/unsubscribing.
I use it as an online list and it's included on most pages, so every page change is a disconnect/connect.
Right, but I would say that's pretty standard usage. Perhaps the difference is the online list, as that will cause events to be broadcast to all connected clients on every connection/disconnection. Pusher actually limits this functionality to 100 connections.
Yep, I saw that, and Soketi limits it too. I increased the limit in Soketi to 500 and CPU usage was still below 10%, even after a month of uptime.
The online list is only on the homepage though; other pages only have a counter, but I guess it still updates all clients.
Gone up again. Shall I try a higher ping interval? It's weird that it goes straight from 0 to 100% in 4 seconds.
2024-04-04 19:28:00 :: reverb:start --debug [237336] CPU Usage: 2%
2024-04-04 19:28:02 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:04 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:06 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:08 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:10 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:12 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:14 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:16 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:18 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:20 :: reverb:start --debug [237336] CPU Usage: 87%
2024-04-04 19:28:22 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:24 :: reverb:start --debug [237336] CPU Usage: 99%
2024-04-04 19:28:26 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:28 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:30 :: reverb:start --debug [237336] CPU Usage: 100%
If it's not something others have had and we can't resolve it, I'll look into changing the online user list to something based on WebSocket messages.
It would be good to try increasing the ping interval to see if it has any impact on when the CPU spikes. Agree it's strange that it ramps up so quickly.
Are you in a position to be able to temporarily disable the online user list so we can categorically determine that is actually the issue?
@bbashy Sudden CPU spikes are definitely not to be expected and if you've encountered the exact same error in both the original Laravel Websockets package and now Reverb, this might in fact indicate a problem in the underlying Ratchet/ReactPHP components. Similar reports have been posted in the past and we have some ideas what could cause this, but we've never been able to reproduce this locally, see https://github.com/ratchetphp/Ratchet/issues/939#issuecomment-1098798274 / https://github.com/ratchetphp/Ratchet/issues/939#issuecomment-1101270289. If this matches your issue and you can reproduce this locally, please reach out and I'm happy to take a look at this.
@clue Looks very similar and relates to my use-case of high connect/disconnect events. In theory, all we'd need to do is replicate a lot of users (100+) connecting/disconnecting. It started looking like it was because of that after I saw the time it shot up. 19:30-20:30 is when my site has the most users on it, and they'll be doing a lot of page changes (disconnect/connect).
@bbashy I faced a similar problem. Supervisor runs inside a Docker container, and the Reverb process is started there. The process was loaded at 100% with 500+ simultaneous connections.
The solution turned out to be installing the event extension, as indicated in the documentation (the Event Loop section).
Dockerfile
FROM php:8.3.3-fpm-bullseye
#....
#....
RUN apt-get update && apt-get install -y openssl
RUN docker-php-ext-install sockets
RUN pecl channel-update pecl.php.net
RUN apt-get update && apt-get install -y libevent-dev
# Answer pecl's interactive prompts non-interactively
RUN printf "\n\n\n\n\nno\nyes\n\n" | pecl install event
RUN docker-php-ext-enable event
Also, on the host system, I set ulimit -n 32766,
and in the supervisor settings (supervisor.conf) minfds=10000:
[supervisord]
nodaemon=true
chmod=0770
logfile=/var/log/supervisor/supervisord.log
pidfile=/var/run/supervisord.pid
minfds=10000
[program:laravel-websocket]
numprocs=8
process_name=%(program_name)s_%(process_num)02d
chmod=0770
command=php /var/www/html/artisan reverb:start
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/www/html/storage/logs/reverb.log
finally
Might have fixed an issue for you, but I believe mine was slightly different.
I have no problems when running it under Soketi on the same machine/ulimit.
finally
Might have fixed the problem for you, but I think mine was a bit different.
I believe our answers can help other developers cope with this kind of problem, and the repository owners may want to take some of our practices into account.
This is still on my radar, I just haven't had the chance to take a deep dive on it just yet.
Yesterday I experimented with installing the ev package (pecl install ev), which similarly saves the situation. In a production environment with 600+ connections, the load on the process has decreased, with a single artisan reverb:start process running.
You need to specify extension=ev.so in php.ini.
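As a quick sanity check (my suggestion, not from the thread), you can confirm the extension actually took effect after updating php.ini:

```php
<?php
// Hypothetical sanity check: confirm the ev extension is loaded after
// adding extension=ev.so to php.ini (run via `php -r` or tinker).
var_dump(extension_loaded('ev')); // bool(true) once ev is active
```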
When the number of connections exceeds 1,000, CPU usage climbs to 100% at a strange rate and causes it to stop and restart. This problem occurs on a VPS (8 cores, 16 GB RAM).
When the number of connections exceeds 1,000, CPU usage climbs to 100% at a strange rate and causes it to stop and restart. This problem occurs on a VPS (8 cores, 16 GB RAM).
Did you install the ev extension?
When the number of connections exceeds 1,000, CPU usage climbs to 100% at a strange rate and causes it to stop and restart. This problem occurs on a VPS (8 cores, 16 GB RAM).
In my case, the problem was solved after installing the ev extension. Currently, more than 2,500 users connect without problems
I faced the same problem. I installed it on Laravel Forge using a load balancer with 3 servers, and all servers are at 100% CPU usage. Is there any solution related to Forge?
Reverb Version
v1.0.0-beta4
Laravel Version
11.0.8
PHP Version
8.2.17
Description
After a day or so, I'm getting 100% CPU usage on the reverb:start process. If I restart Reverb, it goes down to 1% CPU.
It's currently serving 160-odd users online in a presence channel and that's about it. I had the same problem with this package: https://github.com/beyondcode/laravel-websockets/issues/379
Steps To Reproduce
Have a Reverb server running via reverb:start for a day with many presence connections (150+).