laravel / reverb

Laravel Reverb provides a real-time WebSocket communication backend for Laravel applications.
https://reverb.laravel.com

High CPU usage #116

Open bbashy opened 5 months ago

bbashy commented 5 months ago

Reverb Version

v1.0.0-beta4

Laravel Version

11.0.8

PHP Version

8.2.17

Description

After a day or so, I'm getting 100% CPU usage on the reverb:start process. If I restart reverb, it goes down to 1% CPU.

It's currently handling 160-odd users online in a presence channel and that's about it. I had the same problem with this package: https://github.com/beyondcode/laravel-websockets/issues/379

[Screenshot: the reverb:start process at 100% CPU, 2024-03-23 19:49]

Steps To Reproduce

Run a Reverb server via reverb:start for a day or so with many presence connections (150+).

joedixon commented 5 months ago

Hey @bbashy, how many messages are you sending across the connections and what sort of server are you running?

Also, are you running Reverb alongside another application or standalone?

bbashy commented 5 months ago

Hi.

Using it as a presence channel for an online user list. It hits 180 users at certain times. It's used on most pages across the site, so people will be leaving/joining on page change; people visit a few pages every few seconds.

Using a Hetzner Cloud server with an AMD EPYC CPU (8 cores, 16 GB RAM).

No other applications; this server is just for this site. I was using Soketi (no CPU issues), but it doesn't work on Node v20, which is why I switched to Reverb after the Laravel upgrade.

Echo.join('online-users')
    .here((users) => {
        this.onlineUsers = users;
    })
    .joining((user) => {
        this.onlineUsers.push(user);
    })
    .leaving((user) => {
        this.onlineUsers = this.onlineUsers.filter(u => u.id !== user.id);
    });

In channels.php, it just returns user data: username, country, and a link to the profile.
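
For context, the presence channel authorization for that kind of online list typically lives in routes/channels.php and would look something like the sketch below. It is assembled from the details above (the online-users channel name and Echo's use of user.id); the exact attribute names and the profile route are assumptions, not the author's actual code.

// routes/channels.php -- minimal sketch of the presence channel callback.
// The returned array is the member data Echo receives for each user.
use Illuminate\Support\Facades\Broadcast;

Broadcast::channel('online-users', function ($user) {
    return [
        'id' => $user->id,                              // used by .leaving() above
        'username' => $user->username,                  // assumed attribute
        'country' => $user->country,                    // assumed attribute
        'profile_url' => route('profile.show', $user),  // hypothetical route name
    ];
});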

It's been fine since I last deployed (24/03/2024 22:00 UTC).

joedixon commented 5 months ago

Thanks for the reply. Going to close this for now, but please mention me in this thread if you see the issue again and I will reopen.

bbashy commented 5 months ago

@joedixon Started happening again. If there's any debugging I can do, let me know.

[Screenshot: the reverb:start process back at 100% CPU, 2024-03-27 21:12]

joedixon commented 5 months ago

@bbashy Would be good to get a better picture of your environment and use case. Would you mind answering the following:

  1. Does the CPU continually increase over time or does it suddenly spike to 100%?
  2. If you leave it, does the CPU eventually drop back down?
  3. Are you using the Pulse integration?
  4. Do you have debug mode enabled?
  5. Are you broadcasting messages (if so, roughly how many) or just using for presence indication?
  6. Are you able to use tinker to run the code samples below to get the current number of connections and channels:
// Connections
app(\Illuminate\Broadcasting\BroadcastManager::class)
  ->getPusher()
  ->get('/connections');

// Channels
app(\Illuminate\Broadcasting\BroadcastManager::class)
  ->getPusher()
  ->get('/channels', ['info' => 'subscription_count']);

bbashy commented 5 months ago

  1. I've only noticed it being 1% or 99-100%.
  2. It's still at that since my last reply (16 hours).
  3. No, but I plan to once it's out of beta. Shall I install it?
  4. No. Logs just show "INFO Stopping server on 0.0.0.0:2096. INFO Starting secure server on 0.0.0.0:2096 (example.com)." etc.
  5. We have a 'chat room' but it's hardly used.
  6. See below.
Psy Shell v0.12.2 (PHP 8.2.17 — cli) by Justin Hileman
> app(\Illuminate\Broadcasting\BroadcastManager::class)->getPusher()->get('/connections');
= {#5634
    +"connections": 117,
  }

> app(\Illuminate\Broadcasting\BroadcastManager::class)->getPusher()->get('/channels', ['info' => 'subscription_count']);
= {#5606
    +"channels": {#5612
      +"servers-booked": {#5610
        +"subscription_count": 14,
      },
      +"stats-counter": {#5605
        +"subscription_count": 14,
      },
      +"presence-public-chat": {#5607},
    },
  }

joedixon commented 5 months ago

No need to install Pulse, just wanted to rule it out as a factor.

Would be interesting to run with debug mode enabled for a little while (which in itself will increase CPU) to see if anything untoward is happening when the CPU spikes.

bbashy commented 5 months ago

Running with debug now, and I'm running a script to gather the CPU % of the process every 2 seconds.

Will report here with data.

bbashy commented 5 months ago

Gone up to 100% again. Here's the CPU usage; the jump is pretty much instant, not gradual. It had been running in debug for 24 hours.

2024-04-01 20:33:46 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:33:48 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:50 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:33:52 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:54 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:56 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:33:58 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:00 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:02 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:04 :: reverb:start --debug [2282066] CPU Usage: 4%
2024-04-01 20:34:06 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:34:08 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:34:10 :: reverb:start --debug [2282066] CPU Usage: 4%
2024-04-01 20:34:12 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:14 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:34:16 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:18 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:34:20 :: reverb:start --debug [2282066] CPU Usage: 52%
2024-04-01 20:34:22 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:24 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:26 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:28 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:30 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:32 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:34 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:36 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:38 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:40 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:42 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:44 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:46 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:48 :: reverb:start --debug [2282066] CPU Usage: 100%

The Reverb logs don't show much, apart from maybe this? This pattern appears regularly throughout the whole log, even when usage was low.

Message Handled ........................................ 285448861.320304079  
Connection Closed ...................................... 880659721.152382869  
Pruning Stale Connections ..................................................  
Pinging Inactive Connections ...............................................  
Connection Pinged ...................................... 373144331.352179112  
Connection Pinged ...................................... 770897394.824165909  
Connection Pinged ...................................... 128326687.479625382  
Connection Pinged ...................................... 238258499.280963052  
Connection Pinged ...................................... 711357617.733355114  
Connection Pinged ...................................... 913581555.461208734  
Connection Pinged ...................................... 750335547.483737098  
Connection Pinged ...................................... 824799629.748502904  
Connection Pinged ...................................... 373757589.569113713  
Connection Pinged ....................................... 88650420.659599424  
Connection Pinged ...................................... 875588527.671792201  
Connection Pinged ...................................... 645841583.993685622  
Connection Pinged ....................................... 69443263.770122343  
Connection Pinged ....................................... 969366386.39638130  
Connection Pinged ....................................... 97541188.835314754  
Connection Pinged ...................................... 336550205.253472927  
Connection Pinged ...................................... 837108525.856830003  
Connection Pinged ...................................... 843470688.187968972  
Connection Pinged ...................................... 903186882.544248154  
Connection Pinged ....................................... 70865328.772779885  

Then, after ~170 lines of that, it outputs loads of:

Message Received ....................................... 913581555.461208734  

{
    "event": "pusher:ping",
    "data": []
}

Message Handled ........................................ 913581555.461208734  
Message Received ......................................... 746315846.6165177  

{
    "event": "pusher:ping",
    "data": []
}

joedixon commented 5 months ago

@bbashy the logs look normal, but I'm interested to see if the issue is related to the pinging of connections, which Reverb defaults to running every 60 seconds. Can you try increasing the value of REVERB_APP_PING_INTERVAL to see whether that makes any difference? Perhaps start at 1 hour (3600) and increase from there.
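
For anyone following along, that environment variable feeds the per-app ping_interval option in config/reverb.php. A trimmed sketch of the relevant section (the 60-second default and the surrounding keys are assumptions based on the published config and may differ between Reverb versions):

// config/reverb.php (trimmed excerpt)
'apps' => [
    'provider' => 'config',

    'apps' => [
        [
            'key' => env('REVERB_APP_KEY'),
            'secret' => env('REVERB_APP_SECRET'),
            'app_id' => env('REVERB_APP_ID'),
            // ...
            'ping_interval' => env('REVERB_APP_PING_INTERVAL', 60),
        ],
    ],
],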

bbashy commented 5 months ago

I've set REVERB_APP_PING_INTERVAL=3600 and I've restarted reverb. Back to 0-1% CPU and will run the monitoring for the process.

It must be something to do with the number of pings combined with connections subscribing/unsubscribing.

I use it as an online list and it's included on most pages, so a page change means a disconnect/connect.

joedixon commented 5 months ago

Right, but I would say that's pretty standard usage. Perhaps the difference is the online list, as that causes events to be broadcast to all connected clients on every connection/disconnection. Pusher actually limits this functionality to 100 connections.

bbashy commented 5 months ago

Yep, I saw that, and Soketi limits it too. I increased the limit on Soketi to 500 and the CPU usage was still below 10%, even after a month of uptime.

The online list is only on the homepage, though; other pages only show a counter, but I guess it still pushes updates to all clients.

bbashy commented 5 months ago

Gone up again. Shall I try a higher ping interval? It's weird that it goes straight from 0 to 100% in 4 seconds.

2024-04-04 19:28:00 :: reverb:start --debug [237336] CPU Usage: 2%
2024-04-04 19:28:02 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:04 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:06 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:08 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:10 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:12 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:14 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:16 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:18 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:20 :: reverb:start --debug [237336] CPU Usage: 87%
2024-04-04 19:28:22 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:24 :: reverb:start --debug [237336] CPU Usage: 99%
2024-04-04 19:28:26 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:28 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:30 :: reverb:start --debug [237336] CPU Usage: 100%

If it's not something others have had and we can't resolve it, I'll look into changing the online user list to something based on WebSocket messages instead.
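
As a rough sketch of that idea (not from this thread): the counter-only pages could subscribe to a plain channel and receive just a number, so joins and leaves no longer fan out presence member events to every client. Every name below (the event class, the online-count channel, and the count source) is hypothetical.

<?php

namespace App\Events;

use Illuminate\Broadcasting\Channel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcast;
use Illuminate\Foundation\Events\Dispatchable;

// Hypothetical event that broadcasts only the current online-user count on a
// public channel instead of full presence membership updates.
class OnlineCountUpdated implements ShouldBroadcast
{
    use Dispatchable;

    public function __construct(public int $count)
    {
    }

    public function broadcastOn(): Channel
    {
        return new Channel('online-count');
    }
}

// Dispatched periodically, e.g. from a scheduled task; the count source is
// left to the application:
// broadcast(new OnlineCountUpdated($currentCount));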

joedixon commented 5 months ago

It would be good to try increasing the ping interval to see if it has any impact on when the CPU spikes. Agree it's strange that it ramps up so quickly.

Are you in a position to be able to temporarily disable the online user list so we can categorically determine that is actually the issue?

clue commented 5 months ago

@bbashy Sudden CPU spikes are definitely not to be expected and if you've encountered the exact same error in both the original Laravel Websockets package and now Reverb, this might in fact indicate a problem in the underlying Ratchet/ReactPHP components. Similar reports have been posted in the past and we have some ideas what could cause this, but we've never been able to reproduce this locally, see https://github.com/ratchetphp/Ratchet/issues/939#issuecomment-1098798274 / https://github.com/ratchetphp/Ratchet/issues/939#issuecomment-1101270289. If this matches your issue and you can reproduce this locally, please reach out and I'm happy to take a look at this.

bbashy commented 5 months ago

@clue Looks very similar and relates to my use case of a high volume of connect/disconnect events. In theory, all we'd need to do is replicate a lot of users (100+) connecting/disconnecting. It started looking like that was the cause once I saw the time it shot up: 19:30-20:30 is when my site has the most users on it, and they'll be doing a lot of page changes (disconnect/connect).

Alex0Fury commented 3 months ago

@bbashy I faced a similar problem. Supervisor runs inside a Docker container, and the Reverb process is started there. That process was loaded at 100% with 500+ simultaneous connections.

The solution turned out to be installing an event loop extension, as indicated in the documentation's event loop section.

Dockerfile

FROM php:8.3.3-fpm-bullseye
#....
#....
RUN apt-get install openssl
RUN docker-php-ext-install sockets
RUN pecl channel-update pecl.php.net
RUN apt-get update && apt-get install -y libevent-dev

RUN printf "\n\n\n\n\nno\nyes\n\n" | pecl install event
RUN docker-php-ext-enable event

I also set ulimit -n 32766 on the host system.

In the Supervisor settings (supervisor.conf), minfds=10000:

[supervisord]
nodaemon=true
chmod=0770
logfile=/var/log/supervisor/supervisord.log
pidfile=/var/run/supervisord.pid
minfds=10000

[program:laravel-websocket]
numprocs=8
process_name=%(program_name)s_%(process_num)02d
chmod=0770
command=php /var/www/html/artisan reverb:start
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/www/html/storage/logs/reverb.log

Finally:

[Screenshot of the result]

bbashy commented 3 months ago

That might have fixed the issue for you, but I believe mine was slightly different.

I have no problems when running it under Soketi on the same machine/ulimit.

Alex0Fury commented 3 months ago

> That might have fixed the issue for you, but I believe mine was slightly different.

I believe our answers can help other developers cope with this problem, and the repository owners may want to take some of our practices into account.

joedixon commented 3 months ago

This is still on my radar, I just haven't had the chance to take a deep dive on it just yet.

Alex0Fury commented 3 months ago

> This is still on my radar, I just haven't had the chance to take a deep dive on it just yet.

Yesterday I experimented with installing the ev extension (pecl install ev), which similarly fixes the situation. In a production environment with 600+ connections, the load on the process has decreased, with a single artisan reverb:start process running.

You also need to add extension=ev.so to php.ini.
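
A quick way to confirm the extension is actually being used is to check which ReactPHP event loop gets selected, e.g. from tinker or php -r (a sketch; with ev loaded you should see ExtEvLoop rather than the StreamSelectLoop fallback):

// Check that ev is loaded and which react/event-loop implementation is chosen.
var_dump(extension_loaded('ev'));
var_dump(get_class(\React\EventLoop\Loop::get()));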


rawezhcode commented 3 months ago


When the number of connections exceeds 1,000, CPU usage spikes to 100% at a strange rate and causes the server to stop and restart. This happens on a VPS (8 cores, 16 GB RAM).

sneycampos commented 1 month ago

> When the number of connections exceeds 1,000, CPU usage spikes to 100% at a strange rate and causes the server to stop and restart. This happens on a VPS (8 cores, 16 GB RAM).

Did you install the ev extension?

rawezhcode commented 1 month ago


> When the number of connections exceeds 1,000, CPU usage spikes to 100% at a strange rate and causes the server to stop and restart. This happens on a VPS (8 cores, 16 GB RAM).

In my case, the problem was solved after installing the ev extension. Currently, more than 2,500 users connect without problems.

samehdoush commented 1 day ago

I faced the same problem. I installed it on Laravel Forge using a load balancer with 3 servers, and all servers have 100% CPU usage. Is there any solution related to Forge?