socketio / socket.io-redis-adapter

Adapter to enable broadcasting of events to multiple separate socket.io server nodes.
https://socket.io/docs/v4/redis-adapter/
MIT License

How scalable is the redis pub/sub approach? #510

Open marialovesbeans opened 1 year ago

marialovesbeans commented 1 year ago

From reading the README, if I'm not wrong, Redis broadcasts to ALL node processes for EVERY room message. I wonder how scalable/efficient this approach is with a large number of Node.js processes and rooms.

Let's take a common scenario: in our case, we're building a chat app where users can have group conversations with each other (like WhatsApp/FB Messenger), and each chat group can have up to 500 people.

Say I have 32 Node.js processes running socket.io-redis; from what I understand, all of these processes are subscribed to Redis. Now I send a particular message to a room called "user1" via io.to("user1").emit("hello"). This prompts Redis to deliver the message to ALL 32 node processes, which each check whether they contain the room "user1" and send the message "hello" to the corresponding sockets in that room.

In our design, when a user connects, he joins a room named after his own user ID, which is why the room above is called "user1"; this is basically how we send a message to a particular user.
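For illustration, that design is just a socket.join() on connection; here is a minimal sketch, assuming the user ID is passed in the handshake auth payload (how it is actually authenticated is up to the app):

io.on('connection', (socket) => {
  // assumption: the client sends its user ID in the handshake auth payload
  const userId = socket.handshake.auth.userId;

  // every socket of this user joins a room named after the user ID,
  // so io.to(userId) reaches all of that user's connections
  socket.join(userId);
});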

So, back to the problem. In a chat group of 500 users, a user sends "hello", the DB fetches the user IDs of all those users, and for each user Redis will ask ALL 32 node processes whether they have that user's room. So in total 500 * 32 messages are delivered, instead of only 500. Can this lead to scalability issues? i.e. the queries are amplified by the number of Node.js processes.
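To make the fan-out concrete, here is a rough sketch of that send path; getGroupMemberIds is a hypothetical helper standing in for the DB fetch, not part of any library:

// hypothetical helper that stands in for the DB fetch of group members
async function broadcastToGroup(io, groupId, getGroupMemberIds) {
  const memberIds = await getGroupMemberIds(groupId); // e.g. 500 user IDs

  for (const userId of memberIds) {
    // each emit is one publish to Redis, delivered to every subscribed
    // Socket.IO process: 500 publishes x 32 processes = 16,000 deliveries
    io.to(userId).emit('hello');
  }
}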

Therefore, when using this package, should we limit the number of Node.js processes? i.e. instead of having 32 processes that each hold a small number of sockets (and a small amount of memory, like 1 GB of RAM), should we have far fewer processes (say 4), each with a large amount of memory, like 8 GB of RAM, so that each can hold a large number of sockets? Of course this requires starting node with a flag such as --max-old-space-size=8192.

I wonder if anyone (or the maintainers) has suggestions for chat apps running with a large number of concurrently online users (1+ million)? Please let me know if I'm making any sense. Thanks!

nitish076 commented 1 year ago

You could try clustering with the room ID as the hash key; this would limit the broadcast to the cluster that owns the room.

i.e. you would have a three-level approach here: RoomGroup -> Room -> sessionId.
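Purely as an illustration of that suggestion (nothing below is part of this package; the cluster count and hashing are hypothetical), the routing could look like:

import { createHash } from 'node:crypto';

const NUM_CLUSTERS = 4; // hypothetical: independent Redis + Socket.IO groups

function clusterFor(roomId) {
  // stable hash of the room ID -> index of the cluster that owns the room
  const digest = createHash('sha1').update(roomId).digest();
  return digest.readUInt32BE(0) % NUM_CLUSTERS;
}

// a broadcast for 'room-42' would then only be published to the Redis
// instance of clusterFor('room-42'), not to every process in the fleet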

darrachequesne commented 1 year ago

Hi! That's a great question :+1:

> In a chat group of 500 users, a user sends "hello", the DB fetches the user IDs of all those users, and for each user Redis will ask ALL 32 node processes whether they have that user's room.

You could also assign a room to this chat group, so that you can simply call io.to("the-chat-group").emit("hello") instead of looping over each user ID.
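For example, a sketch with a hypothetical group room name:

// each member joins the group room once, when they connect
socket.join('group:1234'); // 'group:1234' is a hypothetical room name

// a single broadcast then replaces the 500 per-user emits
io.to('group:1234').emit('hello');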

Besides, a new adapter based on Redis sharded PUB/SUB (requires Redis v7) has been added in version 8.2.0.

More information here: https://redis.io/docs/interact/pubsub/#sharded-pubsub
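As a rough sketch of the underlying primitive (assuming node-redis >= 4.5, Redis >= 7, connected pubClient/subClient as in the example further down, a hypothetical channel name, and an async context):

// SSUBSCRIBE: the subscription lives only on the shard that owns the channel
await subClient.sSubscribe('chat:room1', (message) => {
  console.log('received', message);
});

// SPUBLISH: the message propagates only within that shard,
// not across the whole cluster bus
await pubClient.sPublish('chat:room1', 'hello');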

There are two subscription modes:

- "static": useful when used with dynamic namespaces.
- "dynamic": the default value, useful when some rooms have a low number of clients (so only a few Socket.IO servers are notified).
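For instance, a deployment with many dynamic namespaces could opt into the static mode; this is a sketch mirroring the full example later in this thread, with pubClient/subClient assumed to be connected node-redis clients:

import { createShardedAdapter } from '@socket.io/redis-adapter';

// "static" keeps a fixed number of sharded channels per namespace, while the
// default "dynamic" adds one sharded channel per public room
const adapter = createShardedAdapter(pubClient, subClient, {
  subscriptionMode: 'static'
});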

marialovesbeans commented 1 year ago

Hi @darrachequesne, instead of broadcasting to all node processes for every room message, we could optimize this by having each node process subscribe to the Redis channels that correspond to the rooms they have active socket connections for. This way, we can minimize unnecessary queries across all node processes and make the broadcasting more efficient.

Essentially, this approach ensures that messages are only sent to the node processes that actually contain the targeted room, reducing the number of commands and improving scalability.
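For illustration only (this is not the package's actual implementation), the idea could be sketched with plain node-redis subscriptions driven by the adapter's room events; the channel prefix and event name are hypothetical:

// subscribe to a per-room channel only while this process has the room in memory
io.of('/').adapter.on('create-room', (room) => {
  subClient.subscribe(`room:${room}`, (message) => {
    // deliver only to the sockets connected to this process
    io.local.to(room).emit('message', message);
  });
});

io.of('/').adapter.on('delete-room', (room) => {
  subClient.unsubscribe(`room:${room}`);
});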

Would you consider this a viable solution for the scenario I described with a large number of nodes? Also, if my understanding of how this package currently works is incorrect, please let me know! I really appreciate your thoughts on this, and thank you for considering.

darrachequesne commented 1 year ago

@marialovesbeans what you are describing is exactly how the new adapter works:

import { Server } from 'socket.io';
import { createClient } from 'redis';
import { createShardedAdapter } from '@socket.io/redis-adapter';

// node-redis v4 expects a connection URL (or a socket object) rather than top-level host/port
const pubClient = createClient({ url: 'redis://localhost:6379' });
const subClient = pubClient.duplicate();

await Promise.all([
  pubClient.connect(),
  subClient.connect()
]);

const io = new Server({
  // sharded adapter (requires Redis 7+); "dynamic" adds one sharded channel per public room
  adapter: createShardedAdapter(pubClient, subClient, {
    subscriptionMode: "dynamic"
  })
});

io.listen(3000);

Thanks to the sharded PUB/SUB, the messages are only forwarded to the right Redis nodes.

We will update the documentation on the website to make it clearer. It is not backward compatible with the previous implementation though.

marialovesbeans commented 1 year ago

Hi @darrachequesne, thanks for the quick response! Would this work for a simple single-instance Redis (non cluster mode)?

darrachequesne commented 1 year ago

This doesn't currently work in standalone mode, but we could indeed add this subscriptionMode: dynamic for the classic adapter. Let me get back to you.
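For reference, the classic adapter on a standalone (non-cluster) Redis instance is set up as in the package README; as stated above, it does not take a subscriptionMode option at this point:

import { Server } from 'socket.io';
import { createClient } from 'redis';
import { createAdapter } from '@socket.io/redis-adapter';

const pubClient = createClient({ url: 'redis://localhost:6379' });
const subClient = pubClient.duplicate();

await Promise.all([pubClient.connect(), subClient.connect()]);

// classic adapter: every broadcast is published to a channel
// that all Socket.IO processes are subscribed to
const io = new Server({ adapter: createAdapter(pubClient, subClient) });

io.listen(3000);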

estradino commented 1 year ago

Hi @darrachequesne - We were wondering whether this has been applied to the classic adapter (single instance, non-cluster mode)?

cody-evaluate commented 12 months ago

> This doesn't currently work in standalone mode, but we could indeed add this subscriptionMode: dynamic for the classic adapter. Let me get back to you.

this would be huge