oCIS PubSub
Redis as core component
Redis already provides a good PubSub architecture that should scale if needed.
We'll use a Redis cluster, even though it will only have one node at the beginning. This lets us scale the solution by adding more nodes if needed.
Redis PubSub is built around "channels". Multiple clients can subscribe to a specific channel; when you publish to that channel, all subscribers receive the message. This has some limitations to take into account, especially if we want to focus on scalability:
When you publish, Redis needs to forward the message to all the subscribers. Since Redis is single-threaded, too many subscribers on one channel could slow it down considerably.
Each subscriber needs an open connection. Redis imposes connection limits, so if everyone is connected to the same Redis server, we might hit those limits.
To solve those problems, each user will have their own channel. In practice, the subscription will be something like subscribe <userId> in Redis. By using sharding, the users are expected to be spread more or less evenly across the Redis cluster. This has the following advantages:
The number of connections per node is reduced.
The throughput is expected to improve since the load will be distributed among the nodes, and you might not need to wait for one message to be delivered before sending the next one (if that message is sent via a different node).
The number of subscribers per channel is limited. Up to 20 subscribers per channel are expected, so publishing and delivering the message should be fast.
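To get a feeling for how evenly per-user channels could spread, here is a minimal sketch that computes the Redis Cluster hash slot (CRC16 modulo 16384) for each channel name. It rests on two assumptions that are not part of this proposal: that slots are split into equal contiguous ranges per node (the hypothetical node_for_slot helper), and that channels are actually routed by slot, which in Redis requires sharded PubSub (SSUBSCRIBE / SPUBLISH, available since Redis 7.0), because classic PUBLISH is propagated to every node of the cluster.

```python
import binascii
from collections import Counter

SLOT_COUNT = 16384  # number of hash slots in a Redis Cluster


def hash_slot(channel: str) -> int:
    """Return the cluster hash slot for a channel (or key) name."""
    data = channel.encode("utf-8")
    # Hash tags: if the name has a non-empty "{...}" section, only the part
    # between the first "{" and the next "}" is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is CRC-16/XMODEM, the variant
    # used by Redis Cluster.
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


def node_for_slot(slot: int, node_count: int) -> int:
    """Map a slot to a node index, ASSUMING equal contiguous slot ranges."""
    return slot * node_count // SLOT_COUNT


def spread(user_ids, node_count: int) -> Counter:
    """Count how many user channels land on each node."""
    return Counter(node_for_slot(hash_slot(uid), node_count) for uid in user_ids)
```

Calling spread with a list of user IDs and the planned node count gives a rough per-node channel count, which can be compared against the connection limits of a single node.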
oCIS PubSub service
The main goal is to provide a websocket interface to the Redis cluster.
Only registered users will be able to connect via websocket. Once connected, the user is subscribed to their own channel in Redis.
Some additional APIs can be included in order to publish messages. Some examples:
CLI message publishing to particular users / channels
CLI message publishing to all the users (we'd likely need to get the user list from somewhere)
HTTP / websocket publishing
Per-user configuration (if any)
Rate limiting / message deduplication
Connection with the go-micro framework could be optional if we plan to use this for OC10 (which I think is unlikely).
The service itself should be scalable because the heavy lifting is done in Redis. There isn't any need for persistence, and the processing is expected to be very light (just websocket connection handling, and sending and retrieving the messages from Redis).
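The light processing described above can be sketched as a small in-memory hub. This is only an illustration under assumptions: PubSubHub and FakeConnection are hypothetical names, the fake connection stands in for a real websocket, and the deliver method stands in for what would be a message arriving from the user's Redis channel.

```python
from collections import defaultdict


class FakeConnection:
    """Stand-in for a websocket connection; it just records what was sent."""

    def __init__(self):
        self.inbox = []

    def send(self, message):
        self.inbox.append(message)


class PubSubHub:
    """Maps each registered user (one channel per user) to open connections."""

    def __init__(self, registered_users):
        self.registered_users = set(registered_users)
        # user ID (used as channel name) -> open connections of that user
        self.connections = defaultdict(list)

    def connect(self, user_id, connection):
        # Only registered users may open a websocket.
        if user_id not in self.registered_users:
            raise PermissionError(f"unknown user: {user_id}")
        self.connections[user_id].append(connection)

    def disconnect(self, user_id, connection):
        self.connections[user_id].remove(connection)

    def deliver(self, user_id, message):
        """Forward a message to every open connection of the user.

        Returns the number of connections reached. Nothing is stored, so a
        user without open connections simply misses the message.
        """
        receivers = self.connections.get(user_id, [])
        for connection in receivers:
            connection.send(message)
        return len(receivers)
```

The hub keeps no state besides the open connections, which is what should allow running several replicas side by side.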
Connection with the go-micro framework
This might become problematic. We're expecting a websocket connection from the web (and other places) which should go through the go-micro framework. This means that there could be hundreds of thousands of open connections through go-micro, so it could become a bottleneck.
My main worry is that all the connections MUST go through the proxy service and the go-micro framework before reaching the oCIS PubSub service (or any of its replicas).
As an alternative, we could expose the oCIS PubSub service (and replicas) to the outside. The workflow could be the following:
The client asks oCIS where to open the websocket.
The server responds with another endpoint where the client should send the websocket request, such as "ws://[server2]:[port2]/". This could also act as a kind of load balancer, since the next request could send the client to "ws://[server3]:[port3]/" instead, based on the load.
The client opens a websocket to the endpoint the server returned.
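The endpoint selection in the workflow above could look roughly like the following sketch. It assumes oCIS has some way of knowing how many connections each PubSub replica currently holds; the pick_endpoint function, the host names and the port are hypothetical and only serve the example.

```python
def pick_endpoint(open_connections: dict) -> str:
    """Return the websocket URL of the replica with the fewest open connections.

    open_connections maps "host:port" to the current number of connections.
    How these numbers are collected is left open here.
    """
    if not open_connections:
        raise LookupError("no PubSub replica available")
    host_port = min(open_connections, key=open_connections.get)
    return f"ws://{host_port}/"


# Hypothetical replicas and load figures.
load = {"server2:9100": 120, "server3:9100": 45}
endpoint = pick_endpoint(load)  # the client would open its websocket here
```

Picking the least-loaded replica is just one possible policy; round-robin or random choice would fit the same workflow.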
The advantages of this approach are:
The websockets won't affect the general workflow because they're expected to live on different servers. We're expecting either async or one-way communication between the oCIS components and the oCIS PubSub service.
Clients can bypass oCIS. Once the client knows where to connect, it doesn't need to ask oCIS again. This could be interesting if the client needs to reconnect to the websocket. There could be cases where the client still needs to ask oCIS for a different websocket (if the previous one is down, for example).
Known limitations
Due to how the Redis PubSub architecture is designed, there are some limitations to take into account. These limitations affect the use cases.
Messages can be lost. The message is sent once, and there is no guarantee it will reach all the clients.
There is no persistence for the messages (at least in Redis). Messages will be lost if Redis is restarted or crashes.
It requires a persistent connection to the client. If a client isn't connected when the message is sent, that client won't receive the message even if the client connects a few seconds later.
Messages are published against one specific channel. This means that you must know the channel beforehand (listing the channels might be slow, or have additional limitations in a Redis cluster), and also that you must perform multiple requests to publish against multiple channels.
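Two of these limitations show up directly on the publishing side: reaching several users means one publish per channel, and a publish that finds no subscriber is simply gone. A minimal sketch, assuming a publish callable that returns the number of receivers the way the Redis PUBLISH command does (in a cluster that count only covers clients connected to the node that handled the publish); publish_to_users and fake_publish are hypothetical names.

```python
import json


def publish_to_users(publish, user_ids, payload):
    """Publish the payload once per user channel.

    Returns the users for whom no subscriber was listening, so the message
    was lost for them.
    """
    message = json.dumps(payload)
    missed = []
    for user_id in user_ids:
        receivers = publish(user_id, message)  # one request per channel
        if receivers == 0:
            missed.append(user_id)  # nothing is stored for later delivery
    return missed


# Stand-in for a Redis client: alice has two open connections, bob has none.
subscribers = {"alice": 2, "bob": 0}


def fake_publish(channel, message):
    return subscribers.get(channel, 0)


missed = publish_to_users(fake_publish, ["alice", "bob", "carol"], {"event": "ping"})
```

Knowing who missed a message doesn't help to redeliver it, since nothing is persisted; at best it can be logged.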
Expected use cases
Real-time folder updates
A workflow could be the following:
The web client is showing the contents of the "/folder1/folder2" folder. It doesn't matter if it's in a space or is the home folder of the user.
From a different client, such as the desktop client, the user adds a new file inside that folder.
The oCIS server adds the file into the FS and sends a message through the websocket.
The web client updates the UI with the newly added file as soon as it gets the message.
Using spaces makes it more interesting. Knowing the list of users that have access to the space, we can send a message to all those users. This means that, if user1 adds a file into a space, the rest of the users could be notified and have their UIs updated.
This also has some limitations:
We aren't expected to update the information of all the folders, but just the folder the user is seeing. This means that there will be a lot of traffic that the clients will ignore. This might be another reason not to integrate the solution with go-micro, at least directly.
We might need to compute and / or send the added or removed size of the folder if the change happens in a deeper folder than the one the user is seeing. If a user is in the "/f1/f2" folder and a file is added in "/f1/f2/f3/f4", the web UI could show the updated size and the updated modification time of the "/f1/f2" folder even though the added file won't be shown.
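On the client side, both limitations boil down to deciding what to do with each incoming change. A minimal sketch of that decision, assuming the message carries the full path of the changed item; classify_change, the three result labels and the file names are hypothetical.

```python
from pathlib import PurePosixPath


def classify_change(viewed_folder: str, changed_path: str) -> str:
    """Decide how the UI showing viewed_folder should react to a change.

    "refresh-listing":  the item is a direct child of the viewed folder.
    "refresh-metadata": the item is deeper down, so only sizes and
                        modification times need updating.
    "ignore":           the change happened somewhere else.
    """
    viewed = PurePosixPath(viewed_folder)
    changed = PurePosixPath(changed_path)
    if changed.parent == viewed:
        return "refresh-listing"
    if viewed in changed.parents:
        return "refresh-metadata"
    return "ignore"
```

Comparing path components rather than string prefixes avoids treating "/f1/f22" as if it were inside "/f1/f2".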
One-shot live notifications
The server can send notifications to particular users. In particular, it can send notifications about planned downtimes.
Admin plans a downtime at 17:00.
Server starts sending notifications at specific intervals. For example, 1 hour, 30 minutes, 10 minutes, 5 minutes, 1 minute, 30 seconds and 10 seconds.
With the last notification, the server also sends a shutdown notice to the clients, so they can stop sending requests to the server. The clients can also show a warning message saying that the server will be down from 17:00 to 19:00, for example.
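The notification steps above can be sketched as a schedule computed from the planned downtime. The intervals are the ones listed in the example; the downtime_schedule function, the payload fields and the date used below are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Intervals before the downtime at which a notification is sent.
OFFSETS = [
    timedelta(hours=1),
    timedelta(minutes=30),
    timedelta(minutes=10),
    timedelta(minutes=5),
    timedelta(minutes=1),
    timedelta(seconds=30),
    timedelta(seconds=10),
]


def downtime_schedule(start: datetime, end: datetime, now: datetime):
    """Return (send_time, payload) pairs for the notifications still ahead."""
    schedule = []
    for offset in OFFSETS:
        send_at = start - offset
        if send_at < now:
            continue  # too late for this one, skip it
        payload = {
            "type": "downtime",
            "from": start.isoformat(),
            "until": end.isoformat(),
            # Only the last notification tells clients to stop sending requests.
            "shutdown": offset == OFFSETS[-1],
        }
        schedule.append((send_at, payload))
    return schedule


# Hypothetical downtime from 17:00 to 19:00, planned at 15:00.
plan = downtime_schedule(
    datetime(2024, 1, 1, 17, 0),
    datetime(2024, 1, 1, 19, 0),
    now=datetime(2024, 1, 1, 15, 0),
)
```

Since each notification is one-shot, a client that connects between two of them only learns about the downtime at the next interval.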
Known use cases that can't be implemented
Persistent notifications
As said, Redis doesn't provide persistence. If the server sends a notification and the client isn't connected, the notification will be lost for that client. This also includes cases where the network isn't stable, or the client has to be restarted.
Critical notifications such as "you have to update due to security issues" can be missed. You also won't be able to ensure that the message will be delivered to a particular user.
Remote control / remote actions
This could be interesting, for example, if the admin wants to delete confidential data from client devices. If the file is confidential, you want to delete the file from all the devices, including possible caches or temporary locations.
The admin could also request to remove the account and all the files from the device if the user is removed from the server.
Anyway, this can't be implemented because we're requiring the user to be connected, which isn't guaranteed, so such a critical request could easily be lost. The current model is too unreliable for this.