RFC: Feedback on new websocket/channels/pubsub implementation

chrismccord commented 10 years ago

Hey All!

I'm really excited to unveil my plans for the websocket/pubsub layer I've been envisioning for Phoenix. I have an initial working implementation and I'm looking for direct feedback on the api, naming/concepts, and conventions before I merge with master. Here's what we have today:

Multiplexed websocket connection with `phoenix.js` dep

A single websocket connection from the client is shared for all subscribed channels. This is required since browsers limit simultaneous websocket connections. A light javascript library provides socket connection, channel subscription, event binding, etc.
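
A minimal sketch of the multiplexing idea, not the actual phoenix.js source: one transport carries messages for every subscription, and each inbound message is routed by its channel and topic. The envelope fields (`channel`, `topic`, `event`, `message`) and the `Multiplexer` class are assumptions made for illustration; the real wire format may differ.

```javascript
// Hypothetical multiplexer: many channel/topic subscriptions, one connection.
class Multiplexer {
  // `send` is any function that writes a string to the single shared socket.
  constructor(send) {
    this.send = send;
    this.handlers = new Map(); // "channel:topic" -> Map(event -> [callbacks])
  }

  // Register interest in a channel/topic and tell the server about it.
  join(channel, topic, message) {
    const key = channel + ":" + topic;
    if (!this.handlers.has(key)) this.handlers.set(key, new Map());
    this.send(JSON.stringify({ channel, topic, event: "join", message }));
    const events = this.handlers.get(key);
    return {
      on: (event, callback) => {
        if (!events.has(event)) events.set(event, []);
        events.get(event).push(callback);
      },
      send: (event, payload) =>
        this.send(JSON.stringify({ channel, topic, event, message: payload })),
    };
  }

  // Called for every raw frame received on the shared connection.
  dispatch(raw) {
    const { channel, topic, event, message } = JSON.parse(raw);
    const events = this.handlers.get(channel + ":" + topic);
    const callbacks = (events && events.get(event)) || [];
    callbacks.forEach((callback) => callback(message));
  }
}

// Usage: two subscriptions share the same outbound function.
const sent = [];
const mux = new Multiplexer((frame) => sent.push(frame));
const lobby = mux.join("rooms", "lobby", {});
const profile = mux.join("profiles", "42", {});
const received = [];
lobby.on("new:message", (msg) => received.push("lobby: " + msg.body));
profile.on("updated", (msg) => received.push("profile: " + msg.name));
mux.dispatch(JSON.stringify({ channel: "rooms", topic: "lobby", event: "new:message", message: { body: "hi" } }));
mux.dispatch(JSON.stringify({ channel: "profiles", topic: "42", event: "updated", message: { name: "Ann" } }));
console.log(received);
```

Both subscriptions write through the same `send` function, which is the point of the design: the number of channels a client joins does not change the number of connections it holds.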

Pubsub layer that uses process groups backed by `:pg2`

I don't have hard benchmarks yet, but pg2 seemed ideal to build pubsub on top of since we get process groups distributed on the mesh for free. This comes with the caveat of potential overhead from nodes being locked for consistency, but I'd like to see how this scales out. Initial stress tests are very promising.
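
To make the process-group idea concrete, here is a single-node, in-memory sketch of a pubsub layer keyed by channel and topic. It is only an analogy for what `:pg2` provides: it does none of the cross-node distribution or locking mentioned above, and the `GroupPubSub` name and its methods are hypothetical.

```javascript
// In-memory stand-in for process groups: one group per channel/topic pair.
class GroupPubSub {
  constructor() {
    this.groups = new Map(); // group name -> Set of subscriber callbacks
  }

  groupName(channel, topic) {
    return channel + ":" + topic;
  }

  // Add a subscriber to the group, creating the group on first use.
  subscribe(channel, topic, subscriber) {
    const name = this.groupName(channel, topic);
    if (!this.groups.has(name)) this.groups.set(name, new Set());
    this.groups.get(name).add(subscriber);
  }

  unsubscribe(channel, topic, subscriber) {
    const members = this.groups.get(this.groupName(channel, topic));
    if (members) members.delete(subscriber);
  }

  // Deliver an event to every current member; returns how many were reached.
  broadcast(channel, topic, event, message) {
    const members = this.groups.get(this.groupName(channel, topic)) || new Set();
    members.forEach((subscriber) => subscriber(event, message));
    return members.size;
  }
}

const pubsub = new GroupPubSub();
const inbox = [];
const alice = (event, message) => inbox.push(["alice", event, message.body]);
const bob = (event, message) => inbox.push(["bob", event, message.body]);
pubsub.subscribe("rooms", "lobby", alice);
pubsub.subscribe("rooms", "lobby", bob);
pubsub.subscribe("rooms", "other", bob);
const reached = pubsub.broadcast("rooms", "lobby", "new:message", { body: "hello!" });
console.log(reached, inbox);
```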

Channels

Channels can be thought of as namespaces for "topics" to be subscribed to, as well as a place to group common events and behavior. You can almost conceptually think of them as a typical controller action, except they handle events from clients, and broadcast events to other channels.

Topics

Topics are simply strings that clients subscribe to and broadcast about on a particular channel. A topic could be anything from the string "lobby" on a "rooms" channel to a particular user id on a "profiles" channel.

Let's write a bare-bones chat application using these ideas:

Router: We need a `rooms` Channel to broadcast on:

defmodule Chat.Router do
  use Phoenix.Router
  use Phoenix.Router.Socket, mount: "/ws"
  plug Plug.Static, at: "/static", from: :chat

  get "/", Chat.Controllers.Pages, :index, as: :page
  channel "rooms", Chat.Channels.Rooms
end

Channel: Let's authorize clients to join channel topics and pub/sub events

defmodule Chat.Channels.Rooms do
  use Phoenix.Channel

  @doc """
  Authorize socket to join for sub/pub events on this channel & topic

  Possible Return Values

  {:ok, socket} to authorize sub for channel for requested topic

  {:error, socket, reason} to deny sub/pub on this channel
  for the requested topic
  """
  def join(socket, message) do
    reply socket, "join", status: "connected"
    broadcast socket, "user:entered", username: "anonymous"
    {:ok, socket}
  end

  def event("new:message", socket, message) do
    broadcast socket, "new:message", message
    {:ok, socket}
  end
end

Markup/javascript:

<h1>Phoenix Chat Example</h1>
<h3>Status: <small id="status">Not Connected</small></h3>
<div id="messages"></div>
<input id="message-input" type="text">

<script type="text/javascript">
  $(function(){
    var socket    = new Phoenix.Socket("ws://" + location.host +  "/ws");
    var $status   = $("#status");
    var $messages = $("#messages");
    var $input    = $("#message-input");

    socket.join("rooms", "lobby", {}, function(chan){

      $input.off("keypress").on("keypress", function(e) {
        if (e.keyCode == 13) {
          chan.send("new:message", {body: $input.val()});
          $input.val("");
        }
      });

      chan.on("join", function(message){ $status.text("joined"); });

      chan.on("new:message", function(message){
        $messages.append("<br/>" + message.body);
      });

      chan.on("user:entered", function(msg){
        $messages.append("<br/><i>[" + msg.username + " entered]</i>");
      });
    });
  });
</script>

The neat thing about channels is we can broadcast to them from anywhere in our application. Consider the chat example with active clients in the "rooms" channel, "lobby" topic. We could push a message to the browser directly from iex:

iex> Phoenix.Channel.broadcast("rooms", "lobby", "new:message", body: "hello!")
:ok

This opens up all kinds of realtime updates from any parts of the application to subscribed clients.

That's it! I believe this is flexible enough to accommodate most use-cases and extensible enough for us to iterate on for more advanced features.

Here's the chat example packaged as a little app: https://github.com/chrismccord/phoenix_chat_example

cm-channels branch: https://github.com/phoenixframework/phoenix/tree/cm-channels

Big thanks to @jeregrine for doing the legwork on the websocket handler and @HashNuke for helping me flesh out the channel/topic concepts.

Let me know how it looks!

HashNuke commented 10 years ago

:+1: This is great news.

@chrismccord you are being too generous with crediting me. All work is done by you and @jeregrine :)

Suggestions:

Both phoenix.coffee and phoenix.js could be in some extras dir (placeholder name). So when someone needs to use either the coffee or js version, they can copy it to where they want (I'm assuming people use their own asset pipelines for now).

tallakt commented 10 years ago

There are some things that I don't quite get here.

The "lobby" - the lobby is not referenced in the server code. I guess the lobby is an instance of the room, but I would expect the server to control which users would communicate with whom. For instance, a server handling different nationalities would perhaps make a Norwegian lobby and a Swedish lobby and separate the traffic. That kind of functionality would not necessarily be correctly written in the client.
The join method - is join the connect statement here? I would expect to connect to a socket, not join it. Or is there some kind of method_missing magic going on that translates join to the join function in the Rooms module?

;-)

HashNuke commented 10 years ago

@tallakt You could read this as "connect to websocket & then join multiple channels you want". In this case the channel is "rooms", and the topic is "lobby".

EDIT: @tallakt "lobby" is an example channel. The example app is pretty simple. In the real world, you might want to limit who can subscribe to which channels. That will then have to be done on the server side in the websocket handler.

@chrismccord I think the vocabulary needs to be simplified.

tallakt commented 10 years ago

So I guess to extract the channel name, you would have to query the socket record...

chrismccord commented 10 years ago

@tallakt I could have been more clear in the example:

1) The socket carries the channel and topic it is trying to join as fields. The second argument of join (message) is whatever the client passed up, and can be used for any other required authorization logic. It's the third argument ({}) in the socket.join js function:

socket.join("rooms", "lobby", {room_token: "abc"}, function(chan){...

  def join(socket, message) do
    if MyMod.can_access_room?(socket.topic, message[:room_token]) do
      reply socket, "join", status: "You just joined #{socket.channel}:#{socket.topic}"
      broadcast socket, "user:entered", username: "anonymous"
      {:ok, socket}
    else
      {:error, socket, :unauthorized}
    end
  end

Clients can join any number of channels/topics they want, but need to request access via join. For the "norwegian" lobby, you'd need to have the client call socket.join("rooms", "norwegian", {}, function(chan){....

2) join authorizes the socket to subscribe to the requested channel/topic to both receive and publish events on that topic. So instead of connecting to the socket, the socket is already connected to the server, via new Phoenix.Socket("ws://" + location.host + "/ws");, and the client asks to "join" the channel, for given topics.
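
The authorization flow described above can be sketched in JavaScript purely for illustration (the real handler is the Elixir join/2 shown in this comment). `canAccessRoom` and the token table are hypothetical stand-ins for `MyMod.can_access_room?`, and the returned objects only mimic the `{:ok, socket}` and `{:error, socket, reason}` tuples.

```javascript
// Hypothetical token table: topic -> token required to join it.
const roomTokens = new Map([
  ["lobby", "abc"],
  ["norwegian", "xyz"],
]);

// Stand-in for MyMod.can_access_room?/2 from the Elixir example.
function canAccessRoom(topic, token) {
  return roomTokens.has(topic) && roomTokens.get(topic) === token;
}

// Mirrors join/2: the socket carries channel and topic, the message carries
// whatever the client passed as the third argument of socket.join.
function join(socket, message) {
  if (canAccessRoom(socket.topic, message.room_token)) {
    return { status: "ok", socket: socket };
  }
  return { status: "error", socket: socket, reason: "unauthorized" };
}

const allowed = join({ channel: "rooms", topic: "lobby" }, { room_token: "abc" });
const denied = join({ channel: "rooms", topic: "norwegian" }, { room_token: "abc" });
console.log(allowed.status, denied.status, denied.reason);
```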

Let me know if that clears things up. It's late, so I'm sure my vocab could be clearer :)

tallakt commented 10 years ago

Thanks for clearing that up. Also thanks for Phoenix. This is very interesting to me, as Elixir is soft realtime and could be used for realtime control while serving pages with websockets, using the same language all the way (except JS of course). I did find out how the code works by studying the library code a bit, and I guess a mature implementation would have more documentation and examples too :)

knewter commented 10 years ago

I really like it. For the realtime financial trading app we built, this is very nearly the exact api we built out in Celluloid (ruby). I think it's great.

One thing - will you be providing an on("*") style event in the phoenix.js library? I ask this because we built a websocket message router in angular.js and it's entirely possible we'd want to do something similar - I'm not 100% sure on that, but it's possible that I don't want to have to specify my per-message logic in separate calls to on but I'd still want to use the rest of the websocket lib probably?

That's my only question...

chrismccord commented 10 years ago

Thanks for the feedback @knewter. Providing an on("*") style hook is definitely doable. I would likely add a single callback to register since you wouldn't need to have multiple triggers for a case like this. Something like:

socket.join("foo", "bar", {}, function(chan){
  chan.onEventReceived = function(event, msg){ /* your router logic */ };
});
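
A sketch of how such a catch-all hook could coexist with per-event `on` bindings. The `Channel` class below is hypothetical and only illustrates one possible dispatch order; `onEventReceived` is the name proposed above, not a shipped API.

```javascript
// Hypothetical channel object with per-event bindings plus one catch-all hook.
class Channel {
  constructor() {
    this.bindings = new Map(); // event name -> array of callbacks
    this.onEventReceived = null; // optional catch-all: (event, msg) => void
  }

  on(event, callback) {
    if (!this.bindings.has(event)) this.bindings.set(event, []);
    this.bindings.get(event).push(callback);
  }

  // Invoked by the socket layer whenever a message arrives for this channel.
  trigger(event, msg) {
    if (typeof this.onEventReceived === "function") {
      this.onEventReceived(event, msg);
    }
    (this.bindings.get(event) || []).forEach((callback) => callback(msg));
  }
}

// Usage: route every event through one function, as a message router would.
const chan = new Channel();
const log = [];
chan.onEventReceived = function (event, msg) {
  log.push("router saw " + event);
};
chan.on("new:message", function (msg) {
  log.push("handler got " + msg.body);
});
chan.trigger("new:message", { body: "hi" });
chan.trigger("user:entered", { username: "anonymous" });
console.log(log);
```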

jeregrine commented 10 years ago

The protocol is pretty simple JSON. Could very easily replicate phoenix.js without much trouble.


knewter commented 10 years ago

yup, I can do that, as I've done in the past. However, I'm expecting that over time the abstraction might grow, and so if I'm building a phoenix app I might as well live in the abstraction as much as possible. Here I'm treating Phoenix a bit more like a 'framework' I suppose :)


raycmorgan commented 10 years ago

By using pg2 is this able to scale broadcasts across connections on different erlang nodes?

HashNuke commented 10 years ago

I don't think sticky sessions is a problem for Phoenix to solve. Although if done, it would be nice.


chrismccord commented 10 years ago

@knewter I plan to grow the abstraction as patterns emerge. Can you provide a pseudo javascript api that you'd like to see?

@raycmorgan @HashNuke Yes, pg2 indeed allows us to broadcast across connections on different erlang nodes :). It makes sticky sessions a non-issue since the process groups are distributed on the mesh. If a socket goes down and reconnects to a different load-balanced Router, it won't matter as long as both are on the same mesh.

HashNuke commented 10 years ago

Woow cool. Now where do I read more about running phoenix on multiple nodes?

raycmorgan commented 10 years ago

Awesome! That's what I was hoping and expected, but wanted to be sure. The broadcast call has bitten people that use it in node land because of the difference in single node v cluster.

@HashNuke you should look into distributed erlang. Here is a fun read about it: http://learnyousomeerlang.com/distribunomicon

-Ray


darkofabijan commented 10 years ago

You have made a very powerful thing for us to play with! :) Implementation code is as always very nice.

I thought I got the API completely at first glance, but later I spent some time fighting with the topic concept. I think it was mostly due to the lack of an example that would demonstrate the use case.

Let me try to explain it to myself.

It’s clear that we want to multiplex messages on one socket due to browser limitations. (It will probably also save some resources on the server side.) To overcome that we multiplex channels on that one socket. One way to represent channels, topics and messages is the following:

WebSocket0:Channel0:Topic0:Message0
WebSocket0:Channel1:Topic1:Message1
...
WebSocket0:ChannelN:TopicN:MessageN

I am going to use this notation to explain my point about topics.

To get work done in a user-friendly manner with channels and messages, it's easiest to just specify them as constants in the code, as was done in the example application. Topics, on the other hand, are easy to specify at runtime with the current API. In the notation above, that looks like this:

WebSocket0:Channel0:*:Message0

It's easy to do this at runtime:

WebSocket0:Channel0:Topic1:MessageX
WebSocket0:Channel0:Topic2:MessageX
...
WebSocket0:Channel0:TopicN:MessageX

That being said, it seems to me that topics are not needed for classes of problems such as the demo application, and probably many other simple use cases. So it would be nice if they were optional. The underlying implementation can still rely on them, but it would be nice if we didn't have to specify them when they are not needed for our use case. The previous discussion and the comment by @tallakt also give a good example of where it would be nice to omit topics. The topic "lobby" is specified only in the JS code; it is not touched in the Elixir code, nor logically needed or used, which is kind of a smell for the API.

It would be nice if JS code could look something like this:

    socket.join("rooms", {}, function(channel){

      $input.off("keypress").on("keypress", function(e) {
        if (e.keyCode == 13) {
          channel.send("new:message", {body: $input.val()});
          $input.val("");
        }
      });

    });

And in case that we want to leverage the power of topics (one more nesting level) following code would be nice. (Note that topic is given to the callback function, and send is called on topic.)

    socket.join("rooms", currentTopic, {}, function(topic){

      $input.off("keypress").on("keypress", function(e) {
        if (e.keyCode == 13) {
          topic.send("new:message", {body: $input.val()});
          $input.val("");
        }
      });

    });

Btw it's nice that we already have broadcast functions with and without topic.

Phoenix.Channel.broadcast("rooms", "lobby", "new:message", body: "hello!")
Phoenix.Channel.broadcast("rooms", "new:message", body: "hello!")

To show an example where topics could be really useful I came up with the following example:

Example application: Server monitoring app

We are building server monitoring application in Phoenix. Each server that we are monitoring streams log files via RabbitMQ to server which is running Phoenix app. Responsibility of Phoenix app is to then stream received data to users via web socket.

Monitoring application has page for each server and we can configure to stream multiple files. Messages are just appended to text areas.

File1: /var/log/syslog => textarea#syslog
File2: /opt/nginx/logs/access.log => textarea#access-log
…
FileN: /path/to/.../fileN => textarea#filen

And following message routing scheme could be used:

WebSocket:logs-channel:syslog-topic:new-line
WebSocket:logs-channel:access-log-topic:new-line
WebSocket:logs-channel:filen-topic:new-line

The open question for this example is how we could DRY up the JS implementation.
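
One possible way to DRY up the client side is to drive the subscriptions from a table that maps each topic to its target element. This is a sketch against a stub socket so that it runs on its own; `bindLogTopics`, the stub, the `append` callback, and the `line` field of the message are all hypothetical. Only the `socket.join(channel, topic, message, callback)` shape is taken from the chat example earlier in the thread.

```javascript
// Map each log topic to the id of the textarea that should display it.
const logTargets = {
  "syslog-topic": "syslog",
  "access-log-topic": "access-log",
};

// Join every topic on one channel and forward "new-line" events to `append`.
function bindLogTopics(socket, channel, targets, append) {
  Object.keys(targets).forEach(function (topic) {
    socket.join(channel, topic, {}, function (chan) {
      chan.on("new-line", function (message) {
        append(targets[topic], message.line);
      });
    });
  });
}

// Stub socket for demonstration: records handlers so we can fire them by hand.
const handlers = {};
const stubSocket = {
  join: function (channel, topic, message, callback) {
    callback({
      on: function (event, handler) {
        handlers[channel + ":" + topic + ":" + event] = handler;
      },
    });
  },
};

const appended = [];
bindLogTopics(stubSocket, "logs-channel", logTargets, function (id, line) {
  appended.push(id + " <- " + line);
});
handlers["logs-channel:syslog-topic:new-line"]({ line: "kernel: up" });
handlers["logs-channel:access-log-topic:new-line"]({ line: "GET / 200" });
console.log(appended);
```

In a real page the `append` callback would write to the matching textarea; adding another log file then means adding one entry to the table rather than another `socket.join` block.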

TL;DR Topics are great but API could be simpler if they would be optional. It would be also one thing less to think/learn about for people getting started with Phoenix. With great documentation and examples people who need additional layer of message routing would definitely find them helpful. (Like topics in queuing protocols maybe some day topics could support some patterns such as server1.logfiles.*)
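
For the pattern idea in the TL;DR, a matcher along these lines would be enough for simple cases. It is a sketch only: the dot-separated segments and the rule that `*` matches exactly one segment are assumptions, loosely modelled on topic patterns in queuing protocols, and nothing like it exists in the implementation discussed here.

```javascript
// Returns true when `topic` matches `pattern`, where "*" stands for exactly
// one dot-separated segment (an assumed rule, chosen for simplicity).
function topicMatches(pattern, topic) {
  const patternParts = pattern.split(".");
  const topicParts = topic.split(".");
  if (patternParts.length !== topicParts.length) return false;
  return patternParts.every(function (part, index) {
    return part === "*" || part === topicParts[index];
  });
}

console.log(topicMatches("server1.logfiles.*", "server1.logfiles.syslog"));
console.log(topicMatches("server1.logfiles.*", "server2.logfiles.syslog"));
console.log(topicMatches("server1.logfiles.*", "server1.logfiles"));
```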

Thanks for making sockets so easy to use and for reading this :)

chrismccord commented 10 years ago

Thanks for the feedback @darkofabijan. You raised some excellent points:

At risk of not saying anything new, let me see if I can better explain my rationale for topics being required. If you still feel they should be optional, please say so since it's something that wouldn't take a great deal to support:

I too struggled with whether topics should always be required over just channels for some cases. What made me decide they should be a required piece of the Phoenix.Message protocol is my use-cases have always "exploited" channels to make them unique by doing things like embedding resource data as part of the channel itself. A lot of websocket apis only support subscribing on a single channel string, so javascript code ends up looking something like:

pusher.com API example

var channel = pusher.subscribe('rooms:' + myCompanyRoomId);
channel.bind('new:message', function(data) {
  alert('new message in room ' + myCompanyRoomId + ':' + data.message);
});

Here with just the channel concept, we have to concatenate information into the channel itself to pubsub on unique channels for the given "room" abstraction. If Phoenix only provided the channel concept, we'd need a convention to parse part of the channel to match to a Phoenix.Channel, then pass/require the client to parse their concatenated resource info. Topics aim to resolve this arbitrary concatenation with channels + topics concepts.
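
To illustrate the convention that would otherwise be needed, here is a sketch of splitting a concatenated channel string back into its parts. The `:` separator and the `splitChannel` helper are assumptions for illustration; with separate channel and topic arguments this parsing step disappears.

```javascript
// Split "rooms:123" into a channel and a resource part at the first colon.
function splitChannel(name) {
  const index = name.indexOf(":");
  if (index === -1) return { channel: name, topic: null };
  return { channel: name.slice(0, index), topic: name.slice(index + 1) };
}

console.log(splitChannel("rooms:123"));
console.log(splitChannel("signups"));
```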

Channel - A place to scope events and application behavior
Topic - A "thing" to subscribe and publish about on a channel (typically a resource/model)

In certain cases, topics do feel unnecessary and the current implementation could see people having to make "global" topics for cases where they really don't care about a topic. i.e. Let's consider a feed of app-wide new user signups:

socket.join("signups", "global", function(channel){ ...

This does feel unnecessary. The question is how often will these particular use-cases crop up? I feel like the vast majority of cases will have a particular resource that the client is pubsub'ing on. By requiring Channel + Topic, we also have a consistent API for Channel.broadcast, Phoenix.Message, and other third-party clients that could be written against a Phoenix websocket backend.

TL;DR

There are clear use-cases I see for topics being unnecessary, but how often? Is it worth the implementation to support optional topics or require redundant topics for some edge cases? (or am I wrong and these aren't edge cases but common use-cases?)

Having said all this, the nice thing is I think @darkofabijan is right that all we'd need to do is provide an optional topic argument on socket.join in phoenix.js, and the backend implementation would remain unaffected as long as the client passed up a null topic: key.
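
A sketch of what that client-side change might look like: `join` accepts the topic as an optional second argument and always sends a `topic` key, using `null` when none was given. The `normalizeJoinArgs` helper is hypothetical and is not part of phoenix.js; it assumes topics are always strings.

```javascript
// Accept join(channel, message, callback) or join(channel, topic, message, callback).
function normalizeJoinArgs(channel, topic, message, callback) {
  if (typeof topic !== "string") {
    // Topic omitted: shift the remaining arguments one place to the right.
    callback = message;
    message = topic;
    topic = null;
  }
  return { channel: channel, topic: topic, message: message || {}, callback: callback };
}

const noop = function () {};
const withTopic = normalizeJoinArgs("rooms", "lobby", {}, noop);
const withoutTopic = normalizeJoinArgs("signups", {}, noop);
console.log(withTopic.topic, withoutTopic.topic);
```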

Just to be clear, I haven't made a final decision yet on optional topics. Getting the abstractions right early is super important, so let me sleep on this one :)

chrismccord commented 10 years ago

One correction, @darkofabijan: Channel.broadcast always requires the channel + topic. broadcast/3's signature is

iex> Channel.broadcast socket, "new:message", id: 1, content: "hello"

The channel and topic used in this case is implicit and uses the currently multiplexed socket.channel/socket.topic.

chrismccord commented 10 years ago

This branch just landed in master and I'll be pushing version 0.1.1 to hex once I get the docs updated. @darkofabijan I'm going to hold off on optional topics until their use (or lack thereof) comes up in the wild in real applications. It won't be difficult to make them optional later and I'd like to see how far we can get with great getting started docs and real examples. Thanks everyone for the feedback!

fishcakez commented 10 years ago

Did you consider using :gproc and its bcast function? That seems the more obvious choice for distributed pubsub. Perhaps you were not aware of it.

chrismccord commented 10 years ago

@fishcakez :gproc was a consideration (and still is for the future). Ultimately, going with :pg2 lets us use the standard lib and be free of another dependency. If performance issues arise later with pg2, we can revisit gproc. Thanks for the heads up

fishcakez commented 10 years ago

@chrismccord ok, makes sense. You need to look at how you are cleaning up :pg2 groups.

It is possible that a group will be deleted when it has member(s) due to a race condition. The cleanup gen_server could see no members and try to delete the group, however between the lookup and the delete a process on any node (including the local one) could join the group. The process that joined will have no idea that it isn't a member of the group anymore because the group has been deleted.
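
The race can be reproduced deterministically in a small simulation. This is not `:pg2` code: the `Registry` class is a hypothetical single-threaded model in which the cleanup's lookup and delete are separate steps, so a join can be interleaved between them.

```javascript
// Toy group registry where "check" and "delete" are separate steps.
class Registry {
  constructor() {
    this.groups = new Map(); // group name -> Set of member ids
  }
  create(name) {
    if (!this.groups.has(name)) this.groups.set(name, new Set());
  }
  join(name, member) {
    this.create(name);
    this.groups.get(name).add(member);
  }
  isEmpty(name) {
    const members = this.groups.get(name);
    return !members || members.size === 0;
  }
  remove(name) {
    this.groups.delete(name);
  }
  members(name) {
    return Array.from(this.groups.get(name) || []);
  }
}

const registry = new Registry();
registry.create("rooms:lobby");

// Step 1: the cleanup process looks up the group and sees no members.
const looksEmpty = registry.isEmpty("rooms:lobby");

// Step 2: before cleanup acts, a client joins the same group.
registry.join("rooms:lobby", "client-1");

// Step 3: cleanup deletes the group based on its stale observation.
if (looksEmpty) registry.remove("rooms:lobby");

// The client believes it is subscribed, but its membership is gone.
console.log(registry.members("rooms:lobby"));
```

In this model the problem goes away only if the emptiness check and the delete happen as one indivisible step, which is the property the cleanup described above lacks.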

It is also possible to get zombie empty groups if the node that created a group stops or crashes. When the node restarts the group will still exist on the cluster but no cleanup will be done once it becomes empty.

The cleanup server is also not part of the supervision tree, this is very bad practise and breaks OTP conventions.
