igniterealtime / openfire-ofmeet-plugin

Provides an HTTP Online Meeting solution for Openfire using Jitsi Meet.
Apache License 2.0

How to make ofmeet work in a clustered Openfire? #67

Open vincentwlau opened 6 years ago

vincentwlau commented 6 years ago

I have Openfire 4.2.3 with ofmeet and offocus 0.9.3 on two clustered nodes in a cloud environment. Each node has its own local and external IP addresses. I would like to distribute the media streams between these two nodes, but Meetings -> Media Configuration only allows one Local IP and one Public IP, which forces all media streams into and out of a single node. Is this how the plugins are supposed to work in a clustered Openfire environment?

I also noticed that when clustering was turned on, there were two "focus" client sessions online (they had two different client IPs on the Sessions page):

  Name    Resource                Node    Status          Presence   Priority   Client IP
  focus   focus626264215260527   Local   Authenticated   Online     0          xxx.227.173.yyy
  focus   focus900138802722487   Local   Authenticated   Online     0          xxx.230.18.zzz

Looking at the log, the Focus Provider plugin on the 2nd node started before that node had finished joining the cluster, so it did not detect that there was already an online "focus" session. And the "focus" session from the 1st node was not kicked out after the node joined the cluster. Perhaps the "focus" user should only be logged in by a node when clustering is disabled, or when a node is promoted to senior member and there is no other "focus" session. Is this a known issue?

deleolajide commented 6 years ago

Is it a known issue?

Yes indeed. We are now entering uncharted waters here. Simply put, ofmeet and offocus are not cluster-ready. In the simplest configuration, offocus should not be activated unless it is running on the senior/prime node.

A possible way to make it work could be to install offocus on a single chosen node. That defeats the purpose of fail-over, but the single focus should detect the two Jitsi Videobridge (ofmeet) instances.

vincentwlau commented 6 years ago

Thanks for the advice.

guusdk commented 6 years ago

@vincentwlau I'd be interested in learning from your experiences. It would be very interesting to add cluster support to ofmeet.

vincentwlau commented 6 years ago

Since each jitsi-videobridge (ofmeet) instance needs to specify a pair of local and public IP addresses (currently stored in the DB) for NAT support, clustering may be hard. But if the local/public IP addresses were stored in a local configuration file (e.g. an ofmeet.xml per node), each ofmeet instance could be used to spread the load. However, that is no longer clustering, but more like distributed videobridges. It would be interesting if offocus could run in a cluster and ofmeet could be distributed across the cluster.

deleolajide commented 6 years ago

We don't have to lose the use of the DB for ofmeet clustering. We can qualify the cluster properties with the machine hostname instead of using local configuration files. I have done this elsewhere.
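As a rough sketch of that idea (the property names below are made up for illustration, not the plugin's actual keys; only Openfire's JiveGlobals property API is assumed), per-node values could live in the shared property store, qualified by hostname:

    // Sketch only: per-node media addresses kept in the shared Openfire property
    // table, qualified by the machine hostname instead of a local ofmeet.xml.
    import java.net.InetAddress;

    import org.jivesoftware.util.JiveGlobals;

    public class PerNodeMediaConfig {

        // e.g. "ofmeet.videobridge.node-1.example.org.local.address" (hypothetical key)
        private static String key(String suffix) throws Exception {
            final String hostname = InetAddress.getLocalHost().getHostName();
            return "ofmeet.videobridge." + hostname + "." + suffix;
        }

        public static String localAddress() throws Exception {
            // Fall back to a single cluster-wide value if no node-qualified
            // property has been provisioned for this host yet.
            return JiveGlobals.getProperty(key("local.address"),
                    JiveGlobals.getProperty("ofmeet.local.address"));
        }

        public static String publicAddress() throws Exception {
            return JiveGlobals.getProperty(key("public.address"),
                    JiveGlobals.getProperty("ofmeet.public.address"));
        }
    }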

Last time I checked, the Jitsi team were working on distributed JVBs, using the focus entity to allocate JVBs to connecting clients based on the real-time monitoring data published by the JVBs. That will fit nicely with Openfire clustering when it is ready.

A while back I looked at an alternate architecture that moved ofmeet to the clients (using a Spark plugin) and kept offocus on Openfire. Distribution was simpler: the meeting owner's JVB is always chosen.

vincentwlau commented 6 years ago

Using the machine hostname to qualify the cluster properties per node is a good approach. It allows DevOps folks to add all required configuration to the DB before starting a new instance in the cloud.

I am exploring two architectures: ofmeet as an Openfire plugin vs. JVB as a standalone service. I like the plugin approach for manageability and possible fail-over capability. However, it ties each JVB to an Openfire instance and creates a tighter dependency between the Jitsi and plugin release cycles (it may get worse when a recording component is added, but that is another topic). For a few customers (video-centric), we need a lot of JVB instances (to support ~2500 concurrent a/v conferences with recording) talking to a small Openfire cluster, while in other cases (text-chat-centric) we need a bigger cluster and fewer JVB instances. Should we just bite the bullet and run a big cluster sized for the maximum number of concurrent a/v conferences (assuming ofmeet and offocus are cluster-ready)? Any thoughts on that?

vincentwlau commented 6 years ago

BTW, do you have a timeline for when offocus and ofmeet will be cluster-ready? If it is not in the near future, I would like to work on it with the community.

guusdk commented 6 years ago

There's no timeline. Any help that you can provide would be greatly appreciated.

deleolajide commented 6 years ago

Should we just bite the bullet to have a big cluster to support maximum concurrent

That is what I would do. Each node would have a JVB component with a unique XMPP address, managed by a single global focus component.

With recording, I would opt for client-side recording and upload to the server when the conference ends.

vincentwlau commented 6 years ago

@deleolajide Just FYI: I checked the Jitsi code, and Jicofo has implemented monitoring of each JVB instance and load-balancing based on the number of conferences on each instance.

deleolajide commented 6 years ago

jicofo has implemented monitoring each JVB instance and load-balancing based on the number of conferences in each instance.

I think we are ready to have a first attempt at clustered Openfire Meetings.

If it is not in the near future, I would like to work on it with the community.

As @guusdk pointed out, any additional development resources would be appreciated.

vincentwlau commented 6 years ago

I need some help understanding the behavior of "multipleAllowed" external components in a clustered Openfire. I have a two-node Openfire cluster, and each node has one "focus" external component connected directly to it; this setup is equivalent to having the offocus plugin installed on each node of a clustered Openfire. After both components are started, the Component Sessions page in the Openfire console shows two "focus" entries with different creation dates/times (one is a local session, the other a remote session). However, when I click the "component session detail" link on each entry, they both show the local Client IP/host. Why would the local session manager have a remote session? And when an IQ packet is addressed to the "focus" component, which instance of the component will the packet be routed to? Or will it be routed to all instances?
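For reference, here is a minimal sketch of how each standalone "focus" instance is wired up in this kind of setup (the host, port, secret and the stand-in component class are placeholder assumptions, and I am assuming Whack exposes the multiple-allowed flag via ExternalComponentManager.setMultipleAllowed):

    // Sketch: one standalone "focus" external component per cluster node, connecting
    // directly to its local Openfire node with the multiple-allowed flag set.
    import org.jivesoftware.whack.ExternalComponentManager;
    import org.xmpp.component.AbstractComponent;

    public class FocusLauncher {

        // Minimal stand-in component; the actual experiment uses standalone Jicofo.
        static class FocusComponent extends AbstractComponent {
            @Override public String getName()        { return "focus"; }
            @Override public String getDescription() { return "Experimental focus component"; }
        }

        public static void main(String[] args) throws Exception {
            // Connect to the local node's external component port (5275 is an assumption).
            final ExternalComponentManager manager =
                    new ExternalComponentManager("localhost", 5275);
            manager.setSecretKey("focus", "secret");   // placeholder shared secret
            // Ask Openfire to accept more than one connection for the "focus" subdomain.
            manager.setMultipleAllowed("focus", true);
            manager.addComponent("focus", new FocusComponent());
        }
    }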

And sometimes (not consistently), when I restart an active "focus" component, the restart fails with a "conflict" error. It seems that the local component session was not removed from the local session manager. Any ideas?

Is there a document about how Openfire clustering works internally? Any information would be appreciated.

deleolajide commented 6 years ago

On projects where I have done clustering with Openfire, the rule we follow is to make sure only one instance of a component is running in the cluster, to avoid name collisions. In practice, this means that only the senior cluster node should register an XMPP component.

Openfire Meetings does not yet have that logic in place. It should also make sure that only one instance of the focus user is created.
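As a rough illustration of that rule (the register/unregister helpers are placeholders for whatever the plugin actually does; only Openfire's ClusterManager and ClusterEventListener APIs are assumed):

    // Sketch: keep the "focus" component registered only on the senior cluster node.
    import org.jivesoftware.openfire.cluster.ClusterEventListener;
    import org.jivesoftware.openfire.cluster.ClusterManager;

    public class SeniorOnlyFocus implements ClusterEventListener {

        public void start() {
            ClusterManager.addListener(this);
            // Standalone, or already the senior node: this node owns the focus.
            if (!ClusterManager.isClusteringStarted() || ClusterManager.isSeniorClusterMember()) {
                registerFocus();
            }
        }

        @Override
        public void joinedCluster() {
            // This node joined an existing cluster and is not senior: drop the local
            // focus so only one instance exists cluster-wide.
            if (!ClusterManager.isSeniorClusterMember()) {
                unregisterFocus();
            }
        }

        @Override
        public void joinedCluster(byte[] nodeID) { /* another node joined: nothing to do */ }

        @Override
        public void leftCluster() {
            // Clustering stopped on this node: it is standalone again.
            registerFocus();
        }

        @Override
        public void leftCluster(byte[] nodeID) { /* another node left: nothing to do */ }

        @Override
        public void markedAsSeniorClusterMember() {
            // Promoted to senior: this node now owns the single focus instance.
            registerFocus();
        }

        private void registerFocus()   { /* placeholder: register component / create focus user */ }
        private void unregisterFocus() { /* placeholder: undo the above */ }
    }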

vincentwlau commented 6 years ago

@deleolajide Thank you for the comment. That is precisely the enhancement I would like to make to Openfire Meetings: multiple instances of offocus can be registered in a cluster, but internally only the senior node serves the IQ requests. Right now I am just using Jicofo (a standalone app from Jitsi) as an experiment, and it is partially working except for the component sessions issue in LocalSessionManager. Eventually, if it works, I will try the same logic in the offocus plugin (as an experiment).
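A minimal sketch of that behavior, assuming the focus logic eventually runs as a component inside the offocus plugin (the class and placeholder logic below are illustrative, not the actual plugin code):

    // Sketch: the component is registered on every cluster node, but only the
    // senior node actually handles focus IQ requests; other nodes ignore them.
    import org.jivesoftware.openfire.cluster.ClusterManager;
    import org.xmpp.component.Component;
    import org.xmpp.component.ComponentManager;
    import org.xmpp.packet.IQ;
    import org.xmpp.packet.JID;
    import org.xmpp.packet.Packet;

    public class ClusteredFocusComponent implements Component {

        private ComponentManager componentManager;

        @Override
        public String getName() {
            return "focus";
        }

        @Override
        public String getDescription() {
            return "Focus component; serves IQ requests on the senior cluster node only";
        }

        @Override
        public void initialize(JID jid, ComponentManager componentManager) {
            this.componentManager = componentManager;
        }

        @Override
        public void processPacket(Packet packet) {
            if (!ClusterManager.isSeniorClusterMember()) {
                // Not the senior node: ignore the packet, the senior instance is authoritative.
                return;
            }
            if (packet instanceof IQ) {
                final IQ iq = (IQ) packet;
                if (iq.getType() == IQ.Type.get || iq.getType() == IQ.Type.set) {
                    // Placeholder: real conference-allocation logic would go here and
                    // reply via componentManager.sendPacket(this, response).
                }
            }
        }

        @Override
        public void start() { }

        @Override
        public void shutdown() { }
    }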

I notice that the Openfire Component Manager supports multiple instances of a component, and the Whack library has a flag "multiple_allowed" specifically for Openfire. The only strange behavior is in LocalSessionManager. The Component Sessions page in the Openfire Admin Console says:

Below is a list of connected external components to this server. You can also modify the external components settings.

Does that mean the LocalSessionManager in each XMPP instance should only contain the locally connected components in a clustered Openfire? Here is what I observed. When I started the first instance of the Jicofo external component, one OF console showed a locally connected component session while the other OF console showed a remote component session (I am not sure whether it should show the remote session at all). Then I started the second instance of the Jicofo component: one OF console showed the original local session plus a new remote session (but with the same client IP/hostname), while the other OF console replaced the original remote session with the new local session. This is very odd behavior. For client sessions, the LocalSessionManager contains local and remote sessions. But for component sessions, should it contain local only, or local and remote sessions? Given the connection-oriented nature of components, it seems better for the LocalSessionManager to contain only local component sessions.

deleolajide commented 6 years ago

@guusdk, your opinion is appreciated in this conversation

vincentwlau commented 6 years ago

Update: the Component Session detail issue in the OF console is due to the component name clash you mentioned before. The console JSP assumes that component names are unique, so it always picks the first matching name from the list. BTW, if the Component Sessions page is meant to show local and remote sessions, it would be nice to have an indicator for local vs. remote sessions (similar to the client sessions page).

vincentwlau commented 6 years ago

@deleolajide @guusdk I may have found a bug during Openfire startup with clustering that affects packet routing to external components connected to a remote node. One of the symptoms was reported in https://discourse.igniterealtime.org/t/hazelcast-cluster-on-openfire-4-0-2-issues/61731/2. When Openfire is configured in clustering mode, the hazelcast plugin is loaded after the internal components have started. Before clustering is initialized, the nodeID for that XMPP server is set to DEFAULT_NODE_ID (which is essentially empty). If an external component connects before clustering is started, an entry is added to the routingTable in InternalComponentManager with DEFAULT_NODE_ID. If the component connection is then lost, the ComponentManager cannot remove the component session from the Hazelcast cache because of this DEFAULT_NODE_ID entry in the routing table, so Openfire still thinks the component session is alive. When the component tries to reconnect, it gets a "conflict" error.

As a workaround, I make sure that the external components connect to Openfire only after clustering is initialized. But that is not good enough, because I cannot simply restart an Openfire node (the external component will connect before clustering is started). I have been thinking of loading the hazelcast plugin at the very beginning of startup, but enabling/disabling clustering dynamically makes that tricky. What do you think?

vincentwlau commented 6 years ago

@guusdk For https://discourse.igniterealtime.org/t/hazelcast-cluster-on-openfire-4-0-2-issues/61731/2, I have a simple fix. I wonder if anyone from the community would like to verify the approach of my fix:

This fix works in a non-clustering environment too.

Also, I have a small fix in Openfire that lets a multiple-instance component have a resource in the component JID, for example "focus.mydomain.com/focus-1" and "focus.mydomain.com/focus-2". This also fixes the Component Sessions display problem in the admin console when there are multiple instances of a component. In LocalComponentSession#createSession():

        // Get the requested subdomain
        String subdomain = domain;
        int index = domain.indexOf(serverName);
        if (index > -1) {
            subdomain = domain.substring(0, index - 1);
        }
        // Check if the component has a resource specified and retain it if available
        int slash = domain.indexOf('/');
        if (allowMultiple && (slash > 0)) {
            domain = subdomain + "." + serverName + domain.substring(slash);
        } else {
            domain = subdomain + "." + serverName;
        }
        JID componentJID = new JID(domain);

Specifying a resource in the component JID is optional, but it requires a small enhancement to the Whack library if one wants to use this feature. I will be happy to work with the community if this enhancement is useful.