Open woj-tek opened 3 years ago
Thanks for suggesting!
To add some context - I'm from the Tigase team and we run the public server tigase.im (sure.im/jabber.today), which is a cluster installation (behind a load balancer). Due to the lack of support for the location attribute, Monal users can't resume sessions correctly most of the time, as they usually reconnect to a different cluster node which isn't aware of the previous session, and the request fails.
Yes, it would be possible, but it's low priority because both ejabberd and Prosody don't support the location property anyway. I guess Openfire does not support/use it either.
@guusdk could you comment on this?
For Stream Management to work properly in an Openfire cluster, the client must resume the stream on the same cluster node. To facilitate this, the location attribute was added in version 4.5.0 of Openfire.
I think the problem boils down to "popular community demand" - the majority of current public deployments seem to use a single-node/single-server setup, in which case it doesn't matter. If there is clustering involved, it simply breaks most of the time, as sessions are usually cached on the same node on which the connection was established (which simplifies a lot of things; because of that we also use see-other-host to group the same user's connections on the same node, but that's OT :-) ).
Currently we are pondering simply disabling the announcement of Stream Management to Monal users, though detecting them would be somewhat tricky (considering SM negotiation, it would simply be based on the resource name, sigh), but that is still better than constantly failing to resume the session because the app knocks on the wrong machine...
In my reasoning, features like these - those that are primarily desired by organisations that make use of a professional (clustered) environment - make for good candidates for bounties, or otherwise commissioned types of work.
I'm unsure what amount of budget would be needed to fix this on the client side, but it is hard to imagine that it would be significantly more than the budget that such an organisation would need to decide if said work is to be commissioned in the first place.
Given that the Monal project signed up to the GitHub Sponsors program, I suspect that there is an easy way here to get this feature realized.
(Please note that I'm in no way, shape or form associated with the Monal project, and I'm in no way trying to express a point of view of the Monal project team on this. I'm only sharing insight and experience from other OSS-based projects that I'm active in, that work for me).
I would disagree in a way - professional/paid environments tend to use software from a single provider.
I think that our tigase.im public installation is somewhat of an outlier here, and our desire to provide something with high availability (hence using a cluster) for the benefit of the users is simply unusual and thus warrants less focus, which is also understandable in a way. Now, we already addressed it somewhat on our end (we basically drop such dangling sessions more pre-emptively), but this could simply mean less than optimal ergonomics for Monal users on the tigase.im public servers. Which are probably a (tiny?) minority :-)
we already addressed it somewhat on our end
I would think you'd need to cope with clients attempting to resume on other nodes anyway, given the spec says that "if reconnection to that location fails, the standard XMPP connection algorithm specified in RFC 6120 applies." And reconnection might of course always fail due to random network hiccups or whatever.
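To make the quoted rule concrete, here is a minimal client-side sketch of "try the location first, then fall back". It is not Monal's code (Monal is not written in Python) and the function names `parse_location` and `resumption_candidates` are made up for illustration. It assumes the server's `<enabled/>` reply is available as an XML string, that IPv6 addresses in location are bracketed as XEP-0198 requires, and that the `fallback` list stands in for whatever the regular RFC 6120 resolution would have produced.

```python
import xml.etree.ElementTree as ET

SM_NS = "urn:xmpp:sm:3"  # XEP-0198 Stream Management namespace


def parse_location(location, default_port=5222):
    """Split 'host', 'host:port' or '[ipv6]:port' into a (host, port) tuple."""
    if location.startswith("["):
        # Bracketed IPv6 literal, optionally followed by ':port'
        host, _, rest = location[1:].partition("]")
        port = int(rest[1:]) if rest.startswith(":") else default_port
        return host, port
    host, sep, port = location.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    # No port given: keep the whole string as the host
    return location, default_port


def resumption_candidates(enabled_xml, fallback):
    """Return endpoints to try for <resume/>: preferred location first,
    then the endpoints from the regular connection algorithm (hypothetical
    input, passed in by the caller)."""
    enabled = ET.fromstring(enabled_xml)
    if enabled.tag != f"{{{SM_NS}}}enabled":
        raise ValueError("not an <enabled/> element")
    candidates = []
    location = enabled.get("location")
    if location:
        candidates.append(parse_location(location))
    for endpoint in fallback:
        if endpoint not in candidates:
            candidates.append(endpoint)
    return candidates
```

A client would walk this list in order and send `<resume/>` on the first endpoint it manages to connect to. If the server sent no location, the list is just the regular fallback, so behaviour on single-node servers would be unchanged.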
BTW, I've always assumed the use case for location would be telling clients to resume on node B when A goes down. Hence I've been unhappy about having to make that decision on <enable/> and not even being able to update the location on <resume/>, which renders the feature pretty much useless for my use case. But yes, it obviously makes sense for telling clients to stick to the same node if you have no way to share the state required for resumption.
As we don't share session state across the nodes, if the client connects to a different location (one that doesn't know about the previous session), the resumption would obviously fail and the client would simply proceed with regular session establishment. In our case the slight issue was the dangling sessions that weren't resumed.
I would agree that preemptive use of location would be somewhat inconvenient without the update, as the cluster could be quite dynamic. @weiss - does ejabberd clustering have shared state and allow resuming on whichever node?
does ejabberd clustering have shared state and allow resuming on whichever node?
The state is copied over from the previous node during resumption. So if a node goes into a planned downtime, one option is to have that node stop accepting new connections, then kick remaining clients and maybe wait for a few minutes to give the clients the chance to resume on another node.
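The two server-side behaviours described in this thread can be contrasted with a toy model. This is only a sketch under loud assumptions: the `Node` class and its methods are invented for illustration and do not reflect the actual ejabberd or Tigase implementations. With `share_state=False` a node only knows its own hibernated sessions, so a resume on another node fails and the old session is left dangling. With `share_state=True` the state is pulled over from the node that holds it.

```python
class Node:
    """Toy cluster node holding hibernated Stream Management sessions."""

    def __init__(self, name, cluster, share_state):
        self.name = name
        self.cluster = cluster          # shared list of all nodes (hypothetical)
        self.share_state = share_state  # may this node fetch state from peers?
        self.hibernated = {}            # sm_id -> session state held locally
        cluster.append(self)

    def hibernate(self, sm_id, state):
        """Keep the session state after the client's connection dropped."""
        self.hibernated[sm_id] = state

    def resume(self, sm_id):
        """Return the session state on success, or None if resumption fails."""
        if sm_id in self.hibernated:
            return self.hibernated.pop(sm_id)
        if self.share_state:
            # Copy the state over from whichever peer still holds it
            for peer in self.cluster:
                if sm_id in peer.hibernated:
                    return peer.hibernated.pop(sm_id)
        return None


# Node-local state: resuming on the "wrong" node fails, session dangles on a
local = []
a, b = Node("a", local, False), Node("b", local, False)
a.hibernate("sm-1", {"h": 3})
print(b.resume("sm-1"), "sm-1" in a.hibernated)

# Shared state: the session follows the client to node d
shared = []
c, d = Node("c", shared, True), Node("d", shared, True)
c.hibernate("sm-2", {"h": 7})
print(d.resume("sm-2"), "sm-2" in c.hibernated)
```

In the node-local case a client that honours location would land on node a again and the first lookup would succeed, which is the point of the feature request.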
@woj-tek dangling sessions mean you have two sessions bound to the same XMPP resource? The new one opened by the client and the old one still being XEP-0198 hibernated? Isn't that against the RFC mandating that all resources have to be unique (in the namespace of a bare JID, of course)?
@woj-tek kind reminder about the discussion here. Could you reply to the previous message?
Sorry @Echolon it slipped in the notifications.
@tmolitor-stud-tu - no, we don't have multiple sessions bound to the same resource, and we enforce that across the cluster. However, having the client reconnect and attempt to resume the session on a different cluster node still makes it impossible to resume the session (thus extending the connection time).
Describe your feature
Currently it seems that the location attribute is not respected by Monal: https://github.com/monal-im/Monal/blob/b357abd60c4154f743d25e8db5819f8f511f0902/Monal/Classes/xmpp.m#L1765 Would it be possible to add handling of the location attribute and, during resumption, try to use that address first, as per the specification: