processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/

Silent occupants disconnection from MUC #3410

Open Nekun opened 4 years ago

Nekun commented 4 years ago

Environment

Bug description

Occasionally, MUC rooms on the server silently "freeze": the room still appears live in the client UI, but messages and presence updates are no longer received, and any attempt to send a message or other stanza to the room results in a <not-acceptable/> error with the text "Only occupants are allowed to send messages to the conference" until the client rejoins the room. XEP-0410 (https://xmpp.org/extensions/xep-0410.html) describes a very similar problem and mentions that it can be caused by an interrupted S2S connection. However, not every termination seems to cause occupant exclusion: S2S links are very often closed by the MUC server after 10 minutes of stream inactivity, yet the freezes happen much more rarely. For which exact reasons (link terminations, timeouts, ...) can ejabberd silently exclude an occupant from a MUC without sending an 'unavailable' presence, and can this be avoided by setting some options?
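For reference, the client-side mitigation XEP-0410 proposes is a "self-ping": the client sends a XEP-0199 ping to its own occupant JID (room@service/nick) and decides from the reply whether it is still joined. Below is a minimal sketch of that decision, assuming the reply is available as a raw XML string (or None on timeout). The helper names (`build_self_ping`, `classify_self_ping`, `SelfPingVerdict`) are hypothetical, and the condition-to-verdict mapping is a summary of the XEP's rules that should be checked against the spec itself.

```python
import xml.etree.ElementTree as ET
from enum import Enum
from typing import Optional

PING_NS = "urn:xmpp:ping"
STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


class SelfPingVerdict(Enum):
    JOINED = "joined"    # still an occupant, nothing to do
    UNKNOWN = "unknown"  # no decision possible, retry the self-ping later
    REJOIN = "rejoin"    # probably dropped from the room, rejoin it


def build_self_ping(own_occupant_jid: str, iq_id: str) -> str:
    """Build a XEP-0199 ping addressed to our own occupant JID."""
    iq = ET.Element("iq", {"type": "get", "id": iq_id, "to": own_occupant_jid})
    ET.SubElement(iq, "ping", {"xmlns": PING_NS})
    return ET.tostring(iq, encoding="unicode")


def classify_self_ping(response: Optional[str]) -> SelfPingVerdict:
    """Map a self-ping reply (None means timeout) to a verdict."""
    if response is None:
        # Timeout: the MUC service or the path to it is unreachable.
        return SelfPingVerdict.UNKNOWN
    iq = ET.fromstring(response)
    if iq.get("type") == "result":
        return SelfPingVerdict.JOINED

    # Extract the defined error condition, skipping the optional <text/>.
    condition = None
    error = iq.find("error")
    if error is not None:
        for child in error:
            if child.tag.startswith("{%s}" % STANZAS_NS) and not child.tag.endswith("}text"):
                condition = child.tag.split("}", 1)[1]
                break

    if condition in ("service-unavailable", "feature-not-implemented", "item-not-found"):
        # The room routed the ping to an occupant, so we are still joined.
        return SelfPingVerdict.JOINED
    if condition in ("remote-server-not-found", "remote-server-timeout"):
        # Network trouble between servers: treat like a timeout.
        return SelfPingVerdict.UNKNOWN
    # Any other error, e.g. <not-acceptable/> as in this report.
    return SelfPingVerdict.REJOIN
```

With the error from this report (`<not-acceptable/>`, "Only occupants are allowed..."), the sketch returns `REJOIN`, which is exactly the state the client UI fails to notice.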

licaon-kter commented 4 years ago

Gist of your config?

Nekun commented 4 years ago

https://gist.github.com/Nekun/d858eefbba51a1d8ef5f1bfdde937268 (some private data pruned)

prefiks commented 4 years ago

Do you have any MUC-related errors in the ejabberd logs? Generally there are no timeouts for users in MUC, but I think a user may get removed when a message sent to them is returned with an error, and that may happen when there are connection problems.

Nekun commented 4 years ago

@prefiks In the last 2 weeks there have been no errors at the "info" log level except various S2S stream errors (expired certificate, connection reset, timeout, not well-formed, stanza too big, etc.).

prefiks commented 4 years ago

OK, so the room was not killed due to errors. If this happens only to S2S users, then it probably works like this: the connection to A breaks, B sends a message to the room, which in turn gets sent to everyone, but the copy addressed to A is bounced back to the room with an error saying it could not be delivered, and that causes the room to drop A. This is not something we want to change, as changing it would leave us with phantom users.
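The mechanism described above can be modelled with a toy room. This is an illustrative sketch, not ejabberd's Erlang code: the `Room` class and the `deliver` callback are hypothetical, and bounces are treated synchronously, whereas in a real server the error stanza arrives asynchronously from the remote server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Room:
    """Toy MUC room: maps occupant nick -> occupant's real full JID."""
    occupants: Dict[str, str] = field(default_factory=dict)

    def broadcast(self, sender_nick: str, body: str,
                  deliver: Callable[[str, str], bool]) -> List[str]:
        """Reflect a groupchat message to every occupant, sender included.

        `deliver` stands in for stanza routing: it returns False when the
        copy bounces back with an error (e.g. a broken S2S link).
        Returns the nicks dropped because of bounces.
        """
        if sender_nick not in self.occupants:
            raise PermissionError(
                "Only occupants are allowed to send messages to the conference")
        dropped = []
        for nick, jid in list(self.occupants.items()):
            if not deliver(jid, body):
                # A bounce is taken to mean "occupant is gone": remove silently,
                # with no unavailable presence sent to anyone in this model.
                del self.occupants[nick]
                dropped.append(nick)
        return dropped
```

Once A has been dropped, A's client still shows the room as joined, and the next stanza it sends hits the "Only occupants are allowed..." error from the bug report.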

Nekun commented 4 years ago

@prefiks

if this happen only to s2s users

Yep, my server serves only chat rooms, no local users.

connection to A get broken, B send message to room which in turn get sent to everyone, but message to A get bounced back to room with error about not being able to deliver it, and that causes room to drop A

Which "connection to A" do you mean? The connection between the user's client and the user's server, or the connection between the user's server and the MUC server? So the user's server can't deliver the message from the MUC and responds to the MUC server with <service-unavailable/> instead of silently dropping it (for 'groupchat' messages, as far as I can see in Section 8.5.4 of RFC 6121, the server may do either), right?

Rather not something we want to change, as we will get phantom users if we change that, not really something that we want

Hmm, maybe this behavior should be configurable? I think that for many administrators, phantom users in the occupant list (many IM clients don't have reliable presence statuses by design anyway) can be a lesser evil than unreliable chats. It could be, for example, a timeout before exclusion after a stanza error, which would cover most cases of temporary C2S connection problems.
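A minimal sketch of the proposed grace period, again as a toy model: `GracefulRoom` and `grace_seconds` are hypothetical names for illustration and are not taken from ejabberd's option list. An occupant is removed only if bounces keep arriving for at least `grace_seconds` without any stanza from that occupant in between.

```python
import time
from typing import Callable, Dict


class GracefulRoom:
    """Toy MUC room that tolerates delivery errors for a grace period."""

    def __init__(self, grace_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.grace_seconds = grace_seconds  # hypothetical knob from the proposal
        self.clock = clock
        self.occupants: Dict[str, str] = {}      # nick -> real full JID
        self.first_error: Dict[str, float] = {}  # nick -> time of first unresolved bounce

    def join(self, nick: str, jid: str) -> None:
        self.occupants[nick] = jid
        self.first_error.pop(nick, None)

    def on_delivery_error(self, nick: str) -> bool:
        """Record a bounce for `nick`; return True if it is now removed."""
        if nick not in self.occupants:
            return False
        now = self.clock()
        started = self.first_error.setdefault(nick, now)
        if now - started >= self.grace_seconds:
            del self.occupants[nick]
            del self.first_error[nick]
            return True
        return False

    def on_activity(self, nick: str) -> None:
        """Any stanza from the occupant suggests the link recovered."""
        self.first_error.pop(nick, None)
```

Resetting the window on any inbound stanza is one possible design choice; it keeps briefly disconnected users in the room at the cost of phantom occupants lingering for up to `grace_seconds` after a real disconnect.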

Also, it's not confirmed, but it seems that ejabberd doesn't broadcast an 'unavailable' presence to the other occupants for an occupant excluded this way (although presence for the excluded occupant is no longer sent to newly joining occupants). So, at the very least, ejabberd should handle these cases like a normal leave.
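Handling the exclusion "like a normal leave" would mean sending each remaining occupant the same XEP-0045 unavailable presence it gets when someone leaves. The sketch below builds those stanzas; `unavailable_presence` and `remove_and_notify` are hypothetical helpers, and a real MUC service would also carry the occupant's actual affiliation and possibly a status code, which are simplified here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MUC_USER_NS = "http://jabber.org/protocol/muc#user"


def unavailable_presence(room_jid: str, nick: str, to_jid: str,
                         affiliation: str = "none") -> str:
    """Presence telling `to_jid` that `nick` has left `room_jid`."""
    pres = ET.Element("presence", {"from": f"{room_jid}/{nick}",
                                   "to": to_jid, "type": "unavailable"})
    x = ET.SubElement(pres, "x", {"xmlns": MUC_USER_NS})
    # role='none' marks the occupant as no longer present in the room.
    ET.SubElement(x, "item", {"affiliation": affiliation, "role": "none"})
    return ET.tostring(pres, encoding="unicode")


def remove_and_notify(room_jid: str, occupants: Dict[str, str],
                      nick: str) -> List[str]:
    """Drop `nick` and return one unavailable presence per remaining occupant."""
    occupants.pop(nick, None)
    return [unavailable_presence(room_jid, nick, jid) for jid in occupants.values()]
```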

Neustradamus commented 5 months ago

@Nekun: What is the current situation?