processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/

Lost messages for no apparent reason (regression) #3791

Closed luke-jr closed 2 years ago

luke-jr commented 2 years ago

Environment

Errors from error.log/crash.log

No errors

Bug description

After upgrading from 20.04 to 21.04, messages are randomly lost. When it happens, messages keep getting lost at least until the client reconnects (and it doesn't stop right away upon reconnection!). There is no indication to the sender that the recipient didn't get the messages.

licaon-kter commented 2 years ago

Config? Config differences old vs new?

luke-jr commented 2 years ago

20.04 config:

```yml
###
### ejabberd configuration file
###
### The parameters used in this configuration file are explained at
###
### https://docs.ejabberd.im/admin/configuration
###
### The configuration file is written in YAML.
### *******************************************************
### ******* !!! WARNING !!! *******
### ******* YAML IS INDENTATION SENSITIVE *******
### ******* MAKE SURE YOU INDENT SECTIONS CORRECTLY *******
### *******************************************************
### Refer to http://en.wikipedia.org/wiki/YAML for the brief description.
###

hosts:
#  - "localhost"
  - "dashjr.org"
  - "anonymous.dashjr.org"
  - "friends.dashjr.org"

loglevel: info
log_rotate_size: 10485760
log_rotate_count: 5

## If you already have certificates, list them here
certfiles:
  - /etc/ssl/ejabberd/server.pem

listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
  -
    port: 5443
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      /admin: ejabberd_web_admin
      /api: mod_http_api
      /bosh: mod_bosh
      /captcha: ejabberd_captcha
      /upload: mod_http_upload
      /ws: ejabberd_http_ws
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
      /.well-known/acme-challenge: ejabberd_acme
  -
    port: 3478
    transport: udp
    module: ejabberd_stun
    use_turn: true
    ## The server's public IPv4 address:
    # turn_ip: 203.0.113.3
  -
    port: 1883
    ip: "::"
    module: mod_mqtt
    backlog: 1000

s2s_use_starttls: optional

acl:
  admin:
    user: luke@dashjr.org
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128

access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  trusted_network:
    allow: loopback

api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      access:
        allow:
          acl: loopback
          acl: admin
      oauth:
        scope: "ejabberd:admin"
        access:
          allow:
            acl: loopback
            acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number

shaper:
  normal: 1000
  fast: 50000

shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    100: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast

modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps: {}
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  mod_fail2ban: {}
  mod_http_api: {}
  mod_http_upload:
    put_url: https://@HOST@:5443/upload
  mod_last: {}
  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    ## db_type: sql
    assume_mam_usage: true
    default: always
  mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      mam: true
  mod_muc_admin: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: {}
  mod_privacy: {}
  mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    plugins:
      - flat
      - pep
    force_node_config:
      ## Avoid buggy clients to make their bookmarks public
      storage:bookmarks:
        access_model: whitelist
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    ## Only accept registration requests from the "trusted"
    ## network (see access_rules section above).
    ## Think twice before enabling registration from any
    ## address. See the Jabber SPAM Manifesto for details:
    ## https://github.com/ge0rg/jabber-spam-fighting-manifesto
    ip_access: trusted_network
    welcome_message:
      subject: "Welcome!"
      body: |-
        Welcome to the Dashjr IM server.
    registration_watchers:
      - "luke@dashjr.org"
    # don't allow deleting own account
    access_remove: none
  mod_roster:
    versioning: true
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: if_offline
  mod_stun_disco: {}
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false
  mod_muc_log:
    file_permissions:
      mode: 600
    outdir: /var/log/jabber/muc
    timezone: universal

auth_method: internal

host_config:
  anonymous.dashjr.org:
    auth_method: [anonymous]
    anonymous_protocol: sasl_anon

### Local Variables:
### mode: yaml
### End:
### vim: set filetype=yaml tabstop=8
```

21.04 config:

```yml
###
### ejabberd configuration file
###
### The parameters used in this configuration file are explained at
###
### https://docs.ejabberd.im/admin/configuration
###
### The configuration file is written in YAML.
### *******************************************************
### ******* !!! WARNING !!! *******
### ******* YAML IS INDENTATION SENSITIVE *******
### ******* MAKE SURE YOU INDENT SECTIONS CORRECTLY *******
### *******************************************************
### Refer to http://en.wikipedia.org/wiki/YAML for the brief description.
###

hosts:
#  - "localhost"
  - "dashjr.org"
  - "anonymous.dashjr.org"
  - "friends.dashjr.org"

loglevel: info

## If you already have certificates, list them here
certfiles:
  - /etc/ssl/ejabberd/server.pem

listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5223
    ip: "::"
    tls: true
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
  -
    port: 5443
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      /admin: ejabberd_web_admin
      /api: mod_http_api
      /bosh: mod_bosh
      /captcha: ejabberd_captcha
      /upload: mod_http_upload
      /ws: ejabberd_http_ws
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
      /.well-known/acme-challenge: ejabberd_acme
  -
    port: 3478
    ip: "::"
    transport: udp
    module: ejabberd_stun
    use_turn: true
    ## The server's public IPv4 address:
    # turn_ipv4_address: "203.0.113.3"
    ## The server's public IPv6 address:
    # turn_ipv6_address: "2001:db8::3"
  -
    port: 1883
    ip: "::"
    module: mod_mqtt
    backlog: 1000

s2s_use_starttls: optional

acl:
  admin:
    user: luke@dashjr.org
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128

access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  trusted_network:
    allow: loopback

api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      access:
        allow:
          - acl: loopback
          - acl: admin
      oauth:
        scope: "ejabberd:admin"
        access:
          allow:
            - acl: loopback
            - acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number

shaper:
  normal:
    rate: 3000
    burst_size: 20000
  fast: 100000

shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    100: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast

modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps: {}
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  mod_fail2ban: {}
  mod_http_api: {}
  mod_http_upload:
    put_url: https://@HOST@:5443/upload
    custom_headers:
      "Access-Control-Allow-Origin": "https://@HOST@"
      "Access-Control-Allow-Methods": "GET,HEAD,PUT,OPTIONS"
      "Access-Control-Allow-Headers": "Content-Type"
  mod_last: {}
  mod_mam:
    ## Mnesia is limited to 2GB, better to use an SQL backend
    ## For small servers SQLite is a good fit and is very easy
    ## to configure. Uncomment this when you have SQL configured:
    ## db_type: sql
    assume_mam_usage: true
    default: always
  mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      mam: true
  mod_muc_admin: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: {}
  mod_privacy: {}
  mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    plugins:
      - flat
      - pep
    force_node_config:
      ## Avoid buggy clients to make their bookmarks public
      storage:bookmarks:
        access_model: whitelist
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    ## Only accept registration requests from the "trusted"
    ## network (see access_rules section above).
    ## Think twice before enabling registration from any
    ## address. See the Jabber SPAM Manifesto for details:
    ## https://github.com/ge0rg/jabber-spam-fighting-manifesto
    ip_access: trusted_network
    welcome_message:
      subject: "Welcome!"
      body: |-
        Welcome to the Dashjr IM server.
    registration_watchers:
      - "luke@dashjr.org"
    # don't allow deleting own account
    access_remove: none
  mod_roster:
    versioning: true
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: if_offline
  mod_stun_disco: {}
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false
  mod_muc_log:
    file_permissions:
      mode: 600
    outdir: /var/log/jabber/muc
    timezone: universal

auth_method: internal

host_config:
  anonymous.dashjr.org:
    auth_method: [anonymous]
    anonymous_protocol: sasl_anon

### Local Variables:
### mode: yaml
### End:
### vim: set filetype=yaml tabstop=8
```

licaon-kter commented 2 years ago

How do you know you're missing messages?

Which clients are used on this account?

luke-jr commented 2 years ago

I physically find my child and he shows me he didn't get any of my recent messages. From Psi+ to Conversations.

licaon-kter commented 2 years ago

Can you try with Dino or Gajim or another Conversations instead of Psi+?

What type of messages? Were they online at the time, or offline?

luke-jr commented 2 years ago

Can't risk lost messages this weekend, so it'll have to wait :/

Just plain text "chat" type messages. Both users were online (and are 24/7).

Both users also have multiple connections (Psi+ and Conversations). I didn't notice if the recipient's Psi+ got the messages or not.

Neustradamus commented 2 years ago

Which Psi+ version and which OS do you use? With or without OMEMO?

luke-jr commented 2 years ago

Sender: Psi+ 1.5.1484 on Gentoo.

(Recipient's other client: Psi 1.3-5build1 from/on Ubuntu 20.04)

I have the Psi+ OMEMO plugin on the sender, but I don't know if it was enabled or not.

Neustradamus commented 2 years ago

@luke-jr: Can you update Psi+ to the latest build? Some OMEMO problems have been fixed; maybe this is related.

luke-jr commented 2 years ago

Confirmed that the recipient's Psi+ did receive the messages that prompted this initial report.

I've also been having issues with 20.04: another child has two Psi+ instances, yet only receives messages on one or the other...

Can ejabberd add an option to simply forward all messages to all resources, disregarding if they're directed to a specific one or what the priorities of each connection are? :/

badlop commented 2 years ago

> Can ejabberd add an option to simply forward all messages to all resources, disregarding if they're directed to a specific one or what the priorities of each connection are? :/

Those behaviours go against the RFC...

Anyway, in your specific case, maybe a dirty patch is easier than explaining to your users or tweaking client configuration. This small patch customizes the routing for your specific case. Note that it's a proof of concept; it may break other parts, for instance MUC rooms.

```diff
diff --git a/src/ejabberd_sm.erl b/src/ejabberd_sm.erl
index 231e4351e..e6086019b 100644
--- a/src/ejabberd_sm.erl
+++ b/src/ejabberd_sm.erl
@@ -699,7 +699,7 @@ do_route(#presence{to = #jid{lresource = <<"">>} = To} = Packet) ->
       fun({_, R}) ->
           do_route(Packet#presence{to = jid:replace_resource(To, R)})
       end, get_user_present_resources(LUser, LServer));
-do_route(#message{to = #jid{lresource = <<"">>} = To, type = T} = Packet) ->
+do_route(#message{to = To, type = T} = Packet) ->
     ?DEBUG("Processing message to bare JID:~n~ts", [xmpp:pp(Packet)]),
     if T == chat; T == headline; T == normal ->
         route_message(Packet);
@@ -762,7 +762,7 @@ route_message(#message{to = To, type = Type} = Packet) ->
     case catch lists:max(PrioRes) of
       {MaxPrio, MaxRes}
         when is_integer(MaxPrio), MaxPrio >= 0 ->
-          lists:foreach(fun ({P, R}) when P == MaxPrio;
+          lists:foreach(fun ({P, R}) when true;
                                           (P >= 0) and (Type == headline) ->
                 LResource = jid:resourceprep(R),
                 Mod = get_sm_backend(LServer),
```

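As a rough illustration of what the two hunks change, here is a hypothetical, simplified Erlang model of the resource selection visible in the diff context. The module and function names (`route_sketch`, `stock_targets/2`, `patched_targets/1`) are invented for this sketch and are not ejabberd code; each session is reduced to a `{Priority, Resource}` pair, and everything else `ejabberd_sm` does (privacy checks, offline storage, carbons) is ignored.

```erlang
%% route_sketch.erl: hypothetical, simplified model of the guard logic in
%% ejabberd_sm:route_message/1 as shown in the diff. Not ejabberd code.
-module(route_sketch).
-export([stock_targets/2, patched_targets/1]).

%% Stock guard: a chat/normal message goes to every resource sharing the
%% highest priority; a headline goes to every resource with priority >= 0.
%% An empty result means no resource qualified (the real module handles
%% that case separately, e.g. via offline handling).
-spec stock_targets([{integer(), binary()}], atom()) -> [binary()].
stock_targets([], _Type) ->
    [];
stock_targets(PrioRes, Type) ->
    {MaxPrio, _} = lists:max(PrioRes),
    case MaxPrio >= 0 of
        true ->
            [R || {P, R} <- PrioRes,
                  P == MaxPrio orelse (P >= 0 andalso Type == headline)];
        false ->
            []
    end.

%% Patched guard: `when true; ...` always succeeds, so once the maximum
%% priority is non-negative, every listed resource is selected, including
%% resources with a negative priority.
-spec patched_targets([{integer(), binary()}]) -> [binary()].
patched_targets([]) ->
    [];
patched_targets(PrioRes) ->
    {MaxPrio, _} = lists:max(PrioRes),
    case MaxPrio >= 0 of
        true -> [R || {_P, R} <- PrioRes];
        false -> []
    end.
```

For example, with a Psi+ session at priority 5 and a Conversations session at priority 0, `stock_targets/2` selects only the Psi+ resource for a `chat` message, while `patched_targets/1` selects both. If all sessions share the same non-negative priority, the stock rule already selects all of them for bare-JID messages. The first hunk additionally makes messages addressed to a full JID take this path, so a "sent" carbon copy addressed to one resource could be fanned out to every resource, including the sender's own client; that is one plausible explanation for duplicates like those reported later in the thread, not a confirmed one.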
Neustradamus commented 2 years ago

Can you update it to Psi+ 1.5.1605 or later?

It's better to use Psi+ than Psi; the latest Psi is 1.5.

luke-jr commented 2 years ago

Latest for Ubuntu focal (which that system runs) is Psi 1.3-5build1 or Psi+ 1.4.554-4 (which it is now running)

Neustradamus commented 2 years ago

@tehnick, @Ri0n: Can you reply here about old Psi and Psi+?

Ri0n commented 2 years ago

https://launchpad.net/~psi-plus/+archive/ubuntu/ppa

It's also worth checking whether stream management is enabled in the account settings in Psi. Unfortunately, not all the XEPs related to reliability are implemented in Psi, so lost messages are possible with a bad connection.

luke-jr commented 2 years ago

> Anyway, in your specific case, maybe a dirty patch is easier than explaining to your users or tweaking client configuration. This small patch customizes the routing for your specific case. Note that it's a proof of concept; it may break other parts, for instance MUC rooms.

Got around to trying this. The patch causes Psi+ to see duplicates of everything sent. >_<

luke-jr commented 2 years ago

Duplicates appear to be carbons. Any easy way to not affect internally-generated stuff?

luke-jr commented 2 years ago

Maybe a better solution would be to force all c2s connections to the same priority, and strip /resource specifiers on c2s packets?

weiss commented 2 years ago

> Maybe a better solution would be to force all c2s connections to the same priority, and strip /resource specifiers on c2s packets?

The core XMPP RFCs specify how to address individual devices vs. accounts. This is used for various use cases. Breaking such essential rules in order to work around client issues is not a sensible solution, no.

Trying another client is not an option? If only to verify there's no actual server-side issue (in which case we could close this issue)?

luke-jr commented 2 years ago

> The core XMPP RFCs specify how to address individual devices vs. accounts. This is used for various use cases. Breaking such essential rules in order to work around client issues is not a sensible solution, no.

That's why I'm suggesting doing it at a border layer, rather than deep in the internals.

Do you have a better solution?

> Trying another client is not an option? If only to verify there's no actual server-side issue

If it was easily or at least predictably reproduced, perhaps, but random occurrences don't really make it a viable option.

Besides, there aren't really any better desktop/Qt clients AFAIK?

weiss commented 2 years ago

> Do you have a better solution?

I think it's better to fix client issues on the client side.

> there aren't really any better desktop/Qt clients AFAIK?

I'd try Gajim or (on Linux) Dino, for example. (Not Qt, but I'd hope the UI toolkit in use isn't relevant here?) If all else fails, Converse.js might be another option (you could use a public instance with a test account for tracking down this issue).

Ri0n commented 2 years ago

Hey guys. I didn't quite follow the whole conversation, but are you talking about carbons not always working in Psi? IIRC carbons have some issues in Psi when used together with OMEMO. I probably won't be able to fix it in Psi because of a lack of spare time for the project, but I'll gladly accept patches.

luke-jr commented 2 years ago

> I think it's better to fix client issues on the client side.

Since this is a regression when the server was upgraded, there's no reason to think it's a client issue.

weiss commented 2 years ago

> there's no reason to think it's a client issue.

In that case it will be easy to reproduce with other clients, right? Could you do that to double-check?

weiss commented 2 years ago

Another thing:

> I have the Psi+ OMEMO plugin on the sender, but I don't know if it was enabled or not.

Please always double-check such issues can be reproduced with OMEMO disabled on all involved clients.

luke-jr commented 2 years ago

It's not easy to reproduce even with the same client. I have no idea how to reliably reproduce it. It's an apparently-random occurrence which can happen at the most inopportune moments when I actually need the recipient to get the message immediately.

That being said, a few days ago I did upgrade to 22.05, so over the next few months will discover if it's still an issue or not.

luke-jr commented 2 years ago

> Please always double-check such issues can be reproduced with OMEMO disabled on all involved clients.

I don't think it's possible to disable OMEMO in Conversations. cc @iNPUTmice

weiss commented 2 years ago

I'll close this issue for the moment then. If you run into it again and manage to reproduce the problem with OMEMO disabled, feel free to open a new one with any additional info you were able to gather.

licaon-kter commented 2 years ago

@luke-jr: Settings > OMEMO > default on, then turn it on or off per chat.

Neustradamus commented 2 years ago

@luke-jr: Have you tested with two Psi+ clients without OMEMO plugin?

luke-jr commented 2 years ago

Again, it's not easily reproducible. I can't just stop using IM for the weeks it would take to test, because we need it.