strophe / libstrophe

A simple, lightweight C library for writing XMPP clients
http://strophe.im/libstrophe
Other
401 stars 163 forks source link

Resending of old messages on reconnect #215

Closed mdosch closed 7 months ago

mdosch commented 2 years ago

I am using the debian package of libstrophe 0.12.2 with profanity master.

When I wake up my laptop from suspend old messages (to MUCs, didn't experience this with 1-1 yet) are often resend on reconnect. Not on every reconnect and sometimes the messages are already days old and there have been several reconnects since the message was originally sent an it is resend.

Somehow I suspect the new stream management.

sjaeckel commented 1 year ago

Is this still an issue or can this be closed?

mdosch commented 1 year ago

I disabled sm for now so I don't know whether this is still an issue with sm enabled.

sjaeckel commented 1 year ago

OK, SM is on per default and it seems you're the only one experiencing this issue as nobody else reported it yet.

I'm closing this now, feel free to re-open if you feel like you want to give it a try and it still happens, but then please provide debug logs.

mdosch commented 1 year ago

I'd like to ask you to reopen this as disabling SM is only a workaround and the behavior should be fixed. For context I add a log of a chat in the prosody MUC regarding this issue:

2022-08-19T20:27:02.847281+02 - Martin: What could cause prosody to ack the same stanza multiple times?

> Aug 19 16:27:32 c2s55a326849e30 debug   Sending[c2s]: <a xmlns='urn:xmpp:sm:3' h='233'>
> Aug 19 16:27:32 c2s55a326849e30 debug   SEND: <a xmlns='urn:xmpp:sm:3' h='233'/>
> Aug 19 20:07:46 c2s55a32552c380 debug   Sending[c2s]: <a xmlns='urn:xmpp:sm:3' h='233'>
> Aug 19 20:07:46 c2s55a32552c380 debug   SEND: <a xmlns='urn:xmpp:sm:3' h='233'/>
> Aug 19 20:07:46 c2s55a32552c380 debug   Sending[c2s]: <a xmlns='urn:xmpp:sm:3' h='233'>
> Aug 19 20:07:46 c2s55a32552c380 debug   SEND: <a xmlns='urn:xmpp:sm:3' h='233'/>
2022-08-19T20:27:20.476656+02 - Zash: I see no stanza there
2022-08-19T20:28:01.119822+02 - Zash: Sending SM-Acks with the same counter value is perfectly normal
2022-08-19T20:28:51.945263+02 - Martin: Is it? I don't know anything about 0198 and assumed it's not normal. :)
2022-08-19T20:29:44.653701+02 - jaeckel: OK, we're currently debugging an issue where Martin regularly gets disconnected from the server where prosody says 'Too many unacked stanzas remaining, session can't be resumed'
2022-08-19T20:30:01.051242+02 - Zash: I don't think Prosody does that by itself, but it may be responding to ack-<r>equests without any actual stanzas having been sent in the meantime
2022-08-19T20:30:21.144127+02 - Zash: That means ... precisely what it says
2022-08-19T20:32:39.045064+02 - jaeckel: That means the client didn't send ACKs to all messages the server had sent?
2022-08-19T20:33:22.333894+02 - Zash: It happens if you get disconnected, mod_smacks keeps the session alive, but more than 500 stanzas are sent to it and queued, at which point Prosody starts discarding them.
2022-08-19T20:35:30.488793+02 - Zash: If on resumption it turned out that the client received enough of them before the disconnect for there to be less than 500 stanzas that need to be sent, they are. Otherwise they're gone and the session can't be resumed and is instead destroyed.
2022-08-19T20:35:55.136284+02 - jaeckel: OK, that means I don't have to do anything about it in the client resp. can't do anything about it, right?
2022-08-19T20:36:06.966568+02 - Zash: Means you have to start over with a new session.
2022-08-19T20:36:36.541598+02 - Zash: Also means prosody doesn't end up gobbling up insane amounts of memory, as it did before this limit was added.
2022-08-19T20:36:55.347415+02 - Zash: Try resuming the session before it gets to that point
2022-08-19T20:37:02.730904+02 - Zash: Or don't get disconnected! ;)
2022-08-19T20:37:33.302101+02 - Martin: It's a laptop that suspends when it's not used. 🙃
2022-08-19T20:37:54.192777+02 - Zash: And you're in a bazillion MUCs?
2022-08-19T20:38:13.296298+02 - Martin: I am.
2022-08-19T20:38:40.221984+02 - Zash: Congratulations, this only happens to us.
2022-08-19T20:39:53.750196+02 - Martin: The annoying part is that sometimes old messages get resend after reconnect. So, if this behavior is caused by this issue I hope there is a solution. 😄
2022-08-19T20:39:54.194017+02 - Zash: My recommendation is to not worry about it. You can retrieve all your messages from archives, so nothing should be lost.
2022-08-19T20:40:43.034512+02 - Zash: When my machine suspends, the memory graph shoots straight up until it reaches that limit and/or the smacks session times out.
2022-08-19T20:41:31.647179+02 - jaeckel: One option would be to disable resuming of sessions in the client!?
2022-08-19T20:43:35.442537+02 - Zash: The client should receive a `<sm:failed h='number'/>` that counts as a final ack, so it can check if anything it was sending got lost or not, then it can restart from the resource binding step
2022-08-19T20:43:52.061036+02 - jaeckel: ah OK
2022-08-19T20:44:13.795350+02 - jaeckel: I guess I didn't implement that then :D
2022-08-19T20:44:47.337973+02 - Zash: https://xmpp.org/extensions/xep-0198.html#example-13
2022-08-19T20:44:59.320554+02 - Zash: something something around there
2022-08-19T20:47:53.824436+02 - Zash: > In both of these failure cases, the server SHOULD allow the client to bind a resource [...]
2022-08-19T20:51:20.105857+02 - Zash: mod_smacks does have some tunables, if you wanna trade memory for longer lived sessions, but I wouldn't recommend it
2022-08-19T20:52:55.209861+02 - Zash: there are some possible fancy things that could be done in the future, like having the session leave MUCs to reduce traffic so it can remains resumable longer
2022-08-19T21:01:12.964322+02 - jaeckel: hmmm, it looks like Martin's prosody doesn't behave ... when we send a `<resume h="...` prosody simply closes the socket
2022-08-19T21:01:27.115623+02 - Zash: who when?
2022-08-19T21:01:55.935247+02 - jaeckel: I guess that's in exactly that case
2022-08-19T21:03:37.979269+02 - jaeckel: His client log states:
\```
2022-08-19T11:27:44.141272+02: conn: DBG: SENT: <resume h="43220" previd="MkPdJ46_2aas" xmlns="urn:xmpp:sm:3"/>
2022-08-19T11:27:44.141366+02: xmpp: DBG: Q_SENT: 0x55e2ee3de2d0
2022-08-19T11:27:45.064849+02: tls: DBG: error=SSL_ERROR_ZERO_RETURN(6) errno=0 lasterror=0
2022-08-19T11:27:45.064938+02: xmpp: DBG: Unrecoverable TLS error, 6.
2022-08-19T11:27:45.064977+02: xmpp: DBG: Closing socket.
\```
at the same time I get
\```
11:27:46 - -- martin@mdosch.de (profanity.rocks) is offline, "Disconnected: Too many unacked stanzas remaining, session can't be resumed"
\```
2022-08-19T21:04:32.726985+02 - jaeckel: His client log states:
\```
2022-08-19T11:27:44.141272+02: conn: DBG: SENT: <resume h="43220" previd="MkPdJ46_2aas" xmlns="urn:xmpp:sm:3"/>
2022-08-19T11:27:44.141366+02: xmpp: DBG: Q_SENT: 0x55e2ee3de2d0
2022-08-19T11:27:45.064849+02: tls: DBG: error=SSL_ERROR_ZERO_RETURN(6) errno=0 lasterror=0
2022-08-19T11:27:45.064938+02: xmpp: DBG: Unrecoverable TLS error, 6.
2022-08-19T11:27:45.064977+02: xmpp: DBG: Closing socket.
\```
at the same time I get
\```
11:27:46 - -- martin[...]  is offline, "Disconnected: Too many unacked stanzas remaining, session can't be resumed"
\```
2022-08-19T21:04:50.805138+02 - Martin: My JID is public, don't worry.
2022-08-19T21:04:56.585678+02 - Zash: Ah, uh, yeah, because it does a lot of complicated things to move stuff from the old session to the new session before it realises that it can't complete the resume, and by then it'd be complicated to revert all that
2022-08-19T21:07:17.300684+02 - jaeckel: OK, so you're not always doing what the XEP says, just if it's not too much hastle :)
2022-08-19T21:07:52.413041+02 - Martin: 😱
2022-08-19T21:09:47.633722+02 - Zash: This part of the code is a mess that is still in dire need of refactoring and/or burning with fire.
2022-08-19T21:10:25.332338+02 - Martin: We already had enough fires this summer.
2022-08-19T21:10:34.528391+02 - Zash: Hm, but shouldn't it close with a stream error?
2022-08-19T21:10:43.148486+02 - Martin: Maybe wait for Gävlebocken.
2022-08-19T21:11:48.156681+02 - jaeckel: Zash: doesn't look so ... or it tries to send a stream error but closing the socket is faster ...
2022-08-19T21:12:02.697505+02 - Zash: ```lua
        if err or not queue:resumable() then
            original_session.send(st.stanza("failed",
                { xmlns = xmlns_sm; h = format_h(original_session.handled_stanza_count); previd = id }));
            original_session:close(err);
            return false;
        end
\```
2022-08-19T21:14:28.638643+02 - jaeckel: it also looks like it's really closing the socket, not even trying to teardown the TLS connection first
2022-08-19T21:14:28.792143+02 - Zash: Oh
2022-08-19T21:14:28.999875+02 - Zash: That might be unintentional
2022-08-19T21:14:29.102804+02 - jaeckel: or wait ... let me check first what `SSL_ERROR_ZERO_RETURN` really means
2022-08-19T21:14:29.260429+02 - jaeckel: ah no, it's fine
2022-08-19T21:14:29.357753+02 - jaeckel: sorry
2022-08-19T21:14:29.597268+02 - jaeckel: https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html
2022-08-19T21:14:29.735177+02 - Zash: hm, no, the thing I was thinking of isn't something that happens when just closing a session
2022-08-19T21:15:02.923609+02 - Zash: it does a thing on shutdown to make it so you can resume after the restart, which fails, but you get the last counter value back
2022-08-19T21:15:23.132085+02 - Zash: maybe in the future we'll have resumptions across restarts
sjaeckel commented 8 months ago

Can you please try out the latest master version?

mdosch commented 8 months ago

Unfortunately I will have little time until 16.02. so I can't promise I can do this earlier.

Am 31. Januar 2024 16:28:55 UTC schrieb Steffen Jaeckel @.***>:

Can you please try out the latest master version?

mdosch commented 7 months ago

I enabled sm in profanity and had many reconnects on two machines over the last week and didn't experience this problem anymore, therefore I'm closing the issue.