xmppjs / xmpp.js

XMPP for JavaScript
ISC License

TimeoutError causing constant crashes in RN; cannot be caught #849

Closed by ZakSingh 3 years ago

ZakSingh commented 3 years ago

We're experiencing thousands of crashes in prod due to a TimeoutError with no further error info. We have not been able to replicate this issue locally. Users reporting these problems tend to be on Android devices. We're using WebSockets.

The main issue is that we can't seem to identify how to catch these errors. We have an xmpp.on("error") handler, but no TimeoutErrors show up in its logs. Wrapping our xmpp.send and xmpp.start calls in try/catch doesn't help either. We're at a loss as to where this error is occurring or what is causing it. Is there any way to catch this error?
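One possible reason a try/catch around a call does not catch a timeout, offered as an assumption rather than a diagnosis of this app: if the call returns a promise and is not awaited, the rejection arrives after the try block has already exited. The sketch below uses only stand-ins (`TimeoutError`, `fakeSend`, and `demo` are hypothetical names, not part of xmpp.js) to show the difference.

```javascript
// Stand-in error class for illustration; NOT the class used by xmpp.js.
class TimeoutError extends Error {
  constructor(message) {
    super(message);
    this.name = "TimeoutError";
  }
}

// Hypothetical async operation that rejects after `ms` milliseconds.
function fakeSend(ms) {
  return new Promise((_resolve, reject) => {
    setTimeout(() => reject(new TimeoutError("timed out")), ms);
  });
}

async function demo() {
  const log = [];

  // 1. try/catch without await: the catch block never runs, because the
  //    promise rejects later, after the try block has already exited.
  let pending;
  try {
    pending = fakeSend(10);
  } catch (err) {
    log.push("sync catch: " + err.name);
  }
  // Attach a handler so the rejection is observed instead of left unhandled.
  await pending.catch((err) => log.push("late rejection: " + err.name));

  // 2. try/catch with await: the rejection is rethrown inside the try block.
  try {
    await fakeSend(10);
  } catch (err) {
    log.push("await catch: " + err.name);
  }

  return log;
}
```

If every promise-returning call is either awaited inside a try block or has a `.catch` attached, a rejection like this cannot surface as an unhandled error from that call site. Whether that is what happens in the app described here is not something the thread establishes.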

Another thing to note (which may be related) is that we're seeing thousands of StreamError events appearing in xmpp.on("error") stating the following:

element: {name=stream:error, children=[{name=conflict, children=[], attrs={xmlns=urn:ietf:params:xml:ns:xmpp-streams}}, {name=text, children=[Replaced by new connection], attrs={xmlns=urn:ietf:params:xml:ns:xmpp-streams, xml:lang=en}}], attrs={xmlns:stream=http://etherx.jabber.org/streams}}
application: [undefined]
line: 100.0
message: conflict - Replaced by new connection
condition: conflict

This will happen a dozen times for a user while they're using the app for a few minutes. I have no idea why, as we have the stream management module enabled and I can see xmpp.js sending the <enable resume="true"/> stanzas. We only ever call xmpp.start() once, on app launch, but these errors keep coming for as long as the user is in the app. What could be causing this?

One final issue we're seeing is lots of Websocket: ECONNERRORs. These will say: event: {message=Read error: ssl=0x7119aa5988: I/O error during system call, Software caused connection abort, isTrusted=false}. Again, we're not sure what's causing this or how to catch the error. Any advice?

We're probably doing something wrong, but it's been a hellish week (>10% of our Android sessions crash...) and we're in some hot water... Any help would be much appreciated!

sonnyp commented 3 years ago

Sorry to hear about your week.

Did you look into https://github.com/xmppjs/xmpp.js/tree/master/packages/reconnect ? The stream error is definitely related.

I know of multiple products using xmpp.js/RN on production without issues so this looks fairly specific to your setup.

Happy to have a call to discuss how I can help. My email is in my profile.

ZakSingh commented 3 years ago

Hey,

It looks like the issue is with Ejabberd not handling conflicts for us according to the XMPP spec in rare cases (may be our fault, but we couldn't find anything). Simply removing the manual resource assignment in our client initialization fixed the problem for us!
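The change described above can be sketched as follows. This is a minimal illustration, not the poster's actual code: `buildClientOptions` is a hypothetical helper, the URL and domain are placeholders, and the option names (`service`, `domain`, `resource`, `username`, `password`) are the ones shown in the @xmpp/client README. When no resource is supplied, the server assigns one during resource binding (RFC 6120), so a second connection for the same account does not claim the same full JID.

```javascript
// Hypothetical helper that builds the options object for client().
// All values below are placeholders for illustration only.
function buildClientOptions({ username, password, resource }) {
  const options = {
    service: "wss://chat.example.com/xmpp-websocket", // placeholder URL
    domain: "chat.example.com", // placeholder domain
    username,
    password,
  };

  // Set a resource only when the caller explicitly asks for a fixed one.
  // Otherwise leave it out and let the server generate it at bind time.
  if (resource !== undefined) {
    options.resource = resource;
  }

  return options;
}

// Fixed resource: every connection for this account asks for the same full JID.
const fixed = buildClientOptions({ username: "alice", password: "secret", resource: "phone" });

// No resource: each connection gets a server-assigned resource.
const assigned = buildClientOptions({ username: "alice", password: "secret" });
```

The resulting object would be passed to `client()` from @xmpp/client. Leaving the resource out avoids the conflict but does not explain why a second connection is being opened in the first place.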

Miraj98 commented 3 years ago

> Hey,
>
> It looks like the issue is with Ejabberd not handling conflicts for us according to the XMPP spec in rare cases (may be our fault, but we couldn't find anything). Simply removing the manual resource assignment in our client initialization fixed the problem for us!

Does simply removing the resource option during client initialisation remove the error? Doesn't that mean the client might still be getting re-initialised time and again, but because you no longer pass the resource value manually and one is generated randomly for you, it starts a new stream with a different JID, thereby avoiding the error? This feels slightly hacky. Would love to know if you have found a better way to handle this.