mautrix / signal

A Matrix-Signal puppeting bridge
GNU Affero General Public License v3.0
496 stars 75 forks source link

AuthorizationFailedError crashloop after server updates #233

Closed verymilan closed 2 years ago

verymilan commented 2 years ago

Hi there, yesterday a Debian Bullseye was rebooted after updates. Since then, our instance of mautrix-signal is no longer working properly:

Messages only coming from Matrix get sent, and the bridge keeps crashing (aslong as not entirely shutdown by systemd). It seems to be at one specific user althrough i am not shure if this is due to their starting letter of the username.

Signal traceback:

[2022-03-21 21:31:03,126] [INFO@mausignald] Connected to signald v0.17.0-52-d91bb681
[2022-03-21 21:31:03,147] [DEBUG@mau.bridge.e2ee.client] Starting syncing
[2022-03-21 21:31:03,173] [WARNING@mausignald] No handlers for RPC request ListenerState
[2022-03-21 21:31:03,179] [WARNING@mau.user.@user:server.tld] New websocket state from signald: CONNECTING. Error: None
[2022-03-21 21:31:03,185] [WARNING@mau.user.@user:server.tld] New websocket state from signald: CONNECTING. Error: None
[2022-03-21 21:31:03,310] [CRITICAL@mau.init] Unexpected error in main event loop
Traceback (most recent call last):
  File "signal/lib/python3.9/site-packages/mautrix/util/program.py", line 219, in _run
    self.loop.run_until_complete(self.start())
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "signal/lib/python3.9/site-packages/mautrix_signal/__main__.py", line 74, in start
    await super().start()
  File "signal/lib/python3.9/site-packages/mautrix/bridge/bridge.py", line 196, in start
    await super().start()
  File "signal/lib/python3.9/site-packages/mautrix/util/program.py", line 252, in start
    await asyncio.gather(*(self.startup_actions or []))
  File "signal/lib/python3.9/site-packages/mautrix_signal/signal.py", line 231, in start
    if await self.subscribe(user.username):
  File "signal/lib/python3.9/site-packages/mausignald/signald.py", line 98, in subscribe
    await self.request_v1("subscribe", account=username)
  File "signal/lib/python3.9/site-packages/mausignald/rpc.py", line 231, in request_v1
    return await self._request(command, expected_response=command, version="v1", **data)
  File "signal/lib/python3.9/site-packages/mausignald/rpc.py", line 225, in _request
    resp_type, resp_data = await self._raw_request(command, **data)
  File "signal/lib/python3.9/site-packages/mausignald/rpc.py", line 222, in _raw_request
    return await asyncio.shield(future)
mausignald.errors.ResponseError: AuthorizationFailedError: org.whispersystems.signalservice.api.push.exceptions.AuthorizationFailedException: [401] Authorization failed!

signald also throws errors:

Mar 21 22:48:05 matrix.domain.tld signald[12345]: 14:48:05.551 [1a31a360-6873-4663-91f2-722cfe7ccb93-get_group] INFO  get_group - handled request 1a31a360-6873-4663-91f2-722cfe7ccb93 successfully
Mar 21 22:48:05 matrix.domain.tld signald[12345]: 14:48:05.714 [connection-pid-130232] ERROR io.finn.signald.clientprotocol.ClientConnection - error while handling client connection
Mar 21 22:48:05 matrix.domain.tld signald[12345]: java.net.SocketException: ; errno=104
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at org.newsclub.net.unix.NativeUnixSocket.read(Native Method) ~[junixsocket-common-2.4.0.jar:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at org.newsclub.net.unix.AFUNIXSocketImpl$AFUNIXInputStream.read(AFUNIXSocketImpl.java:400) ~[junixsocket-common-2.4.0.jar:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at java.io.InputStreamReader.read(InputStreamReader.java:181) ~[?:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at java.io.BufferedReader.fill(BufferedReader.java:161) ~[?:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at java.io.BufferedReader.readLine(BufferedReader.java:326) ~[?:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at java.io.BufferedReader.readLine(BufferedReader.java:392) ~[?:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at io.finn.signald.clientprotocol.ClientConnection.run(ClientConnection.java:63) [signald.jar:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at java.lang.Thread.run(Thread.java:829) [?:?]
Mar 21 22:48:05 matrix.domain.tld signald[12345]: 14:48:05.715 [connection-pid-130232] INFO  io.finn.signald.MessageReceiver - Last client for **********************************33 unsubscribed, shutting down message pipe
Mar 21 22:48:05 matrix.domain.tld signald[12345]: Exception in thread "connection-pid-130232" java.util.ConcurrentModificationException
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1511)
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1534)
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at io.finn.signald.MessageReceiver.unsubscribeAll(MessageReceiver.java:78)
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at io.finn.signald.clientprotocol.ClientConnection.run(ClientConnection.java:75)
Mar 21 22:48:05 matrix.domain.tld signald[12345]:         at java.base/java.lang.Thread.run(Thread.java:829)
Mar 21 22:48:06 matrix.domain.tld signald[12345]: 14:48:06.802 [4f4eb2b5-65c1-43b1-a3f6-ea4cbe6ede89-subscribe] INFO  subscribe - handled request 4f4eb2b5-65c1-43b1-a3f6-ea4cbe6ede89 successfully

...
handled request xyz successfully
handled request xyz successfully
handled request xyz successfully
handled request xyz successfully
handled request xyz successfully
...

Mar 21 22:50:15 matrix.domain.tld signald[12345]: 14:50:15.773 [44e3baff-1668-4793-aa3e-4d2c7b0ecb24-get_profile] INFO  get_profile - handled request 44e3baff-1668-4793-aa3e-4d2c7b0ecb24 successfully
Mar 21 22:50:15 matrix.domain.tld signald[12345]: 14:50:15.962 [connection-pid-134026] INFO  io.finn.signald.MessageReceiver - Last client for **********************************33 unsubscribed, shutting down message pipe
Mar 21 22:50:15 matrix.domain.tld signald[12345]: Exception in thread "connection-pid-134026" java.util.ConcurrentModificationException
Mar 21 22:50:15 matrix.domain.tld signald[12345]:         at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1511)
Mar 21 22:50:15 matrix.domain.tld signald[12345]:         at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1534)
Mar 21 22:50:15 matrix.domain.tld signald[12345]:         at io.finn.signald.MessageReceiver.unsubscribeAll(MessageReceiver.java:78)
Mar 21 22:50:15 matrix.domain.tld signald[12345]:         at io.finn.signald.clientprotocol.ClientConnection.run(ClientConnection.java:75)
Mar 21 22:50:15 matrix.domain.tld signald[12345]:         at java.base/java.lang.Thread.run(Thread.java:829)
Mar 21 22:50:16 matrix.domain.tld signald[12345]: 14:50:16.121 [Thread-20] WARN  io.finn.signald.SignalWebSocketHealthMonitor - Missed keep alives, identified last: 1647898316214 unidentified last: 1647898316215 needed by

...
handled request xyz successfully
handled request xyz successfully
handled request xyz successfully
handled request xyz successfully
handled request xyz successfully
...

I suppose the cause is simply the crash itself, similar to what Postgres says: LOG: could not receive data from client: Connection reset by peer

Looking at the Postgres-log, i have noticed that Synapse tried to add the signalbot user on each startup of the bridge, causing a duplicate key error, but i suppose that's normal?

However, right now i am not sure how to proceed with my issue and it's confusing users, because they wouldn't receive errors but also no answers from Signal users. :(

verymilan commented 2 years ago

Update: I now have tried to logout the affected user. I also noticed that at the same time, the crashloop had stopped, but also any logoutput (warning level) had stopped. The bridge responded to clean-rooms which previously did not work due to the frequent restarts. Signald also was quite quiet. Sadly the symptom was not fixed tho, signal->matrix still was broken.

After a restart of the bridge into debug mode, sending from Matrix triggered (successful) log being written, the other way did not make it write any log. I then restarted SignalD again just for fun. Now a new user is broken, and the error is different: mausignald.errors.ResponseError: NoSuchAccountError: account not found Note that this is not the same user as the one i had logged out due to the original error.

verymilan commented 2 years ago

Update: Somehow i didnt receive or accidently deleted the Github notification for the latest release. I have updated and this crashloop vanished as well. Fixing one-way delivery also required another signald restart. So this might have been related to #215 somehow. By the way, on logout, i've got a sqlite3 error, but i am using postgres and it still logged me out successfully.. .