Closed Ezwen closed 2 years ago
When I try sending a message in said room, I have this message in my synapse logs:
synapse.handlers.message - 945 - WARNING - PUT-3537- Denying new event <FrozenEvent event_id='$161644359715JklbV:myserver.fr', type='m.room.message', state_key=None> b
ecause 403: User @XXX:YYY not in room !FhnTYZhdgIipdYUdar:otherserver.org (None)
I have checked using the synapse admin API, and my user account is definitely in the room.
That is odd, seems like the server is confused about the state of the room. Can you check that you have rows for the room in the event_forward_extremities
table please?
Here is what I have:
$ docker-compose exec db psql --user matrix_synapse -c "SELECT * FROM event_forward_extremities WHERE room_id='!FhnTYZhdgIipdYUdar:otherserver.org'"
event_id | room_id
---------------------------------+-----------------------------------
$16164587839FTJkr:otherserver.org | !FhnTYZhdgIipdYUdar:otherserver.org
(1 row)
(as in all my other messages, I've replaced the real server name by otherserver.org
)
Humm. Odd. If you do an API hit to /_matrix/client/r0/rooms/{roomId}/context/{eventId}
with that room/event do you see your own membership in there?
The only thing I can think of is that somehow the servers view of the "current state" has diverged from the actual state of the room.
I obtain a rather big JSON file, with events and states, where should I look for my membership?
I can only find my user id (@XXX:YYY
) in /state/0/content/users
, where it seems all users with a certain power level are listed − this would be consistent with the fact that @XXX:YYY
is an Administrator in this room.
You should see an event with type: m.room.member
and state_key: @XXX:YY
with a content.membership: join
, if not then that would explain why the server thinks you're not in the room.
Does _matrix/client/r0/rooms/{roomId}/state
return a membership event for you?
You should see an event with
type: m.room.member
andstate_key: @XXX:YY
with acontent.membership: join
, if not then that would explain why the server thinks you're not in the room.
I see! Well indeed I cannot find any event matching the description you make here, so that would explain.
Does
_matrix/client/r0/rooms/{roomId}/state
return a membership event for you?
It gives me this result:
{"errcode":"M_FORBIDDEN","error":"User @XXX:YYY not in room !FhnTYZhdgIipdYUdar:otherserver.org, and room previews are disabled"}
The plot thickens. Can you try doing:
SELECT * FROM current_state_events WHERE room_id = ?
and see if there are a) any rows and b) if the user ID appears and if so what the membership
is
There are lots of rows, including all join
events for all members of the room, and including a row with my own membership:
$1572562716128RmfHF:myserver.fr | !FhnTYZhdgIipdYUdar:otherserver.org | m.room.member | @XXX:myserver.fr | join
(note: since the beginning I wrote both YYY
and myserver.fr
but both are in fact the same.)
Ok, so seems like:
I'm slightly perplexed how 1 and 2 can both be true. It's possible to get out of sync with other servers due to state resets, as I note that its a v1 room.
I'm going to have to have a think about how to unwedge this.
@erikjohnston Thank you very much for taking all this time to investigate :). If there is anything more I can do to help, please do not hesitate to ask.
I suppose there is no easy workaround to manually fix this in the database?
I suppose there is no easy workaround to manually fix this in the database?
Not off the top of my head, editing the current_state_events
table would probably also require manually adding stuff to the current_state_deltas
table. You could try having someone kick you from the room, which might work?
I suppose there is no easy workaround to manually fix this in the database?
Not off the top of my head, editing the
current_state_events
table would probably also require manually adding stuff to thecurrent_state_deltas
table. You could try having someone kick you from the room, which might work?
I've thought of doing that, but unfortunately I am an admin in this room, therefore un-kickable (at least AFAIK?) :(
Out of curiosity, is there any plan to someday add a synapse admin command to "fix inconsistencies in the database", such as this one? It seems to me it could prove useful in many situations.
Quick update: I just discovered that I have this very same problem (with the same same error messages) in another matrix room. Since I am not an admin in this second room, I have asked the admin to kick me and to reinvite me, and this did solve the problem for this room as you expected!
Now in my case the problem only remain in this room where I am admin and therefore un-kickable.
Second quick update: I found a third matrix room with a probably (?) related problem.
It is a 1-1 room that I hadn't used in a while, and I tried sending a message and I got this error in Element desktop: No create event in auth events
. The other person, who is registered on the same homeserver as I, has the same error when sending a message in this room.
On synapse's side, I have errors that look like this:
synapse.handlers.message - 945 - WARNING - PUT-177186- Denying new event <FrozenEvent event_id='$1616850716377rkoye:myserver.fr', type='m.room.message', state_key=None
> because 403: No create event in auth events
Thanks for the updates! Very odd that this has happened to multiple rooms now, though they're both v1 rooms so its possible that they're somehow being state reset (though it seems unlikely for a 1:1 room).
Out of curiosity, is there any plan to someday add a synapse admin command to "fix inconsistencies in the database", such as this one? It seems to me it could prove useful in many situations.
No, not really. In the general case it'd be hard/impossible to do, and take a very long time, but it would be plausible in this case.
I've thought of doing that, but unfortunately I am an admin in this room, therefore un-kickable (at least AFAIK?)
Has anyone else joined/left or otherwise changed state? I think that any state change might be enough to fix it, though possibly not.
Has anyone else joined/left or otherwise changed state? I think that any state change might be enough to fix it, though possibly not.
I've tried leaving/joining the room with a completely separate account, but unfortunately that did not change the situation for the first account.
What other kind of state change should I test?
Quick update about strange things that happened recently:
Additional information:
synapse_1 | 2021-06-17 09:49:01,637 - synapse.handlers.message - 972 - WARNING - PUT-164- Denying new event <FrozenEvent event_id='$16239161411cRWAz:server1.org', type='m.room.message', state_key=None> because 403: User @xxx:server1.org not in room !JiNJVSfIzCRjsKrudF:server2.org (None)
Sorry for the delay in getting back with an update. This is an odd situation which is possibly due to state resets in room version 1(?). Unfortunately, we haven't had the time to do a deeper investigation into this. While the issue is concerning, we're not seeing reports from anyone else and so it hasn't made its way to the top of our priority list.
Next steps here would be to spend time looking at the state changes that are happening and generally what the room DAG is doing (on both servers) to try and see what is going on.
we're not seeing reports from anyone else
I do have the same issue :wave: cannot send messages, cannot leave. Room version is 6, and it was probably caused by some desync in the Gitter bridge (the broken room is a Gitter DM).
Sorry for the delay in getting back with an update. This is an odd situation which is possibly due to state resets in room version 1(?). Unfortunately, we haven't had the time to do a deeper investigation into this. While the issue is concerning, we're not seeing reports from anyone else and so it hasn't made its way to the top of our priority list.
No worries, I perfectly understand, I just wanted to keep you updated :)
Next steps here would be to spend time looking at the state changes that are happening and generally what the room DAG is doing (on both servers) to try and see what is going on.
Unfortunately I am personally unable to do that, but I'll remain subscribed to the issue thread in case there are updates.
I've seen the issue in a handful of rooms I administer that are on version 6, which have Half-Shot's discord bridge in them, as well as one room has the matrix-org IRC bridge.
My personal homeserver does not see all the users while the community one does see all users. Interestingly, after I try inviting users I cannot see from my personal homeserver, I will see their typing notifications but not their actual messages. I get similar entries in my logs as this comment both before and after attempting to invite them.
To try to resolve the issue, the two rooms I was having the most noticeable problems on, I tried leaving and rejoining and this did not fix the problem of not seeing those users. Looks like maybe I need to use the Delete Room Admin API to fully clear out my local view that is busted.
Additionally, some users on other homeservers have become unable to interact with the room, until I kick them and they rejoin. One user on another homeserver was completely unable to accept any invitation to one of these rooms, and they ended up creating an account on our community homeserver instead. After looking in the database, that homeserver seemed to have properly received the invitation event.
erikjohnston mentioned looking at state changes, what exactly should I be looking for?
Thanks for reporting. Unfortunately diagnosing this is going to require detailed familiarity with the state storage schema. Given it's only affecting a small number of servers, I'm afraid it's unlikely we'll be able to find time to investigate in the near future.
I think I'm experiencing the same thing right now. :(
The Arch Linux channel got spammed by hundreds of bots a few days ago and since then I cannot leave the room nor send any messages. The behaviour is exactly as described in OP's first post. I can provide more details if needed.
Is there really no workaround I can try other than asking the Arch Linux staff to kick me? O_o
Thanks in advance for any update on this! (:
Edit: Actually, I cannot see any new messages posted by others.
I was having similar behavior. I'm not positive exactly what solved it, but I'll try to document what I did.
The issue first appeared in a room ("clipboard") where I was the only user. I wasn't able to send, receive, leave, etc. Then for some reason it "spread" to the primary room my wife and I use. Neither of us could do anything in the room.
Same error messages/behavior mentioned above:
I tried to use the "dev tools" in element to force a room update. No luck. I tried to use an appservice to force a state change. No luck. I used the admin API to try to force a state change. No luck.
I gave up and was restoring an old dump of the database and noticed an error when importing the dump.
pg_restore: error: could not execute query: ERROR: could not create unique index "room_account_data_uniqueness"
DETAIL: Key (user_id, room_id, account_data_type)=(<userID>, <clipboard_roomID>, m.fully_read) is duplicated.
Command was: ALTER TABLE ONLY public.room_account_data
ADD CONSTRAINT room_account_data_uniqueness UNIQUE (user_id, room_id, account_data_type);
So I went back to the "broken" database and deleted the extra row. Everything seems to work now, both the "main" room and my "clipboard" room.
The errors only started when I added a database backup script. I think that might be the root cause for me. I lost access to the "main" room around the time the backup script should run. Maybe there's a race condition somewhere.
While dealing with another issue https://github.com/matrix-org/synapse/issues/12549, I ended up having to use the synapse admin API to delete the problematic rooms with force_purge : true
, like this:
curl -H "Authorization: Bearer <access token>" 'https://<homeserver>/_synapse/admin/v2/rooms/<room_id>' -X DELETE -d '{"force_purge": true}'
If the room was shared over federation, you can try joining it again, if it is allowed in the room's parameters.
I think we can close this now. It turned out that @Ezwen had been suffering corruption in the underlying database (with errors from postgres such as missing chunk number 0 for toast value 143975 in pg_toast_16631
, which will explain the erratic behaviour reported here.
@richvdh Could you advise on what steps should be taken to fix the db corruption in such case? I'm having an identical issue right now.
Same here, any help would be appreciated.
Same here, any help would be appreciated.
I think I solved the issue myself by removing my user from the room in the database, then purging it
Description
Since an upgrade of Synapse to 1.30.0 (and a server restart), I encounter significant problems in a specific room (which before worked without any problem). Here are the symptoms:
User @XXX:YYY not in room FhnTYZhdgIipdYUdar:otherserver.org (None)
, where@XXX:YYY
is my userid.Not a known room
Steps to reproduce
What I expect: the message is sent successfully What I get: the error message shown above
Version information
If not matrix.org:
Version: 1.30.0
Install method: Docker
Platform: Debian 9 (stretch), bare metal