matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

Failed handle request via 'RoomHierarchyRestServlet', KeyError: ('m.room.create', '') #12082

Open pacien opened 2 years ago

pacien commented 2 years ago

Description

Listing all the rooms belonging to some private Space on the "Manage & explore rooms" page fails with an "Internal Server Error" from Element web and Android.

Outside of this page, filtering the rooms with the panel on the left works fine.

Steps to reproduce

This error only seems to happen in one particular private space, but not other ones.

The following error appears repeatedly in Synapse's logs:

synapse.http.server: [GET-1572871] Failed handle request via 'RoomHierarchyRestServlet': <XForwardedForRequest at [REDACTED] method='GET' uri='/_matrix/client/unstable/org.matrix.msc2946/rooms/![REDACTED]/hierarchy?suggested_only=false&from=[REDACTED]&limit=20' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
  File "/nix/store/dv9mv0fmmhkbxczqflhcq6ifvkcvyr72-python3.9-Twisted-21.7.0/lib/python3.9/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/nix/store/rp1cyyj9sbwq8xf41d334wxhlh5i2rvl-matrix-synapse-1.52.0/lib/python3.9/site-packages/synapse/util/caches/response_cache.py", line 246, in cb
    return await callback(*args, **kwargs)
  File "/nix/store/rp1cyyj9sbwq8xf41d334wxhlh5i2rvl-matrix-synapse-1.52.0/lib/python3.9/site-packages/synapse/handlers/room_summary.py", line 396, in _get_room_hierarchy
    room_entry = await self._summarize_local_room(
  File "/nix/store/rp1cyyj9sbwq8xf41d334wxhlh5i2rvl-matrix-synapse-1.52.0/lib/python3.9/site-packages/synapse/handlers/room_summary.py", line 654, in _summarize_local_room
    room_entry = await self._build_room_entry(room_id, for_federation=bool(origin))
  File "/nix/store/rp1cyyj9sbwq8xf41d334wxhlh5i2rvl-matrix-synapse-1.52.0/lib/python3.9/site-packages/synapse/handlers/room_summary.py", line 996, in _build_room_entry
    current_state_ids[(EventTypes.Create, "")]
KeyError: ('m.room.create', '')

I previously had issues creating a room in that space because another server in the federation was offline. I suspect that a room might have been only partially created and partially added to the space.

The end of that traceback looks similar to the one in #10032.
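
For illustration, the failing lookup at the bottom of the traceback can be reproduced with a plain dictionary keyed by (event type, state key) pairs. This is only a minimal sketch and not Synapse's code: the name `current_state_ids` is taken from the traceback, while the user IDs, event IDs and the `find_create_event_id` helper are hypothetical.

```python
from typing import Dict, Optional, Tuple

StateKey = Tuple[str, str]  # (event type, state key)

# Hypothetical state map for a room whose current state holds only
# membership events and no create event.
current_state_ids: Dict[StateKey, str] = {
    ("m.room.member", "@a:example.org"): "$event1",
    ("m.room.member", "@b:example.com"): "$event2",
}


def find_create_event_id(state: Dict[StateKey, str]) -> Optional[str]:
    """Return the create event ID, or None if the state map lacks one."""
    # Indexing with state[("m.room.create", "")] would raise
    # KeyError: ('m.room.create', '') here; .get() returns None instead.
    return state.get(("m.room.create", ""))


if __name__ == "__main__":
    print(find_create_event_id(current_state_ids))
```

Returning `None` only makes the missing entry visible to the caller. It does not explain why the create event is absent from the current state in the first place.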


erikjohnston commented 2 years ago

This is odd: it looks like the server thinks it's in the room, but the create event isn't in the current state, which really shouldn't happen. Both bits of information are fetched from the current_state_events table, so it's not even that data has gotten out of sync.

Can you turn on debug logging and try again please? The SQL queries should tell you which room is causing the problem, at which point can you run the following SQL and post the results? SELECT * FROM current_state_events WHERE room_id = '<room_id>';

clokep commented 2 years ago

We've seen this a few times on matrix.org too: https://sentry.matrix.org/sentry/synapse-matrixorg/issues/241899/

pacien commented 2 years ago

@erikjohnston:

Turning on debug logging allowed me to identify the problematic room.

Here's the result of the SQL statement:

matrix-synapse=> select type, membership from current_state_events
  where room_id = '![REDACTED]';

     type      | membership
---------------+------------
 m.room.member | join
 m.room.member | join
(2 rows)

The "Developer Tools" in Element web find the m.room.create event just fine, but it seems to be missing in the current_state_events table as shown above.

This is actually an old room (version 5, encrypted, created two years ago) which I have been using with no trouble at all so far.

It seems to be the only room lacking an m.room.create entry in that table:

matrix-synapse=> select count(distinct room_id) from current_state_events
  where room_id not in (
    select room_id from current_state_events where type = 'm.room.create'
  );

 count
-------
     1

I am not sure how it ended up missing in that table. Is there a command to regenerate the missing entries?
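
To list the affected room IDs rather than only count them, the same check can be scripted. The sketch below is a hedged example: it runs against an in-memory SQLite database with a simplified, hypothetical version of the current_state_events table (the real Synapse table has more columns), and the room IDs, user IDs and the `rooms_missing_create` helper are made up for the demonstration.

```python
import sqlite3
from typing import List


def rooms_missing_create(conn: sqlite3.Connection) -> List[str]:
    """Return room IDs that have current state rows but no m.room.create row."""
    rows = conn.execute(
        """
        SELECT DISTINCT room_id
        FROM current_state_events AS cse
        WHERE NOT EXISTS (
            SELECT 1
            FROM current_state_events AS c
            WHERE c.room_id = cse.room_id
              AND c.type = 'm.room.create'
        )
        ORDER BY room_id
        """
    ).fetchall()
    return [row[0] for row in rows]


# Simplified stand-in for the real table, filled with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE current_state_events "
    "(room_id TEXT, type TEXT, state_key TEXT, membership TEXT)"
)
conn.executemany(
    "INSERT INTO current_state_events VALUES (?, ?, ?, ?)",
    [
        ("!healthy:example.org", "m.room.create", "", None),
        ("!healthy:example.org", "m.room.member", "@a:example.org", "join"),
        ("!broken:example.org", "m.room.member", "@a:example.org", "join"),
        ("!broken:example.org", "m.room.member", "@b:example.com", "join"),
    ],
)

if __name__ == "__main__":
    print(rooms_missing_create(conn))  # ['!broken:example.org']
```

The `NOT EXISTS` form is used here instead of `NOT IN` only as a matter of preference; on this data both return the same rooms.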

pacien commented 2 years ago

Might be related:

reivilibre commented 2 years ago

Is this a room that you were in and then left?

It may be interesting to know the local memberships of users in that room (it may confirm that everyone local left); do you see any joined local members with this query?:

SELECT user_id, membership FROM local_current_membership WHERE room_id = '!...';

The bugs you linked as possibly related do in fact seem related.

pacien commented 2 years ago

Quoting @reivilibre:

Is this a room that you were in and then left?

I created the room, left it by mistake and joined it again in the past. I am currently in the room.

Here are the relevant events for that room:

select room_memberships.sender, membership
from room_memberships
join events on room_memberships.event_id = events.event_id
where room_memberships.room_id = '![room ID]'
order by topological_ordering;

        sender         | membership
-----------------------+------------
 @[MXID A]             | join
 @[MXID A]             | invite
 @[MXID A]             | leave
 @[MXID B]             | join
 @[MXID B]             | invite
 @[MXID A]             | join

It may be interesting to know the local memberships of users in that room (it may confirm that everyone local left); do you see any joined local members with this query?:

SELECT user_id, membership
FROM local_current_membership
WHERE room_id = '!...';

This query returns:

      user_id       | membership
--------------------+------------
 @[MXID A]          | join
(1 row)

This is correct: I am indeed the only local user present in the room and the other user is on another homeserver.