Open MadLittleMods opened 1 year ago
This is due to us creating the room not all in one transaction, e.g. the alias is created here:
But if anything fails after that point it will still exist.
In order to clean this up safely, I ended up using the purge room API with {"erase": true}.
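To illustrate the failure mode described above, here is a minimal sketch of a multi-step creation that undoes its earlier write when a later step fails. This is not Synapse code: the in-memory maps, `createRoomWithCleanup`, and the `persistInitialEvents` callback are all hypothetical names made up for the example.

```javascript
// Hypothetical in-memory stand-ins; none of these names come from Synapse.
const aliases = new Map(); // alias -> roomId
const roomEvents = new Map(); // roomId -> array of persisted event types

function createRoomWithCleanup(roomId, alias, persistInitialEvents) {
  // Step 1: the alias is written first, separately from the rest.
  aliases.set(alias, roomId);
  try {
    // Step 2: later steps (create event, membership, power levels) may fail.
    roomEvents.set(roomId, persistInitialEvents(roomId));
    return roomId;
  } catch (err) {
    // Compensate: undo the earlier write so the alias does not outlive the failure.
    aliases.delete(alias);
    roomEvents.delete(roomId);
    throw err;
  }
}
```

Without the `catch` branch, a failure in step 2 would leave the alias pointing at a room that was never fully created, which is the situation reported in this issue.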
@clokep Why does the create room endpoint return 200 ✅ before the room is totally created? (worker hand-off with no checking?)
I haven't found a code-path that could do this, are we sure we got a 200 response and not a timeout or something?
It looks like the various worker hand-offs are all awaited properly, but maybe an error is being swallowed here:
Or maybe something weird is happening where we don't eject things from the pre-cache if the persisting fails?
I don't see other open issues noting that /createRoom isn't idempotent.
It can take a transaction id and be PUT in Synapse. I think this is unspecced. There's another issue somewhere for this, let me find it
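For readers unfamiliar with the idea, here is a rough sketch of what transaction-ID based idempotency could look like in general. It is not how Synapse implements it; `createRoomIdempotent`, the cache, and the `createRoom` callback are hypothetical.

```javascript
// Hypothetical sketch: remember the response for each (userId, txnId) pair so
// that a retried request returns the first result instead of creating a new room.
const responsesByTxn = new Map();

function createRoomIdempotent(userId, txnId, createRoom) {
  const key = `${userId}|${txnId}`;
  if (responsesByTxn.has(key)) {
    // Retry of a request we already handled: reuse the stored response.
    return responsesByTxn.get(key);
  }
  const response = createRoom();
  responsesByTxn.set(key, response);
  return response;
}
```

With something like this, a client that timed out could retry with the same transaction ID and get back the same `room_id` rather than a second room.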
I haven't found a code-path that could do this, are we sure we got a 200 response and not a timeout or something?
Good call!
It's possible that some recent code filled this in. We introduced some code to resolve a conflict where the Matrix alias exists but we don't have it stored as a bridged room entry yet: https://gitlab.com/gitterHQ/webapp/-/merge_requests/2357 (2023-01-21)
This looks very likely when I extract the date from the bridged room entries in the database since the dates come after that PR:
const mongoUtils = require('gitter-web-persistence-utils/lib/mongo-utils');
const bridgedRoomEntryIds = ['63e17d0e6da0373984be0760', '63ce5f4e6da0373984bd6166', '63ce57ba6da0373984bd612b', '63ce75b26da0373984bd6257', '63cca9586da0373984bd5910', '63ce771a6da0373984bd625a', '63ce59116da0373984bd6137', '63ccaf3a6da0373984bd591d', '63ce591d6da0373984bd6138', '63ce77586da0373984bd625b', '63ce697b6da0373984bd61c1', '63ce78886da0373984bd6264', '63ce79576da0373984bd6265', '63ce6acb6da0373984bd61c5', '63ce5e1f6da0373984bd6158', '63ccb4266da0373984bd5927', '63ce72496da0373984bd622b', '63ce92246da0373984bd6383', '63ccc1716da0373984bd593c'];
// Each MongoDB ObjectId embeds its creation timestamp; map the IDs to dates
bridgedRoomEntryIds.map((bridgedRoomEntryId) => {
  return mongoUtils.getDateFromObjectId(bridgedRoomEntryId);
});
[
2023-02-06T22:19:58.000Z,
2023-01-23T10:19:58.000Z,
2023-01-23T09:47:38.000Z,
2023-01-23T11:55:30.000Z,
2023-01-22T03:11:20.000Z,
2023-01-23T12:01:30.000Z,
2023-01-23T09:53:21.000Z,
2023-01-22T03:36:26.000Z,
2023-01-23T09:53:33.000Z,
2023-01-23T12:02:32.000Z,
2023-01-23T11:03:23.000Z,
2023-01-23T12:07:36.000Z,
2023-01-23T12:11:03.000Z,
2023-01-23T11:08:59.000Z,
2023-01-23T10:14:55.000Z,
2023-01-22T03:57:26.000Z,
2023-01-23T11:40:57.000Z,
2023-01-23T13:56:52.000Z,
2023-01-22T04:54:09.000Z
]
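For anyone wanting to double-check these dates without the Gitter codebase, here is a small standalone sketch. It relies on the fact that the first 4 bytes of a MongoDB ObjectId are a Unix timestamp in seconds. `dateFromObjectIdHex` is a name made up here, and I'm assuming `mongoUtils.getDateFromObjectId` does something equivalent.

```javascript
// The first 4 bytes (8 hex characters) of a MongoDB ObjectId are a
// big-endian Unix timestamp in seconds.
function dateFromObjectIdHex(objectIdHex) {
  const seconds = parseInt(objectIdHex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(dateFromObjectIdHex('63e17d0e6da0373984be0760').toISOString());
// 2023-02-06T22:19:58.000Z
```

This matches the first entry in the output above.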
Otherwise, normally I think it would have to be a 200 since we would need to receive a 200 with the room_id returned. But the above situation looks more likely 🕵️♀️
Description
While tying up loose ends with the Gitter import as part of the Gitter migration process, I noticed that some of the Matrix rooms we have recorded seem to be broken on the Synapse side. Some of these rooms only have an m.room.create event in them, while others have no events at all, which means we can't interact with them since we get permission denied errors everywhere (no membership or power-level events).
The only way we recorded these rooms on the Gitter side is if we originally got a 200 response from Synapse when they were created. If Synapse returns an error-free response, there should be much better guarantees that the room was created successfully (atomic transaction). Of course, it's possible the rooms were corrupted after the fact.
What most likely happened is that the room creation attempts timed out. Then we introduced code to reconcile the case where a Matrix room alias exists but we don't have the bridged Matrix room entry recorded, see https://github.com/matrix-org/synapse/issues/15005#issuecomment-1421481395
They all seem to have been created on January 10th, 2023, which was probably while the import scripts were importing messages.
Here is the list of affected rooms that I know of:
- !gHbYoGxiEuXmahoYeh:gitter.im (only m.room.create)
- !OpjflmXuauidodzmJH:gitter.im (only m.room.create)
- !QCVddUkjrHLjEqeXQd:gitter.im (no events)
- !wJBkwhqwLssHVjKErI:gitter.im (only m.room.create)
- !kkjacXiigblOffltrw:gitter.im (only m.room.create)
- !etJKlYVRXEtkkMdEOJ:gitter.im (no events)
- !VeVZEnGasXJXpAYECh:gitter.im (only m.room.create)
- !iJIbqcgSzsAFnwjVBI:gitter.im (no events)
- !koFtykttKGRpjbzflq:gitter.im (no events)
- !mxfIojpZcyQDBbmmoY:gitter.im (only m.room.create)
- !ezxePTXYHMSIkpGOiy:gitter.im (only m.room.create)
- !QlEqqNHuTSIHswllOX:gitter.im (only m.room.create)
- !XInKHVcDAmmyGBEUwr:gitter.im (no events)
- !KIKPFcoDnhBeZhLNUr:gitter.im (only m.room.create)
- !BcBAOZpxjTAvMDfSHf:gitter.im (only m.room.create)
- !DuyTBdNdlhfXoBFNqr:gitter.im (only m.room.create)
- !qgfkmcpByUudrFHTMX:gitter.im (no events)
- !tFwVYIoUNZQhRhjFNO:gitter.im (no events)
- !qoZZIEjvblGKIkqEdj:gitter.im (no events)

In terms of moving Gitter forward, I think we can safely reconcile this by removing the bridged room entry in the Gitter database and the Matrix room alias and have them re-created (tracked by https://gitlab.com/gitterHQ/webapp/-/issues/2858#note_1267698515). But I'm unable to remove the Matrix room alias from the old room (M_FORBIDDEN) 😬
And it would still be good to fix the Synapse bugs that allow this to happen.
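As a rough illustration of how the two broken states in the room list could be told apart programmatically, here is a small sketch that classifies a room from the types of its state events. `classifyRoom` and the list of "required" event types are assumptions made for the example, not something taken from Synapse or the Gitter bridge.

```javascript
// Hypothetical helper: classify a room by the state event types it contains.
function classifyRoom(stateEventTypes) {
  if (stateEventTypes.length === 0) {
    return 'no events';
  }
  if (stateEventTypes.every((type) => type === 'm.room.create')) {
    return 'only m.room.create';
  }
  // Assumption for this sketch: a usable room needs at least these events.
  const required = ['m.room.create', 'm.room.member', 'm.room.power_levels'];
  const missing = required.filter((type) => !stateEventTypes.includes(type));
  return missing.length === 0 ? 'ok' : `missing ${missing.join(', ')}`;
}
```

A cleanup script could run a check like this over every bridged room entry and only re-create the rooms that come back as `no events` or `only m.room.create`.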
Steps to reproduce
Homeserver
gitter.im
Synapse Version
1.74.0
Installation Method
I don't know
Database
PostgreSQL
Workers
Multiple workers
Platform
gitter.im (gitter.ems.host) runs on EMS but I don't know the details
Python version: 3.11.1
Configuration
No response
Relevant log output
The only relevant log line for this room from the time of creation is this message:
https://modular-euwest2-kibana.proxy.matrix.org/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:now-8w,to:now-3w))&_a=(columns:!(_source),index:'67745970-aa2d-11ea-96a7-efc84863d0ea',interval:auto,query:(language:kuery,query:'kubernetes.pod.labels.hostname:%22gitter.ems.host%22%20AND%20message:%22!!gHbYoGxiEuXmahoYeh:gitter.im%22'),sort:!(!('@timestamp',desc)))
Database information:
Anything else that would be useful to know?
No response