We also have several reports of this on Android: https://github.com/matrix-org/element-android-rageshakes/issues/20724 https://github.com/matrix-org/element-android-rageshakes/issues/20672 https://github.com/matrix-org/element-android-rageshakes/issues/20578
I also experienced it on another client (web).
Also this issue https://github.com/vector-im/element-android/issues/516
@callahad @anoadragon453 I renamed this ticket, as #9624 (which was marked as a dupe) is about the recipient, rather than the sender, not receiving the message. FYI in case this changes the severity of the bug.
Just an update: This is on our radar, but we expect it to be a pretty big yak to shave, so it's currently deferred until we can wrap up some other hairy problems 🐃
Matthew reported another instance of this today. Link and details for internal users at https://github.com/matrix-org/element-web-rageshakes/issues/5808
And another set by Matthew on Aug 10:
Room ID: !JjwYSGirOwIXEpJgzg:matrix.org
Event ID: $ZrcrvSAESv5PBzszSXpT5EahA2eQR-raFbFfSY314LQ
We should add some more logs to try and track down where the event is going missing. I'd probably do something like: add logging in _get_rooms_changed and around each _generate_room_entry call, tagging it with the room ID etc. The idea being that hopefully we'll be able to see if the missing event is pulled out of the DB and, if so, where we drop it (if we ever do).
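A rough sketch of the kind of logging suggested above; the function and argument names here are placeholders, the real functions live in Synapse's sync handler (synapse/handlers/sync.py) and may look different:

```python
# Hedged sketch of the extra logging suggested above. Function and argument
# names are placeholders, not Synapse's actual signatures.
import logging

logger = logging.getLogger(__name__)

def _get_rooms_changed(since_token, now_token, joined_room_ids):
    # Log the token range and rooms under consideration, so we can tell
    # whether the missing event's room was even looked at for this sync.
    logger.info(
        "sync: rooms_changed since=%s now=%s rooms=%s",
        since_token, now_token, joined_room_ids,
    )
    ...

def _generate_room_entry(room_id, events):
    # Tag each room entry with its room ID and the events pulled from the DB,
    # so we can see where (if anywhere) the event gets dropped.
    logger.info("sync: room entry %s with %d events", room_id, len(events))
    ...
```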
And another just now...
Room ID: !KzalCNJxkqtytlbQvX:matrix.org
Event ID: $16310279891219482qXcHE:matrix.org
...and again...
Room ID: !KzalCNJxkqtytlbQvX:matrix.org
Event ID: $16311860683330067SxAox:matrix.org
...and again...
Room ID: !KzalCNJxkqtytlbQvX:matrix.org
Event ID: $16322575506214369TPVVU:matrix.org
...and again...
Room ID: !rZUxrMFEsvfBNGpmDf:matrix.org
Event ID: $1633697134169238TSzOW:matrix.org
The event in question:
matrix=> select topological_ordering, instance_name, stream_ordering, to_timestamp(origin_server_ts/1000) from events where event_id = '$1633697134169238TSzOW:matrix.org';
topological_ordering | instance_name | stream_ordering | to_timestamp
----------------------+-------------------+-----------------+------------------------
3229 | event_persister-3 | 2372403105 | 2021-10-08 13:45:34+01
(1 row)
The trace for the sync that should have returned it (will disappear soon): https://jaeger.int.matrix.org/trace/901206efde7b4ac6, which happened on synchrotron-36.
The /sync gets woken up twice, the first time with a since and now token of:
RoomStreamToken(topological=None, stream=2372403104, instance_map=<frozendict {'event_persister-2': 2372403109, 'event_persister-1': 2372403109}>)
RoomStreamToken(topological=None, stream=2372403115, instance_map=<frozendict {}>)
This should have picked up the sent event, as it was sent on event_persister-3 with stream ordering 2372403105. However, it decides that no rooms have changed: rooms_have_changed: false. The subsequent time the sync gets woken up, the tokens are:
RoomStreamToken(topological=None, stream=2372403104, instance_map=<frozendict {'event_persister-2': 2372403109, 'event_persister-1': 2372403109}>)
RoomStreamToken(topological=None, stream=2372403130, instance_map=<frozendict {}>)
which also finds that no rooms have changed.
The RoomStreamToken and PersistedEventPosition work correctly with these values:
In [1]: from synapse.types import RoomStreamToken, PersistedEventPosition
In [2]: from frozendict import frozendict
In [3]: since = RoomStreamToken(None, 2372403104, frozendict({'event_persister-2': 2372403109, 'event_persister-1': 2372403109}))
In [4]: pos = PersistedEventPosition("event_persister-3", 2372403105)
In [5]: pos.persisted_after(since)
Out[5]: True
We see in the synchrotron-36 logs:
2021-10-08 12:45:34,307 - synapse.replication.tcp.handler - 509 - INFO - replication_command_handler-140621334838808 - Handling 'POSITION events event_persister-3 2372403104 2372403108'
which is worrying, as event_persister-3 did write the event in the range 2372403104..2372403108. The event persister should not have sent out a POSITION with that range, as it means "I did nothing in this range".
I think there is a race between workers sending out the appropriate RDATA and us sending out the periodic POSITION.
On event persister 3:
2021-10-08 12:45:34,150 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-6533172 - Streaming: events -> 2372403104
...
2021-10-08 12:45:34,298 - synapse.replication.tcp.resource - 204 - INFO - replication_notifier-6533172 - Sending position: events -> 2372403108
...
2021-10-08 12:45:34,330 - synapse.access.http.18129 - 421 - INFO - POST-3580545 - ::ffff:10.103.0.1 - 18129 - {@matthew:matrix.org} Processed request: 0.183sec/0.009sec (0.003sec, 0.000sec) (0.000sec/0.000sec/0) 71B 200 "POST /_synapse/replication/send_event/%241633697134169238TSzOW%3Amatrix.org/shpMywvFLw HTTP/1.1" "Synapse/1.44.0rc2 (b=matrix-org-hotfixes,ebbd37b66)" [0 dbevts]
...
2021-10-08 12:45:34,366 - synapse.replication.tcp.resource - 188 - INFO - replication_notifier-6533173 - Streaming: events -> 2372403114
which looks to confirm that we're spuriously sending out a POSITION instead of the event in question.
Fetching the updates doesn't appear to be hitting any caches, and just does a query with a WHERE ? < stream_ordering <= ? clause, so I don't think it's caching.
The query in question returns the row now:
SELECT e.stream_ordering, e.event_id, e.room_id, e.type,
state_key, redacts, relates_to_id, membership, rejections.reason IS NOT NULL
FROM events AS e
LEFT JOIN redactions USING (event_id)
LEFT JOIN state_events USING (event_id)
LEFT JOIN event_relations USING (event_id)
LEFT JOIN room_memberships USING (event_id)
LEFT JOIN rejections USING (event_id)
WHERE 2372403104 < stream_ordering AND stream_ordering <= 2372403108
AND instance_name = 'event_persister-3'
ORDER BY stream_ordering ASC
LIMIT 100;
My current best guess is that there is a bug in MultiWriterIdGenerator which is causing it to return a current token for the local instance before all previous tokens have finished persisting.
...and another by @hughns this morning (taken from his rageshake):
2021-10-11T10:33:35.753Z I Event sent to !ZiIcIhEEPvPrkUeheO:matrix.org with event id $16339484157934361ksrnd:matrix.org
2021-10-11T10:33:35.753Z I setting pendingEvent status to sent in !ZiIcIhEEPvPrkUeheO:matrix.org event ID ~!ZiIcIhEEPvPrkUeheO:matrix.org:m1633948414939.743 -> $16339484157934361ksrnd:matrix.org
...and the remote echo never comes.
Worth noting that the stream ID sent by the POSITION, 2372403108, was written by event_persister-2, and gets replicated to event_persister-3 at:
2021-10-08 12:45:34,265 - synapse.replication.tcp.handler - 509 - INFO - replication_command_handler-139816608471832 - Handling 'POSITION events event_persister-2 2372403107 2372403108'
Which is before the missing event has finished persisting (its received_ts is 12:45:34,280). This makes it look very much like the event stream ID generator is incorrectly returning a current token of 2372403108 even though 2372403105 hasn't been persisted yet.
For reference, the code where we update the current position of the local instance when we receive updates from remote instances is:
Turns out there is a race between the ID generator being allocated stream IDs and adding them to its list of unpersisted stream IDs, which meant that it could incorrectly advance the current position too far. #11045 should fix it.
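To illustrate the shape of that race, here is a minimal sketch; it is a simplification, not Synapse's actual MultiWriterIdGenerator, but it shows how allocating an ID and recording it as unpersisted in two separate steps lets the current position advance too far:

```python
# Minimal sketch of the race described above (a simplification, not Synapse's
# actual MultiWriterIdGenerator). get_next() allocates an ID and only *later*
# records it as unpersisted, so get_current_token() can briefly advance past
# an ID that hasn't been persisted yet.
import threading

class NaiveIdGenerator:
    def __init__(self, start: int):
        self._lock = threading.Lock()
        self._last_allocated = start
        self._unpersisted = set()

    def get_next(self) -> int:
        with self._lock:
            self._last_allocated += 1
            new_id = self._last_allocated
        # BUG: the ID is recorded as unpersisted outside the same critical
        # section, so there is a window where it is allocated but invisible
        # to get_current_token().
        with self._lock:
            self._unpersisted.add(new_id)
        return new_id

    def mark_persisted(self, stream_id: int) -> None:
        with self._lock:
            self._unpersisted.discard(stream_id)

    def get_current_token(self) -> int:
        # Meaning: "everything up to and including this position is persisted".
        with self._lock:
            if self._unpersisted:
                return min(self._unpersisted) - 1
            return self._last_allocated
```

If get_current_token() runs in that window, it returns a position that includes a stream ID still being persisted, which matches the "current token of 2372403108 even though 2372403105 hasn't been persisted yet" symptom above; presumably the fix is to perform both steps atomically under the same lock.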
...and one from Amandine...
!KzalCNJxkqtytlbQvX:matrix.org
$16339912738468991gJzQg:matrix.org
I've deployed a change to matrix.org that should hopefully fix this. Let me know if it happens again.
it happened again:
Room ID: !FuBNfHZXzpaxULYtAd:matrix.org
Event ID: ~!FuBNfHZXzpaxULYtAd:matrix.org:m1634134221985.8790
actually, this looks like an e2ee client bug
@ara4n what drove you to conclude this is an e2ee client bug? Do you have a bug open somewhere?
Asking because I have a similar case. Clearing the cache gave us back the message, but it was undecryptable because of "duplicate message index, possible replay attack". Other people received the message just fine.
Room ID: !xjfBBlIpIRdmnzhWYJ:matrix.org
Event ID: $PcE5p8JizhZHUkjh7YDuWf-ktT30VhbsAaFZr3dvUX0
Device ID: VCTTPNSSYZ
Going to re-open until we track down the above. Will probably want to look through the logs before they get rotated out (the event happened on Monday I believe it was said) — going to raise at triage meeting tomorrow because I'm not clear on what I'm looking for.
@MatMaul As an update, it's unlikely we can debug the issue you had on Monday, because most of the interesting information is in Jaeger, which people like Matthew are set up to trigger on every request, but other people only on something like 1 in 50 requests. To troubleshoot this, we would probably want a sequence of 3 /syncs, which gives a (1/50)³, or 1/125,000, chance of having hit the right conditions to be sampled by Jaeger. To my understanding, with your consent, tracing could also be enabled for you, in case the bug happens again in the future (I'm not personally sure what the considerations are, but you can ask one of us in the backend room of course...).
It would still be interesting to know from @ara4n why the previous issue was thought to be a client E2EE bug and not this issue (to see if it's the same for your case).
FTR We implemented a workaround in https://github.com/vector-im/element-android/pull/4928, which will be available in Element Android 1.3.16 (next release)
I think maybe I just ran into this?
Newly bridged room, with events that look like this.
topological_ordering | event_id | type | room_id | content | unrecognized_keys | processed | outlier | depth | origin_server_ts | received_ts | sender | contains_url | instance_name | stream_ordering | state_key | rejection_reason |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | $p9Abgg9tv02f6tKODR7wIy0i7qV_IHti4_C998jbQVE | m.room.create | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 1 | 1646688871373 | 1646688871414 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491927 | NULL | |
2 | $XP3flh1UMzMlCsksvaUy5QqWA3B5zZStw7feahPSUpo | m.room.member | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 2 | 1646688871490 | 1646688871537 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491928 | @_adam203072022_whatsapp_xxx:beeper-dev.com | NULL |
3 | $TM35mHbuMlFmWiospfJm9m8ABaVNAORQ2_jdCaTkJBI | m.room.power_levels | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 3 | 1646688871610 | 1646688871659 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491929 | NULL | |
4 | $CCkySQxH_4sjBfTsJxFggOMiwQ3C3yhQkNk2T3iQjvI | m.room.join_rules | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 4 | 1646688871718 | 1646688871763 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491930 | NULL | |
5 | $4UzYP-X6ed6JYls1e47wOWEo1MblN1sWKAoIJS6NnjI | m.room.history_visibility | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 5 | 1646688871825 | 1646688871873 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491931 | NULL | |
6 | $E9MoYI7V9WMas1fL0DhPdL3xM_tCVLAmAlqxQhgSVeU | m.room.guest_access | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 6 | 1646688871957 | 1646688871997 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491932 | NULL | |
7 | $1p9qznXfxDpj-yyPYSf2_sFRWxLpIBJVDU79YGyrtTk | m.bridge | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 7 | 1646688872072 | 1646688872101 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491933 | net.maunium.whatsapp://whatsapp/xxx@s.whatsapp.net | NULL |
8 | $HDGxoG8PotKnEs92lm4y4z_ecaQ_Lr41-SAn9PDZJJo | uk.half-shot.bridge | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 8 | 1646688872151 | 1646688872192 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491934 | net.maunium.whatsapp://whatsapp/xxx@s.whatsapp.net | NULL |
9 | $LmWV4ZraOZ9HUhL8ZE5Rsl9_GsKiFb8EEuq7jq73MTA | m.room.avatar | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 9 | 1646688872256 | 1646688872304 | @_adam203072022_whatsapp_xxx:beeper-dev.com | Yes | synapse-event-persister-1 | 2491935 | NULL | |
10 | $BGiiLu-iIpuo8I6zthh-l1QlqIZDL-Sq52ahd_KK4tM | m.room.encryption | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 10 | 1646688872355 | 1646688872392 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491936 | NULL | |
11 | $J9tLCWLA76Yq0TFw7V-S0hsbG8znHwA6we1jZy6nuf4 | m.room.name | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 11 | 1646688872445 | 1646688872476 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491937 | NULL | |
12 | $GzdqlpNsqZ09ZsMmJpz3n4zqmqScc-Yrt6ErzpN7858 | m.room.topic | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 12 | 1646688872511 | 1646688872541 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491938 | NULL | |
13 | $iUd_bYnsP33IHeBz0q2tTxZT37vtqWpklKo53lr1MBw | m.room.member | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 13 | 1646688872593 | 1646688872654 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491939 | @_adam203072022_whatsapp_bot:beeper-dev.com | NULL |
14 | $NbNAmJmGCOQ-fJV9g16kLFOYdC0E9G46XkDsowGoxRE | m.room.member | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 14 | 1646688872941 | 1646688872978 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491940 | @adam203072022:beeper-dev.com | NULL |
15 | $VDdaVrwPRIOdFaafGPPSR242zVGXee2agRtRjZNGbXE | m.room.member | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 15 | 1646688873144 | 1646688873200 | @adam203072022:beeper-dev.com | No | synapse-event-persister-1 | 2491941 | @adam203072022:beeper-dev.com | NULL |
16 | $7wsVPnOzGCH_ZXUHGfPJBiwF2z1pMTMdyhQLpynUkJ4 | m.room.member | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 16 | 1646688873408 | 1646688873442 | @_adam203072022_whatsapp_bot:beeper-dev.com | No | synapse-event-persister-1 | 2491942 | @_adam203072022_whatsapp_bot:beeper-dev.com | NULL |
17 | $utQhwFEYV1mpeHH_uWJUhRW_drV-h0w9SHhvB4yK1RM | fi.mau.dummy.portal_created | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 17 | 1646688873650 | 1646688873682 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491943 | NULL | NULL |
18 | $gdlzxXbXnQMttMwmgj1dVBeCxpYTwumKNj5xpL_nkNM | fi.mau.dummy.pre_backfill | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 18 | 1646688875517 | 1646688875539 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491945 | NULL | NULL |
19 | $Mt9Ldr_g1ld8Y5OagQIQ2cDGHNcAw_rQx3D733SVeVc | fi.mau.dummy.backfill_end | !vXpiDigZlPILPsLRdK:beeper-dev.com | NULL | NULL | Yes | No | 19 | 1646688877845 | 1646688877876 | @_adam203072022_whatsapp_xxx:beeper-dev.com | No | synapse-event-persister-1 | 2491946 | NULL | NULL |
Made a sync request with the stream token s2491926_4_300_6622318_1064167_531325_269851_264048_1 and got a next_batch of s2491940_4_300_6622320_1064168_531325_269851_264048_1, but notice how we didn't get the m.bridge event with the stream_ordering of 2491933 in the response (see the quick range check after the response below).
// https://matrix.beeper-dev.com/_matrix/client/r0/sync?filter=0&timeout=30000&since=s2491926_4_300_6622318_1064167_531325_269851_264048_1
{
"next_batch": "s2491940_4_300_6622320_1064168_531325_269851_264048_1",
"device_one_time_keys_count": {
"signed_curve25519": 50
},
"org.matrix.msc2732.device_unused_fallback_key_types": [
"signed_curve25519"
],
"device_unused_fallback_key_types": [
"signed_curve25519"
],
"rooms": {
"invite": {
"!vXpiDigZlPILPsLRdK:beeper-dev.com": {
"invite_state": {
"events": [
{
"type": "m.room.avatar",
"state_key": "",
"content": {
"url": "mxc://beeper-dev.com/e56a9a291477956ca28034c41a564f38b4826a90"
},
"sender": "@_adam203072022_whatsapp_xxx:beeper-dev.com"
},
{
"type": "m.room.create",
"state_key": "",
"content": {
"creator": "@_adam203072022_whatsapp_xxx:beeper-dev.com",
"m.federate": false,
"room_version": "9"
},
"sender": "@_adam203072022_whatsapp_xxx:beeper-dev.com"
},
{
"type": "m.room.encryption",
"state_key": "",
"content": {
"algorithm": "m.megolm.v1.aes-sha2"
},
"sender": "@_adam203072022_whatsapp_xxx:beeper-dev.com"
},
{
"type": "m.room.join_rules",
"state_key": "",
"content": {
"join_rule": "invite"
},
"sender": "@_adam203072022_whatsapp_xxx:beeper-dev.com"
},
{
"type": "m.room.member",
"state_key": "@_adam203072022_whatsapp_xxx:beeper-dev.com",
"content": {
"avatar_url": "mxc://beeper-dev.com/e56a9a291477956ca28034c41a564f38b4826a90",
"displayname": "+xxx",
"membership": "join"
},
"sender": "@_adam203072022_whatsapp_xxx:beeper-dev.com"
},
{
"type": "m.room.name",
"state_key": "",
"content": {
"name": "+xxx"
},
"sender": "@_adam203072022_whatsapp_xxx:beeper-dev.com"
},
{
"type": "m.room.topic",
"state_key": "",
"content": {
"topic": "WhatsApp private chat"
},
"sender": "@_adam203072022_whatsapp_xxx:beeper-dev.com"
},
{
"content": {
"displayname": "adam203072022",
"fi.mau.will_auto_accept": true,
"is_direct": true,
"membership": "invite"
},
"origin_server_ts": 1646688872941,
"sender": "@_adam203072022_whatsapp_xxx:beeper-dev.com",
"state_key": "@adam203072022:beeper-dev.com",
"type": "m.room.member",
"unsigned": {
"age": 117
},
"event_id": "$NbNAmJmGCOQ-fJV9g16kLFOYdC0E9G46XkDsowGoxRE"
}
]
}
}
}
}
}
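As a quick sanity check of the tokens quoted above, a small sketch; it assumes the leading "s<N>" component of a Synapse sync token is the room stream position, and the helper is purely illustrative:

```python
# Quick sanity check of the sync tokens quoted above. Assumption: the leading
# "s<N>" component of the sync token is the room stream position.
def room_stream_pos(sync_token: str) -> int:
    # "s2491926_4_300_..." -> 2491926
    return int(sync_token.split("_")[0].lstrip("s"))

since = room_stream_pos("s2491926_4_300_6622318_1064167_531325_269851_264048_1")
next_batch = room_stream_pos("s2491940_4_300_6622320_1064168_531325_269851_264048_1")
bridge_event = 2491933  # stream_ordering of the missing m.bridge event

# If /sync covers the half-open range (since, next_batch], this event should
# have been returned in the response.
print(since < bridge_event <= next_batch)  # True
```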
@bradtgmurray The invite_state of a room will only include the state event types specified by the room_prejoin_state config option. By default m.bridge won't be included; it will need to be added under room_prejoin_state.additional_event_types on the homeserver that sent the invite. Do you have a snapshot of a /sync request with m.bridge missing from a rooms.join.<room_id> map?
(Completely tangentially, note that as MSC2346 hasn't yet completed FCP, the unprefixed identifier m.bridge should only be used in private rooms that do not extend past your own client implementations.)
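For reference, a sketch of the kind of homeserver.yaml addition described above; option names should be double-checked against your Synapse version's configuration documentation:

```yaml
# Sketch of the homeserver.yaml change suggested above (verify option names
# against your Synapse version's configuration docs).
room_prejoin_state:
  additional_event_types:
    - m.bridge
    - uk.half-shot.bridge
```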
@anoadragon453 I have re-created Brad's case above; I've included the event list and trimmed sync responses in this gist: https://gist.github.com/Fizzadar/316dabfe9be30ebaa48e288de2380de1. Some notes:
Notably, this is only occurring during initial backfill using MSC2716 batch sends. When this is disabled on the bridge we have not yet replicated the problem seen here.
@Fizzadar Hi; thanks for taking some time to write this up.
I'm still unfamiliar with MSC2716, but I think it may be useful to know the structure of the DAG to figure out what's going on here. Is there any chance you'd be able to run the following?
SELECT stream_ordering, type, state_key, event_id, (ej.json::json)->'prev_events' AS prev_events FROM events
LEFT JOIN event_json AS ej USING (event_id)
WHERE events.room_id ='!SjekxtmsBigrUMcMkX:beeper-dev.com' ORDER BY stream_ordering ASC;
If the room has been used more since your last message, feel free to trim it to the range that was there previously.
@reivilibre thanks for looking into this; query results:
stream_ordering | type | state_key | event_id | prev_events
-----------------+------------------------------+--------------------------------------------------------------+----------------------------------------------+--------------------------------------------------
-113203 | org.matrix.msc2716.insertion | | $G5GLwX6lc8MXKqwauoM1p9lkLD6S5GfwcosFlPra0ws | []
-113202 | m.room.encrypted | | $I3PsF44AMu5I_XOZQ2Hm1cOhcdLJCbA8NcAMS69hLEE | ["$G5GLwX6lc8MXKqwauoM1p9lkLD6S5GfwcosFlPra0ws"]
-113201 | m.room.encrypted | | $hIO0ZI-ZYr8gWGkWtvBiPfCA0jKdjiZP7PCnEfuutTY | ["$I3PsF44AMu5I_XOZQ2Hm1cOhcdLJCbA8NcAMS69hLEE"]
-113200 | m.room.encrypted | | $-jWSmA7NJ1dbl5b86K8FAKN2ydt-qoOw8ftCdSEN4A4 | ["$hIO0ZI-ZYr8gWGkWtvBiPfCA0jKdjiZP7PCnEfuutTY"]
-113199 | m.room.encrypted | | $5EtpKKW7g5HV5oCweU6vlbGjptR095p-kZSSSy2tBgs | ["$-jWSmA7NJ1dbl5b86K8FAKN2ydt-qoOw8ftCdSEN4A4"]
-113198 | m.room.encrypted | | $5aVzqGvybXzV3Xj3AwA8SQKqp2X2Z7RLH7IGX5VBeX4 | ["$5EtpKKW7g5HV5oCweU6vlbGjptR095p-kZSSSy2tBgs"]
-113197 | org.matrix.msc2716.batch | | $KkSRlRidPiOkkLEuiHs3gWLyCqCdQTHAVYJxzhDIzcM | ["$5aVzqGvybXzV3Xj3AwA8SQKqp2X2Z7RLH7IGX5VBeX4"]
-111135 | org.matrix.msc2716.insertion | | $LQBf_c9-1stBXkPOLqZwkJgMg5YBCp448qcUwKLN6ks | []
-111134 | m.room.encrypted | | $RzKCB8rV0a9jfQVfbJeGvgGIZtT_X7ODb-78LUhXpXs | ["$LQBf_c9-1stBXkPOLqZwkJgMg5YBCp448qcUwKLN6ks"]
-111133 | org.matrix.msc2716.batch | | $l_jGmPdlA73wDzS3_gRwC67co1pTnx9-QBufta10xWI | ["$RzKCB8rV0a9jfQVfbJeGvgGIZtT_X7ODb-78LUhXpXs"]
-111132 | org.matrix.msc2716.insertion | | $_mT6Awgb70IjwseWpCUeU0O4tm_iT7OBe4IPLJYUHUU | ["$a6xyChJTR8wJLsfYfGI8Tu1W5cXrtohkBRdwCSsyG68"]
2633404 | m.room.create | | $ZMCw8gIYOeNmC2mGNKiAdxbRdwuhDiA9gNaM8EwDlRM | []
2633421 | m.room.member | @_adam4_whatsapp_xxx:beeper-dev.com | $a6LZbYMvr5yYRcFc0uethSHLVC_0l7LRUr0nCZXB2lQ | ["$ZMCw8gIYOeNmC2mGNKiAdxbRdwuhDiA9gNaM8EwDlRM"]
2633434 | m.room.power_levels | | $S2rnt0m3dfbgC0ySGI3RGSdIjNqhgjjilZ2Gm7jGtvg | ["$a6LZbYMvr5yYRcFc0uethSHLVC_0l7LRUr0nCZXB2lQ"]
2633448 | m.room.join_rules | | $5hkAty1mFIMtuFINEh8wA3OyIgnO6Qxh1HgCTT9c1p0 | ["$S2rnt0m3dfbgC0ySGI3RGSdIjNqhgjjilZ2Gm7jGtvg"]
2633461 | m.room.history_visibility | | $nNR6iQSvWPpoKRvfpXJXU_sWpnk8eCwTFJD9zwC2rfo | ["$5hkAty1mFIMtuFINEh8wA3OyIgnO6Qxh1HgCTT9c1p0"]
2633469 | m.room.guest_access | | $iNJMDkjnFwBekoCZASqZmiNtH8UMLsPc_1VHJD6IFpU | ["$nNR6iQSvWPpoKRvfpXJXU_sWpnk8eCwTFJD9zwC2rfo"]
2633478 | m.bridge | net.maunium.whatsapp://whatsapp/xxx@s.whatsapp.net | $p5fs4ik0Lnj-mIbw-cjPHX_aWzx0x6O_rIrhSi8_1C8 | ["$iNJMDkjnFwBekoCZASqZmiNtH8UMLsPc_1VHJD6IFpU"]
2633492 | uk.half-shot.bridge | net.maunium.whatsapp://whatsapp/xxx@s.whatsapp.net | $tAZ3y00piOU_54qs0MuK-HRPivXNRBmN8x6yY5KWXkk | ["$p5fs4ik0Lnj-mIbw-cjPHX_aWzx0x6O_rIrhSi8_1C8"]
2633503 | m.room.avatar | | $BZOaOEhozp28cCJtdh8A6R2hKgXiuDmd1NWI2vEuyPs | ["$tAZ3y00piOU_54qs0MuK-HRPivXNRBmN8x6yY5KWXkk"]
2633515 | m.room.encryption | | $qTv8IJZIdlH_3S7atbGTrd-7p7UEt4siocS0vz8-K3Y | ["$BZOaOEhozp28cCJtdh8A6R2hKgXiuDmd1NWI2vEuyPs"]
2633523 | m.room.name | | $joClbFehqiqiVOuYYDWGJeYaozJnaccZgVog8j4bB6g | ["$qTv8IJZIdlH_3S7atbGTrd-7p7UEt4siocS0vz8-K3Y"]
2633533 | m.room.topic | | $WP0ALeVvT2oCRSt9qs4_bE2O5AKsu_4GqbTCPFwaKa4 | ["$joClbFehqiqiVOuYYDWGJeYaozJnaccZgVog8j4bB6g"]
2633547 | m.room.member | @_adam4_whatsapp_bot:beeper-dev.com | $hq4kZZPt4ItYdBF_I_AVITFHOZOZ2b_Av8ss3F06byg | ["$WP0ALeVvT2oCRSt9qs4_bE2O5AKsu_4GqbTCPFwaKa4"]
2633571 | m.room.member | @adam4:beeper-dev.com | $DyFoSr8CbRUyexbx6_yuZxgS4S7lAEHDYgldiRGnppc | ["$hq4kZZPt4ItYdBF_I_AVITFHOZOZ2b_Av8ss3F06byg"]
2633582 | m.room.member | @adam4:beeper-dev.com | $51gzIcw5d1fvbNLqYXdBwma669-H8DZlz0UzsavMBbU | ["$DyFoSr8CbRUyexbx6_yuZxgS4S7lAEHDYgldiRGnppc"]
2633590 | m.room.member | @_adam4_whatsapp_bot:beeper-dev.com | $Sx9mswlxUJs1rRL4UyHARmY7V0h8_cH_c_ZDnwTn3cc | ["$51gzIcw5d1fvbNLqYXdBwma669-H8DZlz0UzsavMBbU"]
2633601 | fi.mau.dummy.portal_created | | $a6xyChJTR8wJLsfYfGI8Tu1W5cXrtohkBRdwCSsyG68 | ["$Sx9mswlxUJs1rRL4UyHARmY7V0h8_cH_c_ZDnwTn3cc"]
2633614 | fi.mau.dummy.pre_backfill | | $E97WApcmVmCJ6nt2WL3TnuxhyCLnPl1dt-t6p25ZLwY | ["$a6xyChJTR8wJLsfYfGI8Tu1W5cXrtohkBRdwCSsyG68"]
2633633 | fi.mau.dummy.backfill_end | | $NILDi9SfVDgaW72alIEsc5ESzdJSoLip_c7klMAeuPo | ["$E97WApcmVmCJ6nt2WL3TnuxhyCLnPl1dt-t6p25ZLwY"]
(30 rows)
@Fizzadar Oooh, interesting (maybe; I'll likely need to think about it whilst reading the batch send MSC!).
What I find interesting is that there are multiple events with no prev_events (which is what I expect from the batch sending stuff) but no later events have more than 1, to merge the forks together. (Only making remarks so I can keep track of this; not sure if that's important or not.)
Are those all the events in the room; if not, are there any that merge later on?
@reivilibre that's everything in the room! We left it alone after replicating the issue in case this data is useful :)
I wonder if it makes sense to split out our line of discussion, because presumably this is something to do with batch_send and not whatever the original issue was back in Feb 2021.
Given the complexity of how Synapse's implementation of MSC2716 interacts with the room DAG, it does appear likely that the bug in your case @bradtgmurray is related to the batch_send code, specifically the /sync bit of it. Moving and tackling this case in a separate issue would be ideal.
I see that https://github.com/mautrix/whatsapp/pull/463 has been linked to this issue though - do you find that backfilling the room works to recover the missing event in all cases?
The WhatsApp PR is just us trying things to work around the issue; we haven't gotten to the bottom of it yet.
Do you know if the DAG we posted above looks correct to you?
I've split out relevant parts into a new issue https://github.com/matrix-org/synapse/issues/12281, let's continue discussions there.
I'm not really sure what's going on here. It seems like there are several distinct problems being reported.
In any case, there is probably more than one bug here, and lumping further reports into a single issue is likely to cause confusion.
As such, can this be closed? I don't think there's any useful context here.
@richvdh I agree - this issue has become a bit of a mishmash of related problems, with an awful lot of old comments to read before hitting content relevant to today's ecosystem.
Closing this issue as it looks to have run its course. Please file a new issue if you have one; potentially linking this one if relevant.
The event in question is https://app.element.io/#/room/!RbKRuLTvHIhArQcvjr:matrix.org/$1613561441289661rpDpF:matrix.org?via=matrix.org&via=vector.modular.im&via=t2l.io
The client that sent the event (Element Desktop) and my desktop Hydrogen running alongside it did not get the event from sync.
On Element Desktop the event was black (i.e. sent) but stuck to the bottom of the room, and on Hydrogen I checked the storage and it showed that there was no limited sync for that room around those messages, and there was no trace of that event anywhere.
When I opened Hydrogen on my phone an hour or so later, the event was synced down to it in the correct order.
At least T3chguy and Ryan did receive the event in the correct order.