matrix-org / matrix-spec

The Matrix protocol specification
Apache License 2.0
184 stars 94 forks source link

lazy-loading behaviour for `/sync` is incorrectly specified #942

Open DMRobertson opened 2 years ago

DMRobertson commented 2 years ago

Link to problem area: https://spec.matrix.org/v1.1/client-server-api/#get_matrixclientv3sync

I'm having a hard time following the description of /sync's behaviour. In particular, the third paragraph:

Further, like other members, the user's own membership event is eligible for being considered redundant by the server. When a sync is limited, the server MUST return membership events for events in the gap (between since and the start of the returned timeline), regardless as to whether or not they are redundant. This ensures that joins/leaves and profile changes which occur during the gap are not lost.

Note that the default behaviour of state is to include all membership events, alongside other state, when lazy-loading is not enabled.

Points of confusion:

richvdh commented 2 years ago

This is certainly not very clear.

Points of confusion:

  • I'm not sure what "being considered redundant by the server" refers to.

    • Is this in the context of lazy-loading?

Yes, this paragraph (as with the paragraph above it) is in the context of lazy-loading.

I thought that involved a specific user-defined filter, rather than the server's discretion.

lazy-loading is enabled via a filter, but that's just a boolean flag. Which events are omitted or returned, once lazy-loading is enabled, is subject to a somewhat complicated algorithm,

  • "membership events for events in the gap" sounds like "all membership events which are in the gap" to me; it wasn't obvious to me that events "have" or "own" a membership event.

    • It apparently means "the most recent membership event for each sender of an event in the gap".

Correct. Or rather, "the membership event for the sender of the event that was in force at the time of the event".

  • ...

    • Must we only return such membership events in the gap, or can they predate the gap? Put differently, can the server assume the client has a complete understanding of memberships at the since token, if it given?

Whether the membership events are in the gap or not is immaterial. And no, a server cannot assume the client has a complete understanding of memberships at the since token - the server has to keep track of which memberships it has told the client about.

(Is this the meaning of redundant in the ~final~ penultimate sentence?)

Where "penultimate sentence" is "This ensures that joins/leaves and profile changes which occur during the gap are not lost." ?

This is a good question. I don't understand what bearing "membership events for events in the gap" has on "profile changes in the gap". This wording appears to come from matrix-org/matrix-spec-proposals#2106. I've asked @ara4n if he can clarify.

  • "Note that the default behaviour of state is to include all membership events".

Yeah, this is unclear. It means "all state which has changed in the gap".

ara4n commented 2 years ago

This is a good question. I don't understand what bearing "membership events for events in the gap" has on "profile changes in the gap". This wording appears to come from https://github.com/matrix-org/matrix-spec-proposals/pull/2106. I've asked @ara4n if he can clarify.

I think this is a thinko: during a gap in a sync, we just need to know the membership events which have happened - not the membership events which relate to the events in the gap.

richvdh commented 2 years ago

ok, so that makes this a spec bug rather than a clarification. I've updated the labels and subject accordingly.

DMRobertson commented 2 years ago

Possibly related: this entry in Synapse's sytest blocklist:

# Blacklisted due to https://github.com/vector-im/riot-web/issues/7211
The only membership state included in a gapped incremental sync is for senders in the timeline
ShadowJonathan commented 2 years ago

When researching on how to clarify this in the spec documentation, I see the first question hasn't been answered yet;

I'm not sure what "being considered redundant by the server" refers to.

I encountered the same problem when I was working on converting lazy-loading-related sytests to complement.

What exactly does "redundant" mean here? And via which "reference" does the server track this?

Does the server track if member events have already been included into the timeline by the continuing next_batch tokens the server provide to the client? (as in, the server tracks if a user has "previously" been sent the membership event via a "previous" next_batch/since token, and will omit sending a membership event in the subsequent one(s))

timokoesters commented 2 years ago

What exactly does "redundant" mean here?

Redundant means the server has already sent the current member event to the client in the past and knows it was delivered.

And via which "reference" does the server track this?

Conduit does this: In every sync we send some set of member events to the client. Conduit only marks these members as "staging" and associates it to the next batch token. Only when the client calls sync again with that token, those marked events are confirmed as delivered and written to the db.

Conduit also does this for /messages by confirming the delivery when the client reuses the "from" token, but I have a feeling that clients do not like this.

timokoesters commented 2 years ago

Also I noticed an Element bug where it says "You're previewing #room, want to join it?" even though I am in the room already.

This might be an Element web bug. Does it assume that the server sends the user's own member event in an initial sync even when lazy loading? The spec says "the user’s own membership event is eligible for being considered redundant by the server"

richvdh commented 1 year ago

I must admit I'm still struggling to grok parts of this.

... servers MUST include the syncing user’s own membership event ... when the full state of rooms is requested, to aid discovering the user’s avatar & displayname.

Does "full state of rooms is requested" include an initial /sync? I'm pretty sure that under Synapse's interpretation it does (see https://github.com/matrix-org/synapse/blob/v1.73.0/synapse/handlers/sync.py#L948-L953; full_state is calculated here which notably includes room_builder.full_state, which in turn is set to true here for all rooms returned in an initialsync.

The issue linked there (https://github.com/vector-im/element-web/issues/7209) sounds very much like what @timokoesters mentions above and this discrepancy would indeed explain why he sees that error with Conduit.

It makes some kind of sense: the only other plausible interpretation of "full state of rooms is requested" is that the client set the "full_state" flag, and it's not obvious to me why would we make an exception for that, and room joins, without extending the same exception to initial /sync.


So, assuming the user's own membership event is not eligible for omission on initial /sync, when are they eligible for omission? One idea is that if we have a gappy (ie, limited) incremental /sync, where the user changed avatar or something in the gap, we could omit such membership events (until some other event from the user appears in the timeline).

However: note that synapse currently doesn't implement lazy-loading at all for incremental syncs, and it seems like clients depend on that behaviour to avoid bugs like https://github.com/vector-im/element-web/issues/7211 (and I strongly suspect that omitting membership events in incremental syncs would introduce problems with device-list tracking for E2E).

Given that, my inclination is to entirely remove the sentence about "the user's own membership event being eligible for being considered redundant" from the spec.

richvdh commented 6 months ago

I'm coming to the conclusion that lazy-loading for incremental syncs is unimplementable, as currently specced.

Maybe we should update the spec to tell servers not to do lazy-loading on non-full-state syncs? ("Full state syncs" are initial syncs, plus incremental syncs where we send the full state to the client; in particular when the user has joined the room since the last sync.)