matrix-org / matrix-rust-sdk

Matrix Client-Server SDK for Rust
Apache License 2.0

[meta] `EventCache` storage #3280

Open Hywan opened 7 months ago

Hywan commented 7 months ago

matrix_sdk::EventCache is a new module that was introduced in #3058. The idea is to manage all events inside the Rust SDK in a single place. An important user of EventCache is the matrix_sdk_ui::Timeline.

EventCache uses a new data structure to organize all events, LinkedChunk, since:

This issue describes the plan to get a persistent storage for EventCache, along with the ability to make it reactive.

Persistent storage

In order to create a persistent storage for EventCache, we need a mechanism that listens to LinkedChunk updates and maps these updates to database operations (like INSERT, DELETE and so on).

We initially leaned towards a reactive mechanism (like ObservableLinkedChunk), but as with any reactive mechanism, we need to handle the lag.

[!NOTE] What is a lag? When new updates are generated, they are usually accumulated in a buffer. This buffer is drained by subscribers of the observed value. When the buffer is full, which can happen because subscribers are too slow to consume updates, the buffer is reset.

The lag, in this case, is pretty problematic. In case of a lag, the database will be out of sync (data will be missing), and there is no easy way to get the missing data back. We could imagine resetting the database and rewriting everything, but not all events are loaded in memory. Alternatively, we could recompute all the differences between what is in memory and what is inside the database, but again, it's not an easy problem. Either way, it implies more guards and more complexity.
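To make the lag concrete, here is a minimal sketch of a bounded update buffer that resets when it is full. `UpdateBuffer` and `Lagged` are hypothetical names used only for illustration; this is not how the SDK or any particular reactive crate implements it.

```rust
use std::collections::VecDeque;

/// Marker returned to a subscriber that missed some updates.
#[derive(Debug, PartialEq)]
struct Lagged;

/// Hypothetical bounded buffer of pending updates.
struct UpdateBuffer<T> {
    capacity: usize,
    updates: VecDeque<T>,
    lagged: bool,
}

impl<T> UpdateBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { capacity, updates: VecDeque::new(), lagged: false }
    }

    /// Producer side: push an update, resetting the buffer if it is full.
    fn push(&mut self, update: T) {
        if self.updates.len() == self.capacity {
            // The subscriber was too slow: every pending update is lost.
            self.updates.clear();
            self.lagged = true;
        }
        self.updates.push_back(update);
    }

    /// Subscriber side: report a lag once, otherwise drain pending updates.
    fn drain(&mut self) -> Result<Vec<T>, Lagged> {
        if std::mem::take(&mut self.lagged) {
            return Err(Lagged);
        }
        Ok(self.updates.drain(..).collect())
    }
}
```

A subscriber that receives `Lagged` knows some updates were dropped, but not which ones. That is exactly the situation that would leave a database out of sync.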

Instead, we've decided to define a LinkedChunkListener trait, used by LinkedChunk, whose methods will be called on particular operations, like insert_chunk, remove_chunk, insert_events and remove_events. That's probably all we need.
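As a rough sketch of what such a trait could look like: the four method names come from the paragraph above, but every signature, `ChunkId`, `Event`, `DbOp` and `RecordingStore` are assumptions made up for this example, not the SDK's API.

```rust
/// Hypothetical chunk identifier.
type ChunkId = u64;
/// Hypothetical stand-in for an event (only its ID here).
type Event = String;

/// Sketch of the listener: method names from the issue, signatures assumed.
trait LinkedChunkListener {
    fn insert_chunk(&mut self, chunk: ChunkId);
    fn remove_chunk(&mut self, chunk: ChunkId);
    fn insert_events(&mut self, chunk: ChunkId, index: usize, events: &[Event]);
    fn remove_events(&mut self, chunk: ChunkId, index: usize, count: usize);
}

/// Hypothetical database operations a storage backend could run.
#[derive(Debug, PartialEq)]
enum DbOp {
    InsertChunk(ChunkId),
    DeleteChunk(ChunkId),
    InsertEvents { chunk: ChunkId, index: usize, count: usize },
    DeleteEvents { chunk: ChunkId, index: usize, count: usize },
}

/// A listener that records operations instead of talking to a real database.
#[derive(Default)]
struct RecordingStore {
    ops: Vec<DbOp>,
}

impl LinkedChunkListener for RecordingStore {
    fn insert_chunk(&mut self, chunk: ChunkId) {
        self.ops.push(DbOp::InsertChunk(chunk));
    }
    fn remove_chunk(&mut self, chunk: ChunkId) {
        self.ops.push(DbOp::DeleteChunk(chunk));
    }
    fn insert_events(&mut self, chunk: ChunkId, index: usize, events: &[Event]) {
        self.ops.push(DbOp::InsertEvents { chunk, index, count: events.len() });
    }
    fn remove_events(&mut self, chunk: ChunkId, index: usize, count: usize) {
        self.ops.push(DbOp::DeleteEvents { chunk, index, count });
    }
}
```

Because the methods are called synchronously by the data structure itself, there is no intermediate buffer and therefore nothing that can lag.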

The cons:

The pros:

Tasks

Reactive EventCache

Right now, there is no satisfying way to get a stream of EventCache updates. The only mechanism that exists so far is RoomEventCache::subscribe. It returns a tokio::sync::mpsc::Receiver<RoomEventCacheUpdate>. RoomEventCacheUpdate is defined like so:

https://github.com/matrix-org/matrix-rust-sdk/blob/ac0bc95c253c5d46fec31378016497f433e0067d/crates/matrix-sdk/src/event_cache/mod.rs#L816-L838

Constraint: Prepare for reconciliation

This is OK-ish for the moment (at the time of writing), but it will quickly show limitations, in particular with the reconciliation.

[!NOTE] When events are received in different orders (e.g. between /messages, /context or /sync), it's important to re-order them. We call that reconciliation. It's far, far, faaar from trivial. Actually, there is no solution to this problem, but we will try to come up with the best possible heuristics.

Why is reconciliation going to create a problem here? Because it's impossible to represent an update like: “remove item at position $p$, and insert item at position $q$”. The only possible updates so far are “clear” and “append”.
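For illustration, here is a sketch of an update type that can express such a move. `Diff` is a hypothetical enum, loosely modelled on the shape of `VectorDiff` mentioned later in this issue; `apply` and `move_item` are helper names invented for this example.

```rust
/// Hypothetical update type; "clear" and "append" are what exists today,
/// `Remove` and `Insert` are what reconciliation would additionally need.
#[derive(Debug, Clone, PartialEq)]
enum Diff<T> {
    Clear,
    Append { values: Vec<T> },
    Remove { index: usize },
    Insert { index: usize, value: T },
}

/// Apply one update to a plain vector.
fn apply<T>(items: &mut Vec<T>, diff: Diff<T>) {
    match diff {
        Diff::Clear => items.clear(),
        Diff::Append { mut values } => items.append(&mut values),
        Diff::Remove { index } => {
            items.remove(index);
        }
        Diff::Insert { index, value } => items.insert(index, value),
    }
}

/// Express "move the item at `from` to `to`" as a remove followed by an insert.
/// `to` is the index in the vector *after* the removal; panics if `from` is
/// out of bounds.
fn move_item<T: Clone>(items: &[T], from: usize, to: usize) -> Vec<Diff<T>> {
    vec![
        Diff::Remove { index: from },
        Diff::Insert { index: to, value: items[from].clone() },
    ]
}
```

With only `Clear` and `Append`, the same re-ordering would require clearing the whole list and appending everything again.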

The assiduous reader (oh, hi^1) will think: “How does it work with back- or front-pagination?”. Thanks for asking. Let's make a detour.

Constraint: Pagination

Frontpagination isn't implemented yet (not something hard, just not here yet). Backpagination is done with RoomEventCache::backpaginate. It returns a BackPaginationOutcome, defined like so:

https://github.com/matrix-org/matrix-rust-sdk/blob/ac0bc95c253c5d46fec31378016497f433e0067d/crates/matrix-sdk/src/event_cache/mod.rs#L792-L814

Ah. Isn't it using RoomEventCacheUpdate? Well, no, because of the Timeline! The API from EventCache has been extracted from the Timeline. The Timeline had and still has 2 sources of updates: /sync and /messages for pagination.

Would it be hard to switch to a single Stream<Item = SyncTimelineEvent>? Well. Yes and no.

https://github.com/matrix-org/matrix-rust-sdk/blob/ac0bc95c253c5d46fec31378016497f433e0067d/crates/matrix-sdk-ui/src/timeline/builder.rs#L133-L137

https://github.com/matrix-org/matrix-rust-sdk/blob/ac0bc95c253c5d46fec31378016497f433e0067d/crates/matrix-sdk-ui/src/timeline/pagination.rs#L49-L51

What do we want? A single source of data for Timeline. What does it require? Changing the backpagination mechanism. Not a big deal, it's doable. The biggest difficulty is that the Timeline will ask for data that will be injected in another place (Timeline::paginate_backwards will see the results of RoomEventCache::backpaginate via RoomEventCache::subscribe). It's not easy to connect both. How to do it? Glad you asked.

The solution: Step 1

LinkedChunk must expose a Stream<Item = Vec<VectorDiff<SyncTimelineEvent>>>-like API, something like:

impl LinkedChunk {
    fn subscribe_as_vector(&self) -> (Vec<SyncTimelineEvent>, impl Stream<Item = Vec<VectorDiff<SyncTimelineEvent>>>) {
        todo!()
    }
}

Easy, right? Well. No. LinkedChunk is not a Vector. The algorithm is going to be fun here. LinkedChunk::subscribe_as_vector should pretend it's a Vector and should emit VectorDiff, à la eyeball_im::ObservableVector.

Of course, RoomEventCache::subscribe must be rewritten to use LinkedChunk::subscribe_as_vector.
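One small piece of that algorithm is translating a position inside the chunks into the index the item would have in a flat vector. The sketch below assumes chunks can be summarised by their lengths; `Position` and `flat_index` are hypothetical names, and the real LinkedChunk may need to account for more than this.

```rust
/// Hypothetical position of an item: which chunk, and where inside it.
#[derive(Debug, Clone, Copy)]
struct Position {
    chunk: usize,
    index: usize,
}

/// Map a chunk position to an index in the "fake" flat vector.
///
/// `chunk_lens[i]` is the number of items in chunk `i`, oldest chunk first.
/// `index == len` is accepted so the result can be used as an insertion point
/// at the end of a chunk.
fn flat_index(chunk_lens: &[usize], position: Position) -> Option<usize> {
    let len = *chunk_lens.get(position.chunk)?;
    if position.index > len {
        return None;
    }
    // Everything stored in earlier chunks comes first in the flat vector.
    let offset: usize = chunk_lens[..position.chunk].iter().sum();
    Some(offset + position.index)
}
```

An insertion of several events at a given position could then be emitted as diffs starting at `flat_index(...)`.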

The solution: Step 2

Timeline must listen to RoomEventCache::subscribe for all updates. Then, Timeline will map Stream<Item = Vec<VectorDiff<SyncTimelineEvent>>> into TimelineItems that will be inserted/deleted/moved in the correct places inside its own ObservableVector<TimelineItem> inside TimelineInnerState:

https://github.com/matrix-org/matrix-rust-sdk/blob/ac0bc95c253c5d46fec31378016497f433e0067d/crates/matrix-sdk-ui/src/timeline/inner/state.rs#L71-L75

This is going to be delicate.

Tasks

The 2 following lists can be done in parallel:

Tasks on EventCache:

Tasks on EventCache and Timeline:

The following list must be done after the 2 previous lists:

Lazy EventCache: Combine pagination and persistent storage

Before being able to enable persistent storage in EventCache, one last problem must be addressed.

RoomEventCache will use RoomEvents (which uses LinkedChunk) to load events from the persistent storage. That's fine. However, we don't want to load all events from the persistent storage. Imagine rooms with 1'000'000 events: do we want to load all the events in memory? Absolutely not! RoomEventCache must be lazy: it must load only the $n$ newest chunks from the persistent storage.
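A minimal sketch of that lazy loading, assuming the storage hands back chunks ordered from oldest to newest. `load_newest_chunks` is a hypothetical helper, and a real implementation would query the database rather than slice an in-memory list.

```rust
/// Load only the `n` newest chunks, and report how many older chunks are
/// still left in the storage. `stored` is assumed to be ordered oldest first.
fn load_newest_chunks<T: Clone>(stored: &[Vec<T>], n: usize) -> (Vec<Vec<T>>, usize) {
    // Number of older chunks we deliberately do not load.
    let older_left = stored.len().saturating_sub(n);
    (stored[older_left..].to_vec(), older_left)
}
```

The second value tells the caller whether older chunks are still available locally.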

OK, but what happens when we backpaginate?

[!NOTE] A more detailed plan is being drafted by @bnjbvr and @Hywan to define when running a network request (/messages) is necessary or not necessary. The heuristic is not trivial, all edge-cases must be well-defined and considered carefully.

Once we have this mechanism in place, we can enable the persistent storage for EventCache.

Tasks

Naive reconciliation

To start with, we can implement a super naive reconciliation algorithm that will simply remove duplicated events. For a first shot, it's totally fine. A better reconciliation algorithm can be defined later. For the record, what is present now is the “remove duplicated events” approach.

What we want to ensure is that the reconciliation algorithm will modify events in LinkedChunk, which will propagate to the persistent storage via LinkedChunkListener and to the Timeline via RoomEventCache::subscribe. That's one of the main goals of this proposed design.
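The naive approach can be sketched as a deduplication by event ID. `Event` and `remove_duplicates` are hypothetical, and keeping the already-known copy rather than the incoming one is an assumption of this sketch, not a statement about what the SDK does.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an event, reduced to its ID.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    event_id: String,
}

/// Drop every incoming event whose ID is already known, either from
/// `existing` or from an earlier entry of `incoming`. Order is preserved.
fn remove_duplicates(existing: &[Event], incoming: Vec<Event>) -> Vec<Event> {
    let mut seen: HashSet<String> =
        existing.iter().map(|event| event.event_id.clone()).collect();

    incoming
        .into_iter()
        // `insert` returns `false` when the ID was already in the set.
        .filter(|event| seen.insert(event.event_id.clone()))
        .collect()
}
```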

Tasks

Conclusion

This plan provides a solution to support a reconciliation mechanism in a single place. It will benefit other users, like Timeline. The Timeline needs to be refactored because it has 2 sources of updates, one for /sync (live events), and one for /messages (front- and back-pagination of old events).

Because EventCache will be reactive, it will greatly simplify testing:

let inputs = stream! {
    yield vec![VectorDiff::Append { values: events![…, …, …] }];
    yield vec![VectorDiff::Remove { index: … }];
    yield vec![VectorDiff::Insert { index: …, value: … }];
};

Bonus


manuroe commented 1 month ago

Constraint: Pagination

The biggest difficulty is that the Timeline will ask for data that will be injected to another place (Timeline::paginate_backwards will see the results of RoomEventCache::backpaginate via RoomEventCache::subscribe). It's not easy to connect both.

For info, this is exactly the shape used in the Android SDK (hello @ganfra!). The timeline observes a reactive DB that is fed by other mechanisms. The actioner and the consumers are decoupled.

Lazy EventCache: Combine pagination and persistent storage

We must run a proper backpagination (with /messages) to see if events aren't missing from the federation that could have been missed by /sync,

This is not mandatory. We must use the pagination token provided by the backend on /sync, /messages, /context requests. The backend will provide late messages through this stream. How to render or reorder a late message is a tricky question where we need help from the product. There is no good answer as far as I know:

I think most Matrix clients take the second approach (which is also the lazy one). If we choose this path too, there might be things to do in the timeline module to flag late messages to offer a better UX. But again, we need a product decision first.