matrix-org / matrix-rust-sdk

Matrix Client-Server SDK for Rust
Apache License 2.0
1.22k stars 246 forks source link

Generic API guarantee that events are available after being sent #1026

Open johannescpk opened 2 years ago

johannescpk commented 2 years ago

Is your feature request related to a problem? Please describe. Currently when sending events using Client::send, they are not part of the store immediately, but instead need a sync to happen for them to become available. From the consumer perspective this is inconsistent behavior, as sending an event gives me an event id, which I can't retrieve from the store directly after sending using methods like get_state_event / get_state_event_static. It requires in depth knowledge about Matrix and the SDK to achieve the expected result, which is waiting for the store to sync the expected event before being able to retrieve it.

This is basically the same issue as https://github.com/matrix-org/matrix-rust-sdk/issues/267 – which was partly solved by https://github.com/matrix-org/matrix-rust-sdk/pull/868, and will see some refactoring in https://github.com/matrix-org/matrix-rust-sdk/pull/1022 – but from the generic event sending perspective.

Describe the solution you'd like Some generic way where the SDK API has the guarantee that an event will be available after I created it. So, calling Client::send and after it Client::get_state_event for the event that was sent should do the expected thing, instead of just "failing" (event not being available).

Describe alternatives you've considered The alternative is leaving it up to the consumer to handle those cases, which is already the case. It however is a complex thing to solve, and solving it upstream in the SDK seems like the right thing to do either way.

johannescpk commented 2 years ago

Thought on implementation details: Maybe this could be solved in the store? Basically: .send already creates an entry in the store but marks it as "not synced" and waits for an event with the given event id to arrive to mark it "synced".

FlixCoder commented 2 years ago

Another alternative method could be waiting for syncs to happen in get the methods? But not sure if that ensures that the event then really came in, so I would prefer waiting while creating.

poljar commented 2 years ago

1022 isn't just a refactoring, the original method has quite a big flaw. If you call Client.create_room() inside an event handler you're going to deadlock.

This is quite the usability horror, it would be even worse if we do this to the event sending methods.

Calling room.send(content).await from an event handler, if this is implemented in the same manner as #868, would deadlock as well.

johannescpk commented 2 years ago

It's also an usability horror to not know which state the SDK is in after calling high level APIs.

I wonder whether there's a way to solve this in a sane way.

poljar commented 2 years ago

We decided for #1022 to go with the route of creating an empty room object that only contains a room_id and add guards preventing you to send any messages before you check if the room is encrypted or not.

The other option was to return a MaybeRoom object that can be awaited to fetch the synced room, this would have continued to use the event handler and would again deadlock if called from within an event handler, or if the sync isn't running for that matter.

I think you would have the same two approaches again. Either re-create the event locally, now that you have the event id, or return a MaybeEvent.

I would likely prefer the event re-creation approach, since it avoids the deadlock again, but this doesn't sound that easy to implement.

If someone else has better ideas, I'm all ear.

johannescpk commented 2 years ago

Generally regarding deadlocks: In some cases this doesn't seem avoidable and is something consumers just need to be aware of and know how to handle them. Examples would be locks like Mutex or the Dashmap that the SDK heavily uses. So just because something can be potentially a deadlock, doesn't mean it necessarily should be removed from the API at all costs. Obviously, if it's possible to not introduce possible deadlocks, that should be preferred – but that would need to be weighed up in terms of API convenience as well.

As for ways to solve this API-wise, the possibilities that come to mind:

  1. Dedicated waiting methods as wrapper methods, which could have a deadlock warning in the docs. Would allow calling the non-waiting methods in an event handler
  2. Awaitable results, where potentially deadlocking when waiting for the result will be a docs warning. Would allow calling in an event handler
  3. Automatically waiting when calling the method, with deadlock warning in the docs. This would make it impossible to call the method in an event handler
  4. Returning a stub that isn't tied to the sync result, so no way of knowing whether something actually happened (such as joining a room). This is the #1022 approach, if I understood correctly
  5. Just not having waiting methods in the SDK at all

Personally I'd prefer 2.

poljar commented 2 years ago

Examples would be locks like Mutex or the Dashmap that the SDK heavily uses. So just because something can be potentially a deadlock, doesn't mean it necessarily should be removed from the API at all costs.

Dashmap and Mutex are not exposed to the public API for that very reason, I'm not against having potential deadlocks in the private API.

I am very much against having them in the public API especially if it's in such a common pattern where you try to respond to a message inside an event handler. This is basically what every bot will do, so a guestimated 90% of our users will hit this deadlock.

As for ways to solve this API-wise, the possibilities that come to mind:

  1. is the same as 2, just that 1 increases the API surface massively.
  2. is the same as the MaybeEvent concept mentioned above.
  3. Is a no-go.
  4. Why is there no way of knowing whether something happened? You got the room with the room id, you know you joined and you can send events.
  5. No solution isn't a solution, now is it?

I think #1022 combines 2 and 4. You get a real room object, it doesn't have all the state yet but you can wait for the state to be received using a sync_up() method, or just send an event without knowing the full state.

johannescpk commented 2 years ago

The mentioned APIs were meant as an example that it's rather common in the rust ecosystem to expose potential deadlocks in the public API, since even the std library is doing it. So avoiding potential deadlocks in the public API at all costs just seems unnecessary, given how common it is in the ecosystem.

But still, avoiding it, especially if it's an easily missed footgun would be preferable, of course.

As for 4.: That would be the same as the original API, basically. I get the room id, but to actually work with the room, I'd have to wait for the SDK state to sync up somehow – which is what I actually meant with "knowing whether something happened". A potential await sync_up call would also deadlock in an event handler. So if we want to introduce that kind of functionality, it boils down to "how to educate consumers to not footgut themselves" either way? So why not go with a clean approach from the start – which would be, from my perspective, allowing to "sync up" the returned result. Could even be a dedicated sync_up method on the results with warnings in the docs, I guess.

johannescpk commented 2 years ago

Oh, and

  1. Sometimes no solution is the only solution.

:D

johannescpk commented 2 years ago

@jplatte mentioned

I think it would be safe for us to store the event in the state store after that independently of the sync loop

as a possible solution. Similar to how creating a room creates a room stub, instead of waiting for a sync.

Generally I feel like that the sync should be the source of truth. So I'm not sure it's a good decision to store "this should be right" data after APIs are called. If I understood correctly, the reason the deadlocks are an issue is because event handlers need to fire in a guaranteed order. I'm wondering: Couldn't firing event handlers and "resolving waiting calls" be decoupled? Kind of two event handlers, one for the public API, one for internally resolving the waiting cases? Haven't thought this through fully, but wanted to leave the thought here at least.

johannescpk commented 2 years ago

@poljar you mentioned

I don't think we have a full list. That being said, I can't really come up with anything that won't work besides fetching state events from the store. Not sure if we need to do something about this if we add the sync_up() method.

I'm curious, how do you imagine the sync_up to work in terms of waiting for a specific state event to be synced? It would need an internal list of things to wait for, right?

Btw, just to leave some keywords here: The problem spaces this whole thing is comparable to are "Optimistic UI" and "EventSourcing Projections".

johannescpk commented 2 years ago

Helper method as workaround that waits for a matching state event id coming in via sync:

/// Send the given state event and wait for it to sync with the SDKs
/// internal state store.
/// 
/// **Warning:** Calling this method from a SDK event handler or sending the
/// same event twice will deadlock. The latter could be improved by
/// https://github.com/matrix-org/matrix-rust-sdk/issues/1090.
pub async fn send_state_event<'a, C, K>(
    client: &matrix_sdk::Client,
    room_id: &'a RoomId,
    state_key: &'a K,
    content: &'a C,
    config: Option<RequestConfig>,
) -> Result<send_state_event::v3::Response>
where
    C: StaticEventContent + StateEventContent + RedactContent + Send + Sync + 'static,
    C::Redacted: RedactedStateEventContent<StateKey = C::StateKey> + Send,
    C::Unsigned: Send,
    C::StateKey: Borrow<K> + Send,
    <C::Redacted as StateEventContent>::StateKey: Send,
    <C as RedactContent>::Redacted: RedactedStateEventContent,
    K: AsRef<str> + ?Sized + Send + Sync,
{
    debug!("Sending state event and waiting for matching event id via event handler");

    let (tx, mut rx) = mpsc::channel::<OwnedEventId>(1);

    // Add event handler that sends all incoming state event ids to the receiver
    // below.
    let event_handler_handle = client.add_event_handler(move |event: SyncStateEvent<C>| {
        let tx = tx.clone();
        let event_id = event.event_id().to_owned();
        async move {
            if let Err(error) = tx.send(event_id).await {
                error!("Sending event_id failed: {error}");
            };
        }
    });

    // Send state event.
    let request = send_state_event::v3::Request::new(room_id, state_key, content)?;
    let response = client.send(request, config).await?;

    // Receiver that compares incoming event ids with the event id received
    // from the state event request response.
    loop {
        if rx.recv().await.as_deref() == Some(&response.event_id) {
            debug!("Found corresponding event_id: {}", response.event_id);
            client.remove_event_handler(event_handler_handle);
            break;
        }
    }

    Ok(response)
}
johannescpk commented 2 years ago

Generally I feel like that the sync should be the source of truth. So I'm not sure it's a good decision to store "this should be right" data after APIs are called.

Thinking about this more, and also talking with the Dart & Rust SDK people, it probably makes sense to store data that's confirmed by the homeserver. Basically if the homeserver says "200 OK" to a state event, it's valid to assume the event is valid and store it into the event store client side, maybe as "not yet seen in a sync"-marker. Which also helps with retrying mechanisms without blocking the calling side & even persistence of retrying, since it's then part of the store.