matrix-org / complement-crypto

Apache License 2.0
8 stars 5 forks source link

TestRoomKeyIsCycledAfterEnoughTime flake due to no local/remote echo #114

Open kegsay opened 2 months ago

kegsay commented 2 months ago

Run https://github.com/matrix-org/matrix-rust-sdk/actions/runs/9852865102/job/27202217506

It looks like the sending_task never returns, despite it sending the message.

Relevant logs:

2024-07-09T07:40:13.676048Z  INFO TestRoomKeyIsCycledAfterEnoughTime/{rust_hs1}|{rust_hs1}: [@user-49-alice:hs1](rust) SendMessage !KWuNNCWLHHGZwtRoVU:hs1 => Before the time expires | rust.go:0
...
2024-07-09T07:40:13.676539Z TRACE matrix_sdk::send_queue: received an event to send! txn_id=1e5843d0d07644df946b0ad9c55abebb | crates/matrix-sdk/src/send_queue.rs:436 | spans: sending_task{room_id=!KWuNNCWLHHGZwtRoVU:hs1}
...
2024-07-09T07:40:13.676655Z TRACE matrix_sdk::room::futures: Sending encrypted event because the room is encrypted. room_id="!KWuNNCWLHHGZwtRoVU:hs1" | crates/matrix-sdk/src/room/futures.rs:194 | spans: sending_task{room_id=!KWuNNCWLHHGZwtRoVU:hs1} > send_raw{room_id="!KWuNNCWLHHGZwtRoVU:hs1" transaction_id="1e5843d0d07644df946b0ad9c55abebb" is_room_encrypted=true}
...
2024-07-09T07:40:13.677198Z DEBUG matrix_sdk_crypto::session_manager::sessions: Collected user/device pairs that are missing an Olm session missing_session_devices_by_user={} timed_out_devices_by_user={} | crates/matrix-sdk-crypto/src/session_manager/sessions.rs:301 | spans: sending_task{room_id=!KWuNNCWLHHGZwtRoVU:hs1} > send_raw{room_id="!KWuNNCWLHHGZwtRoVU:hs1" transaction_id="1e5843d0d07644df946b0ad9c55abebb" is_room_encrypted=true} > preshare_room_key{room_id="!KWuNNCWLHHGZwtRoVU:hs1" } > get_missing_sessions
...
2024-07-09T07:40:13.678445Z DEBUG matrix_sdk::http_client::native: Sending request num_attempt=1 | crates/matrix-sdk/src/http_client/native.rs:55 | spans: sending_task{room_id=!KWuNNCWLHHGZwtRoVU:hs1} > send_raw{room_id="!KWuNNCWLHHGZwtRoVU:hs1" transaction_id="1e5843d0d07644df946b0ad9c55abebb" is_room_encrypted=true} > send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6] config=RequestConfig { timeout: 30s, retry_limit: 3 } request_id="REQ-21" method=PUT uri="http://127.0.0.1:32781/_matrix/client/v3/rooms/!KWuNNCWLHHGZwtRoVU:hs1/send/m.room.encrypted/1e5843d0d07644df946b0ad9c55abebb" request_size="577 B"}
...
2024-07-09T07:40:13.699839Z DEBUG matrix_sdk::http_client: Got response | crates/matrix-sdk/src/http_client/mod.rs:228 | spans: sending_task{room_id=!KWuNNCWLHHGZwtRoVU:hs1} > send_raw{room_id="!KWuNNCWLHHGZwtRoVU:hs1" transaction_id="1e5843d0d07644df946b0ad9c55abebb" is_room_encrypted=true} > send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6] config=RequestConfig { timeout: 30s, retry_limit: 3 } request_id="REQ-21" method=PUT uri="http://127.0.0.1:32781/_matrix/client/v3/rooms/!KWuNNCWLHHGZwtRoVU:hs1/send/m.room.encrypted/1e5843d0d07644df946b0ad9c55abebb" request_size="577 B" status=200 response_size="59 B"}
...
2024-07-09T07:40:13.699942Z  INFO matrix_sdk::room::futures: Sent event in room | crates/matrix-sdk/src/room/futures.rs:236 | spans: sending_task{room_id=!KWuNNCWLHHGZwtRoVU:hs1} > send_raw{room_id="!KWuNNCWLHHGZwtRoVU:hs1" transaction_id="1e5843d0d07644df946b0ad9c55abebb" is_room_encrypted=true event_id="$WjfmfvepH3dxgepiHbRJQ_WfTMrFAVcEuslWDB_9lrs"}
2024-07-09T07:40:13.699991Z TRACE matrix_sdk::send_queue: successfully sent txn_id=1e5843d0d07644df946b0ad9c55abebb event_id=$WjfmfvepH3dxgepiHbRJQ_WfTMrFAVcEuslWDB_9lrs | crates/matrix-sdk/src/send_queue.rs:454 | spans: sending_task{room_id=!KWuNNCWLHHGZwtRoVU:hs1}
2024-07-09T07:40:13.700070Z TRACE matrix_sdk::send_queue: queue is empty, sleeping | crates/matrix-sdk/src/send_queue.rs:424 | spans: sending_task{room_id=!KWuNNCWLHHGZwtRoVU:hs1}

Despite this:

     room_keys_test.go:214: [@user-50-bob:hs1](rust) WaitUntilEventInRoom !KWuNNCWLHHGZwtRoVU:hs1
      room_keys_test.go:215: [@user-49-alice:hs1](rust) SendMessage !KWuNNCWLHHGZwtRoVU:hs1 => Before the time expires
      rust.go:692: [@user-50-bob:hs1]AddTimelineListener[!KWuNNCWLHHGZwtRoVU:hs1] TimelineDiff len=2
      rust.go:788: [@user-50-bob:hs1]TimelineDiff change: &{ID:$nOnJ6G2db0lEjeJVee7VWJN7goLxIFNZq49sDBySYbM Text:Before we start Sender:@user-49-alice:hs1 Target: Membership: FailedToDecrypt:false}
      rust.go:788: [@user-50-bob:hs1]TimelineDiff change: &{ID:$WjfmfvepH3dxgepiHbRJQ_WfTMrFAVcEuslWDB_9lrs Text:Before the time expires Sender:@user-49-alice:hs1 Target: Membership: FailedToDecrypt:false}
      room_keys_test.go:215: SendMessage(rust) @user-49-alice:hs1: timed out after 11s
kegsay commented 2 months ago

It's not the sending task. It's the fact that we don't see the remote echo for this event. We don't even see the local echo for this event actually...

kegsay commented 2 months ago

The local echo handler worked for the previous message sent by Alice:

2024-07-09T07:40:13.530040Z  INFO TestRoomKeyIsCycledAfterEnoughTime/{rust_hs1}|{rust_hs1}: [@user-49-alice:hs1](rust) SendMessage !KWuNNCWLHHGZwtRoVU:hs1 => Before we start | rust.go:0
...
2024-07-09T07:40:13.531892Z  INFO matrix_sdk_ui::timeline::builder: spawned the local echo handler! | crates/matrix-sdk-ui/src/timeline/builder.rs:286 | spans: local_echo_handler{room_id="!KWuNNCWLHHGZwtRoVU:hs1"}
...
2024-07-09T07:40:13.533799Z DEBUG matrix_sdk_ui::timeline::event_handler: Handling local event | crates/matrix-sdk-ui/src/timeline/event_handler.rs:301 | spans: local_echo_handler{room_id="!KWuNNCWLHHGZwtRoVU:hs1"} > handle_local_event > handle_local_event > handle_event{txn_id="85ace0bf7c2d454f9095200a9528ccc0"}
2024-07-09T07:40:13.533775Z DEBUG matrix_sdk_crypto::olm::account: The fallback key either expired or we didn't have one: generated a new fallback key. removed_fallback_key=None | crates/matrix-sdk-crypto/src/olm/account.rs:598 | spans: next_sync_with_lock > sync_once > handle_response > preprocess_to_device_events > receive_sync_changes
2024-07-09T07:40:13.533912Z DEBUG log: processing CharacterTokens(NotSplit, Tendril<UTF8>(shared: \"Before we start\")) in insertion mode InBody     | spans: local_echo_handler{room_id="!KWuNNCWLHHGZwtRoVU:hs1"} > handle_local_event > handle_local_event > handle_event{txn_id="85ace0bf7c2d454f9095200a9528ccc0"}
2024-07-09T07:40:13.533969Z DEBUG log: processing EOF in state Data     | spans: local_echo_handler{room_id="!KWuNNCWLHHGZwtRoVU:hs1"} > handle_local_event > handle_local_event > handle_event{txn_id="85ace0bf7c2d454f9095200a9528ccc0"}
2024-07-09T07:40:13.534005Z DEBUG log: processing EOFToken in insertion mode InBody     | spans: local_echo_handler{room_id="!KWuNNCWLHHGZwtRoVU:hs1"} > handle_local_event > handle_local_event > handle_event{txn_id="85ace0bf7c2d454f9095200a9528ccc0"}
2024-07-09T07:40:13.534056Z TRACE matrix_sdk_ui::timeline::event_handler: Adding new local timeline item | crates/matrix-sdk-ui/src/timeline/event_handler.rs:937 | spans: local_echo_handler{room_id="!KWuNNCWLHHGZwtRoVU:hs1"} > handle_local_event > handle_local_event > handle_event{txn_id="85ace0bf7c2d454f9095200a9528ccc0"}

There's literally 0 log lines for "local" for the event that wasn't in the timeline.