matrix-org / matrix-rust-sdk

Matrix Client-Server SDK for Rust
Apache License 2.0
1.27k stars 253 forks source link

fix(oidc): make sure we keep track of an ongoing OIDC refresh up to the end #4304

Closed bnjbvr closed 1 day ago

bnjbvr commented 1 day ago

There's a lock making sure we're not doing multiple refreshes of an OIDC token at the same time. Unfortunately, this lock could be dropped, if the task spawned by the inner function was detached.

The lock must be held throughout the entire detached task's lifetime, which this refactoring ensures, by setting the lock's result after calling the inner function.

codecov[bot] commented 1 day ago

Codecov Report

Attention: Patch coverage is 78.72340% with 10 lines in your changes missing coverage. Please review.

Project coverage is 85.08%. Comparing base (bc70f3c) to head (aa43b2c). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
crates/matrix-sdk/src/oidc/mod.rs 78.26% 10 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #4304 +/- ## ========================================== - Coverage 85.09% 85.08% -0.02% ========================================== Files 274 274 Lines 30191 30190 -1 ========================================== - Hits 25691 25686 -5 - Misses 4500 4504 +4 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.


🚨 Try these New Features:

poljar commented 1 day ago

Hmm I don't understand this:

if the task spawned by the inner function was detached.

When or how can the inner function detach the spawned task? It doesn't do so as far as I can tell, it awaits the join of the task.

What am I missing?

bnjbvr commented 1 day ago

Hmm I don't understand this:

if the task spawned by the inner function was detached.

When or how can the inner function detach the spawned task? It doesn't do so as far as I can tell, it awaits the join of the task.

What am I missing?

Oh sorry: there was a spawn() call in the inner function, which now lives in the parent. The spawn() existed so that the token refresh kept on, despite the caller cancelling the network request (since this is all called when spawning a network request to the homeserver). Only in that case (caller cancelling) is the future detached.

poljar commented 1 day ago

You mean, cancelling the parent doesn't cancel the spawned task in the child?

So now you moved the spawn into the parent and this now cancels the spawned task as well?

Or is the whole child/parent interaction unimportant and the reason this now works as expected is that the lock guard has been moved into the spawned task?

bnjbvr commented 1 day ago

The spawned task is never cancelled: using async fn lol() { spawn(fut).await } makes sure that fut is executed, even when the call to lol() has been aborted (i.e. it's been spawned and cancelled). This can happen often, especially in an invisible manner over the FFI boundary.

the reason this now works as expected is that the lock guard has been moved into the spawned task?

Exactly: the spawned task now keeps the mutex guard (as an owned mutex) over the entire lifetime of the spawned task.

poljar commented 1 day ago

Exactly: the spawned task now keeps the mutex guard (as an owned mutex) over the entire lifetime of the spawned task.

Alright, final question then. What does the moving of the spawn from the child to the parent then achieve? Couldn't we just have moved the owned lock guard to be an argument of the child method?

bnjbvr commented 1 day ago

Makes the code simpler: there's no spawn hidden in the child inner function, it's the parent who spawns it now. Since it only involved moving some other code that can fail early upwards, that seemed simpler. (Also, moving the guard down would have meant that every ? / early return in the spawned task would now need to set the guard's result as well, making it needlessly verbose)

bnjbvr commented 1 day ago

A regression test would have been nice as well but is probably not easy to write.

Indeed, this requires super perfect control over the timing of responses / API calls. Likely doable, but agreed the value is :shrug: compared to the code changes.