nextcloud / spreed

🗨️ Nextcloud Talk – chat, video & audio calls for Nextcloud
https://nextcloud.com/talk
GNU Affero General Public License v3.0
1.64k stars 438 forks source link

Random 404 when deleting room with session #5733

Closed PVince81 closed 3 years ago

PVince81 commented 3 years ago

Steps to reproduce

  1. Create a room
  2. Join the room
  3. In the left sidebar delete the room
  4. Be unlucky :four_leaf_clover:

Expected behaviour

Redirect to the main page, not "not-found" page and the room is deleted.

Actual behaviour

Sometimes, the room is not deleted and the console returns a 404 error for the DELETE operation.

Talk app

Talk app version: 11.2.1

Custom Signaling server configured: yes

Browser log

Server configuration

Nextcloud Version: 21.0.2

Server log (data/nextcloud.log)

Nothing relevant.

Analysis

This was reported to me internally, I was not able to reproduce this locally.

I've dug into the code and it looks like there are two requests happening in parallel when deleting a conversation: 1) switching to the default route triggers a DELETE to /room/$token/participant/active leading to https://github.com/nextcloud/spreed/blob/v11.2.1/lib/Controller/RoomController.php#L1659 2) deleteConversation call triggers a DELETE to /room/$token

The code does not wait for 1) to finish and directly triggers 2).

After adding more extensive logging on the target env where the problem occurs, I noticed that the 404 for the deletion actually does find the room, but fails to find the participant here: https://github.com/nextcloud/spreed/blob/v11.2.1/lib/Middleware/InjectionMiddleware.php#L156

It looks like depending whether the session is still there, which correlated with request 1), it might enter the if or else statement, using a different approach for finding the participant. A race condition could cause the session id to not exist any more in the database while it still existed at the time of retrieval from the PHP session.

I did more local tests with logging in both code paths and noticed that the one or the other branch was randomly picked, which confirms that there is some race condition in place. However in my local setup it never resulted in a 404.

From my understanding, it should never happen that the session id can be deleted in one PHP request while the other one is still reading it. Unless maybe the PHP session handling was configured differently on the target system...

Solutions

Approach 1: wait for request 1) to finish

Currently not possible because of the way the routing + onRouteChange events are done, these are not async/await-compatible on the event bus, so we can't wait for the "leaveConversation" to finish.

Approach 2: discard request 1)

I've tried this locally and it results in the signaling server telling the frontend that the session is gone, so it redirects to "not-found" instead of the root page. Not too bad, could be a way.

Approach 3: wait for request 2) and then run request 1)

Doesn't make much sense as it results in the same result like approach 2 depending on the speed at which the request 1) is sent.

Approach 4: add fallback in the backend

Modify https://github.com/nextcloud/spreed/blob/v11.2.1/lib/Middleware/InjectionMiddleware.php#L156 and whenever the participant cannot be found with session id, fall back to getting it without the session id then.

Drawbacks: since this is a common code path used by all controller methods, it could cause side effects.

Also I'm not sure if this is a good idea at all or whether it breaks semantically in some places.

PVince81 commented 3 years ago

let's go with approach 2

as far as I can see, even when choosing "Leave conversation" we don't touch the URL nor send "DELETE .../participant/active" and the signaling server tells the frontend that the room does not exist any more

so doing the same approach for "Delete conversation" would align the behavior with that

PVince81 commented 3 years ago

PR here: https://github.com/nextcloud/spreed/pull/5736

PVince81 commented 3 years ago

as discussed with @nickvergessen we'll also go with approach 4, PR here: https://github.com/nextcloud/spreed/pull/5738