zamazan4ik / aws-gamelift-server-sdk-rs

MIT License
5 stars 1 forks source link

Deadlock when calling Api methods from callbacks #5

Closed Luminoth closed 2 years ago

Luminoth commented 2 years ago

From the basic.rs:

            on_start_game_session: Box::new(move |game_session| {
                Box::pin(async move {
                    log::debug!("{:?}", game_session);

                    CLIENT
                        .lock()
                        .await
                        .activate_game_session()
                        .await
                        .expect("Cannot activate game session");

                    log::info!("session activated!");
                })
            }),

the call to activate_game_session() does this:

let game_session_id = self.inner.lock().await.game_session_id.clone();

that inner lock is already being held here in perform_connect():

                        ReceivedMessageType::ActivateGameSession(message) => {
                            log::info!("Received ActivateGameSession event");
                            callback_handler
                                .lock()
                                .await
                                .on_start_game_session(message.game_session)
                                .await;
                        }
Luminoth commented 2 years ago

I've managed to work around this for now by spawning tasks for the API calls so they happen outside of the outer lock but unfortunately haven't been able to figure out a better actual solution. I suspect the one lock might actually need to be 2 locks but I'm not sure the underlying reason for sharing it so I can't say that with high confidence. I was thinking for a minute that it might have been caused by my async PR but I don't see how it ever worked before that unless the Tokio Mutex is recursive, and I don't think that it is, or at least it's not documented that it is from what I can find.

zamazan4ik commented 2 years ago

Thanks for the report!

That's really strange - I don't know, how it worked before. Maybe, during the tests, I had another test app configuration (like you) and didn't have a deadlock.

I suspect the one lock might actually need to be 2 locks but I'm not sure the underlying reason for sharing it so I can't say that with high confidence.

I guess I've chosen one mutex simply because it was easier to do. Synchronizing all stuff under one mutex is much easier to do, you know.

As a simple solution, I think we can use recursive Mutex. Not sure does Tokio provides it or not. Maybe parking_lot has some relevant primitives. As a right solution (at least I think so for now), we need to think about separating into two mutex approach at least because it will reduce mutex contention (but really I don't care about contention at all here since it's not supposed to handle large amount of events).

@Luminoth Can you provide please a PR with a recursive mutex? I will release it as a hotfix, so you will be able to update your library to upstream.

And once again - thanks for your help with the library :) Unfortunately, currently I cannot integrate my app to GameLift so I didn't test this library well.

zamazan4ik commented 2 years ago

@Luminoth can we close the issue?

Luminoth commented 2 years ago

Yep! I appreciate you taking that PR.