Closed jkowalski closed 5 years ago
The biggest issue here is - how to delete safely without accidentally deleting something that is being concurrently allocated.
Or "only delete this object if it's equal to this revision/generation"
Here's a possibly crazy idea - we could reject the deletion with a webhook! The webhook could check if the object is allocated
(pretty sure this is 100% up to date), and if it is, reject the deletion.
I think this will work!
Edit: removed idea, as this below is better.
This can be done by adding something like "Deleting" status and using regular Update:
Instead of deleting, callers would set "Deleting" status (via regular Update which resolves race conditions) and GS controller would be the only one to trigger actual deletion for GS in Deleting state.
BTW. As soon as we remove the mutex, we will see increased contention on Game Server resources. That can be fixed with batching and randomization of GS to allocate (see #536).
This can be done by adding something like "Deleting" status and using regular Update:
You know what - we already have this functionality, because of the SDK. It's called Shutdown
state.
https://github.com/GoogleCloudPlatform/agones/blob/master/pkg/gameservers/controller.go#L619-L636
Good call. I like it :+1:
I think I can work on this.
@jkowalski @ilkercelikyilmaz assignment made!
PR #572 fixes this partially. I will continue on the allocation improvements so we can get rid-off the Allocation Mutex
Currently there's a global mutex shared between 4 controllers that prevents allocations from happening when there's a deletion of (any) game server going on, regardless of a fleet it's on, etc.
To get decent allocation throughput this mutex should be removed and we should start relying on conditional mutations of GameServer itself to ensure correctness.