googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0

In-place Agones Upgrades: Storage Compatibility #3771

Open zmerlynn opened 2 months ago

zmerlynn commented 2 months ago

[!NOTE] Milestone of #3766, which we are seeking feedback on. We will move forward with pieces that seem non-contentious, though.

Storage Compatibility

Defaulting

Right now, Agones defaults values in the webhook. For example, GameServer defaulting is the only real thing the GameServer mutation webhook does. Defaulting serves a couple of purposes:

The problem is that defaulting in the webhook alone is not safe across configuration changes. Example using eviction.safe:

The reason for the failure is that the hook blindly assumes the value was defaulted by the webhook, but the webhook never had a chance to run. Note that for this particular case, the window is very narrow in time: t1 and t2 need to occur such that defaulting of the GameServer happens on a 1.29 agones-extensions container, but a 1.30 agones-controller container attempts to create the Pod. That said, this condition could easily occur during a rollout of 1.30 and cause a multi-second hiccup.

To solve this problem, I propose we:

Unknown or Disabled Fields

A similar problem exists for API fields that should not be present but have already been set, which typically only occurs if a feature gate has been disabled (either due to downgrade or explicit disablement). This takes a couple of forms:

In either of these cases, we need to ensure that on "first touch", the controller drops the unknown fields, rather than preserving them. In general, this is a safer handling of latent unknown fields - otherwise when the feature gate is reenabled, a preserved field could surprise the user. (Note that the first case, where the field is not present at all in the CRD, is generally covered by field pruning, so mostly this is figuring out logic for the latter case.)
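As a rough illustration of "drop on first touch", here is a minimal Go sketch. The type `GameServerSpec`, the field `ExampleField`, the gate name `ExampleGate`, and the helper `dropDisabledFields` are all hypothetical names invented for this example, not Agones APIs. The idea is only that the controller clears a gated field when the gate is off, instead of carrying the stale value forward.

```go
package main

import "fmt"

// GameServerSpec is a hypothetical, trimmed-down stand-in for an API type.
// ExampleField represents a field guarded by a feature gate.
type GameServerSpec struct {
	Image        string
	ExampleField map[string]int64 // only meaningful when "ExampleGate" is enabled
}

// dropDisabledFields clears gated fields whose gate is currently off, so a
// latent value cannot resurface if the gate is turned back on later.
// gateEnabled is an assumed lookup function, not a real Agones call.
func dropDisabledFields(spec *GameServerSpec, gateEnabled func(string) bool) {
	if !gateEnabled("ExampleGate") {
		spec.ExampleField = nil
	}
}

func main() {
	gates := map[string]bool{"ExampleGate": false}
	enabled := func(name string) bool { return gates[name] }

	// The field was set while the gate was on, then the gate was disabled.
	spec := GameServerSpec{
		Image:        "example/server:0.1",
		ExampleField: map[string]int64{"rooms": 3},
	}

	dropDisabledFields(&spec, enabled)
	fmt.Println(spec.ExampleField == nil) // prints "true"
}
```

When the gate is enabled, the same call leaves the field untouched, so it could run unconditionally at the start of each sync.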

Update vs Patch for controllers

Note that controllers have a similar problem to the SDK of using Update vs Patch, mentioned here - different controller versions may drop fields. However:

markmandel commented 2 months ago

Apply Default

I like the approach. But you would want to ApplyDefaults() here: https://github.com/googleforgames/agones/blob/4bb673186b2ee8431de00c5c564bd1daa1a356df/pkg/gameservers/controller.go#L389-L396

Rather than where you specified, as enqueuing only enqueues the namespace/name of the object, not the object itself - giving the best opportunity to get the latest Object at the time of synchronisation.
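To make the placement argument concrete, here is a minimal Go sketch, not the actual controller code. `GameServer`, `store`, and `syncGameServer` are simplified stand-ins written for this example, and the default value `"Never"` is an assumption. The work queue holds only a key, so the sync function looks up the latest object and defaults a copy of it at that moment.

```go
package main

import "fmt"

// GameServer is a hypothetical, trimmed-down stand-in for the real type.
type GameServer struct {
	Name         string
	EvictionSafe string // empty means "not defaulted yet"
}

// ApplyDefaults fills unset fields. It is idempotent, so calling it on
// every sync is harmless. The default "Never" is assumed for illustration.
func (gs *GameServer) ApplyDefaults() {
	if gs.EvictionSafe == "" {
		gs.EvictionSafe = "Never"
	}
}

// store stands in for an informer cache keyed by "namespace/name".
type store map[string]*GameServer

// syncGameServer receives only the key from the work queue. It fetches the
// object as it exists now, copies it, and defaults the copy.
func syncGameServer(s store, key string) (*GameServer, error) {
	cached, ok := s[key]
	if !ok {
		return nil, fmt.Errorf("gameserver %q not found", key)
	}
	gs := *cached // copy, so the shared cache entry is never mutated
	gs.ApplyDefaults()
	return &gs, nil
}

func main() {
	// This object was stored without the webhook ever defaulting it.
	s := store{"default/gs-1": {Name: "gs-1"}}
	key := "default/gs-1" // all that the queue holds

	gs, err := syncGameServer(s, key)
	if err != nil {
		panic(err)
	}
	fmt.Println(gs.EvictionSafe)                // prints "Never"
	fmt.Println(s[key].EvictionSafe == "")      // prints "true": cache untouched
}
```

Because defaulting happens after the lookup, any update that lands between enqueue and sync is picked up before defaults are applied.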

Everything else.

LGTM! I like the approach 👍🏻

zmerlynn commented 2 months ago

Apply Default

> I like the approach. But you would want to ApplyDefaults() here:

You're right. It might be nice if we had a helper function here that's basically "Get and Default" (for the case of inline changes like you just did for the migration controller), but agreed on the placement.