Closed: pgilfillan closed this issue 1 year ago.
Curious - rather than create an individual GameServer, why don't you increment your fleet size by one?
There is no fleet; this is mostly for the development/testing stages, where a lot of different people are using a lot of different build versions, and it would be prohibitively expensive (somewhat related to https://github.com/googleforgames/agones/issues/1782) to have fleets for all of them. An individual server is only started if there's no fleet.
🤔 if you can create a GameServer and then delete it, you can create a Fleet with 1 instance in it, no?
Yeah, we could, but then that fleet wouldn't be deleted unless we had some backend cronjob/background process that removed it after some amount of time, and if we had that we could just use it on standalone GameServers anyway. Basically it would remove the ability for a server version to shut itself down after playing the one match it was requested for.
Closing this, as it's a duplicate of #2966
What happened: We have a system for starting individual GameServer resources for a match if there is no server from a Fleet available. Occasionally (not every time) these servers get stuck in an Unhealthy state and don't get removed. According to https://agones.dev/site/docs/guides/health-checking/#fleet-management-of-unhealthy-gameservers, not getting removed makes sense ("If a GameServer moves into an Unhealthy state when it is not part of a Fleet, the GameServer will remain in the Unhealthy state until explicitly deleted"), but the servers becoming Unhealthy in the first place is a problem. It happens before either the main or sidecar container is ready:
This is the health config of the GameServer
so it should have taken 5 + 8 * 5 = 45 seconds before it went Unhealthy, but there wasn't enough time for that after it was scheduled.
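For reference, a health spec consistent with that arithmetic (initialDelaySeconds of 5, plus failureThreshold of 8 times periodSeconds of 5, giving 45 seconds) would look something like the following. The values are inferred from the numbers above rather than copied from the original screenshot, and the rest of the spec is elided:

```yaml
apiVersion: agones.dev/v1
kind: GameServer
metadata:
  name: example-standalone   # hypothetical name
spec:
  health:
    disabled: false
    initialDelaySeconds: 5   # wait 5s after start before expecting health pings
    periodSeconds: 5         # a health ping is expected every 5s
    failureThreshold: 8      # 8 missed pings -> Unhealthy (5 + 8 * 5 = 45s total)
  # ... container/ports config elided ...
```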
Interestingly, even though the sidecar is stuck in ContainerCreating, the `kubectl describe pod` events say that it had started (this was for a different server to the one above, but the same problem). `kubectl describe gs` event output:

I think the issue is usually seen when new nodes are spun up, so maybe the sidecar container is started in the background (it just doesn't report as ready, and no logs are available) and then the health checks fail because the main server image is still pulling (there's no cached image on the new node)? It can take a while to pull because of the size.
Another problem is that the server container starts sending health checks properly but remains Unhealthy. It will then shut itself down after a few minutes because it hasn't received a match (logic within the server code), but this fails because it can't change state ("GameServerState already unhealthy. Skipping update").
What you expected to happen: Three things:

- The sidecar container to show as ready (in `kubectl get pods`, and logs are available) as soon as it's started, not just when the server container is started (assuming that's why it's stuck on ContainerCreating).
- Increasing `initialDelaySeconds` to something like 5 minutes just in case the image takes a while to be pulled.
- Maybe split up the `initialDelaySeconds` config somehow to account for pre-container-start time?

How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: Top of the logs from the sidecar, after it had started
Environment:

- Kubernetes version (use `kubectl version`): 1.21