googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0

Incorrect error logging caused by ResourceVersion conflict on SDK Patch call #3967

daniellee closed 1 day ago

daniellee commented 2 weeks ago

What happened: After upgrading to 1.41, we started getting a couple of new error messages in the logs for the Agones sidecar:

{"message":"the server rejected our request due to an error in our request","severity":"error","time":"2024-08-27T05:11:38.171542173Z"} 

and:

{"error":"the server rejected our request due to an error in our request","gsKey":"updateLabel","message":"","queue":"agones.dev.gameservers.xxx","severity":"error","source":"*sdkserver.SDKServer","time":"2024-08-27T00:33:16.155807016Z"}

and:

{"message":"could not update GameServer gameservers/xxx to state Shutdown: the server rejected our request due to an error in our request","severity":"error","stack":["agones.dev/agones/pkg/sdkserver.(*SDKServer).updateState\n\t/go/src/agones.dev/agones/pkg/sdkserver/sdkserver.go:412","agones.dev/agones/pkg/sdkserver.(*SDKServer).syncGameServer\n\t/go/src/agones.dev/agones/pkg/sdkserver/sdkserver.go:331","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).processNextWorkItem\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:182","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).runWorker\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:158","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run.func1\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:217","k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226","k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227","k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204","k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:217","runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"],"time":"2024-08-26T10:06:47.046294688Z"} 

This is due to the change in #3803, which adds a JSON Patch test operation that checks the ResourceVersion and rejects the patch if there is a conflict. If two calls update the same gameserver in quick succession (for example, setting a label/annotation or changing the state), a conflict can happen.
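For illustration, a guarded patch of this shape can be sketched in Go with only the standard library. This is a minimal sketch, not the Agones implementation: `patchOp` and `guardedStatePatch` are hypothetical names, and the paths and values are taken from the kubectl command in the reproduction steps.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

// guardedStatePatch is a hypothetical helper (not taken from Agones).
// It puts a "test" operation on resourceVersion in front of the real
// change, so the whole patch is rejected if the object has been modified
// since resourceVersion was read.
func guardedStatePatch(resourceVersion, state string) ([]byte, error) {
	ops := []patchOp{
		{Op: "test", Path: "/metadata/resourceVersion", Value: resourceVersion},
		{Op: "replace", Path: "/status/state", Value: state},
	}
	return json.Marshal(ops)
}

func main() {
	body, err := guardedStatePatch("37909512", "RequestReady")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because a JSON Patch is applied atomically, a failing `test` operation means the `replace` is never applied.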

The failed patch update gets retried and then succeeds, so I don't think this is actually an error; it is just how Kubernetes works.

What you expected to happen:

For this message not to be logged as an error, or to be logged at debug level instead.

How to reproduce it (as minimally and precisely as possible):

The error message is from Kubernetes and can be recreated with kubectl by setting an older/smaller value for the /metadata/resourceVersion field for a running gameserver:

kubectl patch -v=8 gameserver xxx-aaa-bbb --type json -p '[{"op":"test","path":"/metadata/resourceVersion","value":"37909512"},{"op":"replace","path":"/status/state","value":"RequestReady"}]'

To recreate with Agones, have two clients set a label or annotation on the same gameserver at the same time; this will generate the error.
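The two-client race can be modelled without a cluster. The sketch below is an in-memory simulation only: `gameServer`, `patchLabel` and `setLabelWithRetry` are hypothetical stand-ins, and the integer version counter is a simplification of the opaque resourceVersion string. Both clients read the same version; the first patch succeeds, the second fails its version check once, re-reads, and succeeds on retry.

```go
package main

import (
	"errors"
	"fmt"
)

// errVersionMismatch models a failed "test" operation on resourceVersion.
var errVersionMismatch = errors.New("resourceVersion test failed")

// gameServer is a hypothetical in-memory stand-in for the stored object.
type gameServer struct {
	resourceVersion int
	labels          map[string]string
}

// patchLabel applies the change only if the caller saw the current version,
// then bumps the version, as a successful write would.
func (gs *gameServer) patchLabel(seenVersion int, key, value string) error {
	if seenVersion != gs.resourceVersion {
		return errVersionMismatch
	}
	gs.labels[key] = value
	gs.resourceVersion++
	return nil
}

// setLabelWithRetry starts from a possibly stale version and, on a mismatch,
// re-reads the version and tries again. It returns the attempts used.
func setLabelWithRetry(gs *gameServer, seenVersion int, key, value string, maxAttempts int) (int, error) {
	version := seenVersion
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := gs.patchLabel(version, key, value)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errVersionMismatch) {
			return attempt, err
		}
		version = gs.resourceVersion // re-read before retrying
	}
	return maxAttempts, errVersionMismatch
}

func main() {
	gs := &gameServer{resourceVersion: 1, labels: map[string]string{}}
	seen := gs.resourceVersion // both clients read the same version

	first, _ := setLabelWithRetry(gs, seen, "client-a", "1", 3)
	second, _ := setLabelWithRetry(gs, seen, "client-b", "2", 3)
	fmt.Println(first, second, gs.resourceVersion)
}
```

In this model the second client needs two attempts, which matches the observed behaviour: one rejected patch followed by a successful retry.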

Anything else we need to know?:

Environment:

markmandel commented 2 weeks ago

I know why this is -- the conflict doesn't register under IsConflict. Should be easy enough to fix, I'm just trying to work out how to force a conflict in the SDK Sidecar 🤦🏻 😁 so I can at least see it in action.
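One way to picture the mismatch, as a rough sketch only: this assumes the rejected test operation comes back as HTTP 422 (Unprocessable Entity) rather than 409 (Conflict), which would be consistent with the error text in the logs above but is not confirmed in this thread. `apiError`, `isVersionRace` and `logLevelFor` are hypothetical names and are not the Agones fix.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// apiError is a hypothetical stand-in for an API server status error.
type apiError struct {
	Code    int
	Message string
}

func (e *apiError) Error() string { return e.Message }

// isVersionRace reports whether err looks like a lost optimistic-concurrency
// race. ASSUMPTION: a failed resourceVersion test surfaces as 422, so a check
// for 409 alone would miss it. Treating every 422 this way is deliberately
// broad for the sake of the sketch.
func isVersionRace(err error) bool {
	var apiErr *apiError
	if !errors.As(err, &apiErr) {
		return false
	}
	return apiErr.Code == http.StatusConflict ||
		apiErr.Code == http.StatusUnprocessableEntity
}

// logLevelFor picks a log severity: races that will be retried are logged
// at debug level, everything else stays an error.
func logLevelFor(err error) string {
	if isVersionRace(err) {
		return "debug"
	}
	return "error"
}

func main() {
	rejected := &apiError{
		Code:    http.StatusUnprocessableEntity,
		Message: "the server rejected our request due to an error in our request",
	}
	fmt.Println(logLevelFor(rejected))
	fmt.Println(logLevelFor(&apiError{Code: http.StatusInternalServerError, Message: "boom"}))
}
```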

markmandel commented 2 weeks ago

Got it! Fix coming shortly.