googleforgames / agones

Dedicated Game Server Hosting and Scaling for Multiplayer Games on Kubernetes
https://agones.dev
Apache License 2.0
6.09k stars 810 forks source link

Flaky SDK Conformance Tests #1779

Closed markmandel closed 4 years ago

markmandel commented 4 years ago

What happened:

For a long time the SDK Conformance tests will occasionally flake.

What you expected to happen:

Them to pass all the time.

How to reproduce it (as minimally and precisely as possible):

That's the problem - they are flaky :smile:

This bug is a place to track the flakiness responses, and see if we can see a common pattern.

Anything else we need to know?:

Environment:

Context: https://console.cloud.google.com/cloud-build/builds/b61c074d-d6cd-4f68-b289-74a2d0f7ad2b?project=258182270954

/usr/bin/make run-sdk-conformance-no-build SDK_FOLDER=go GRPC_PORT=9001 HTTP_PORT=9101 FEATURE_GATES=PlayerTracking=true TESTS=ready,allocate,setlabel,setannotation,gameserver,health,shutdown,watch,reserve,getplayercapacity,setplayercapacity,playerconnect,playerdisconnect,getplayercount,isplayerconnected,getconnectedplayers
/usr/bin/make ensure-image IMAGE_TAG=agones-build-sdk-node:2ca282f4dc BUILD_TARGET=build-build-sdk-image SDK_FOLDER=node
/usr/bin/make ensure-image IMAGE_TAG=gcr.io/agones-images/agones-sdk:1.9.0-b831bb7 BUILD_TARGET=build-agones-sdk-server-image
/usr/bin/make ensure-image IMAGE_TAG=agones-build-sdk-go:6a1f217343 BUILD_TARGET=build-build-sdk-image SDK_FOLDER=go
+ /root/sdktest.sh
+ cd /go/src/agones.dev/agones/test/sdk/nodejs
+ npm run testSDK
/usr/bin/make ensure-image IMAGE_TAG=gcr.io/agones-images/agones-sdk:1.9.0-b831bb7 BUILD_TARGET=build-agones-sdk-server-image

> @ testSDK /go/src/agones.dev/agones/test/sdk/nodejs
> node ./testSDKClient.js

attempting to connect
connected!
{"message":"Ready request has been received!","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.057025478Z"}
{"message":"Connected to watch GameServer...","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.057141103Z"}
{"message":"Gameserver update received","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.057354483Z"}
{"message":"Getting GameServer details","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.063013179Z"}
watch {
  objectMeta: {
    name: 'local',
    namespace: 'default',
    uid: '5229266051501086426',
    resourceVersion: 'v1',
    generation: 1,
    creationTimestamp: 1598956249,
    deletionTimestamp: 0,
    annotationsMap: [ [Array] ],
    labelsMap: [ [Array] ]
  },
  spec: {
    health: {
      disabled: false,
      periodSeconds: 3,
      failureThreshold: 5,
      initialDelaySeconds: 10
    }
  },
  status: {
    state: 'Ready',
    address: '127.0.0.1',
    portsList: [ [Object] ],
    players: undefined
  }
}
Setting annotation  5229266051501086426
{"message":"Setting annotation","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.069942988Z","values":{"key":"annotation","value":"5229266051501086426"}}
{"message":"Gameserver update received","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.070179458Z"}
{"message":"Setting label","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.07306106Z","values":{"key":"label","value":"1598956249"}}
{"message":"Gameserver update received","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.07320512Z"}
watch {
  objectMeta: {
    name: 'local',
    namespace: 'default',
    uid: '5229266051501086426',
    resourceVersion: 'v1',
    generation: 1,
    creationTimestamp: 1598956249,
    deletionTimestamp: 0,
    annotationsMap: [ [Array], [Array] ],
    labelsMap: [ [Array] ]
  },
  spec: {
    health: {
      disabled: false,
      periodSeconds: 3,
      failureThreshold: 5,
      initialDelaySeconds: 10
    }
  },
  status: {
    state: 'Ready',
    address: '127.0.0.1',
    portsList: [ [Object] ],
    players: undefined
  }
}
gameServer {
  objectMeta: {
    name: 'local',
    namespace: 'default',
    uid: '5229266051501086426',
    resourceVersion: 'v1',
    generation: 1,
    creationTimestamp: 1598956249,
    deletionTimestamp: 0,
    annotationsMap: [ [Array] ],
    labelsMap: [ [Array] ]
  },
  spec: {
    health: {
      disabled: false,
      periodSeconds: 3,
      failureThreshold: 5,
      initialDelaySeconds: 10
    }
  },
  status: {
    state: 'Ready',
    address: '127.0.0.1',
    portsList: [ [Object] ],
    players: undefined
  }
}
creation Timestamp 1598956249
health undefined
watch {
  objectMeta: {
    name: 'local',
    namespace: 'default',
    uid: '5229266051501086426',
    resourceVersion: 'v1',
    generation: 1,
    creationTimestamp: 1598956249,
    deletionTimestamp: 0,
    annotationsMap: [ [Array], [Array] ],
    labelsMap: [ [Array], [Array] ]
  },
  spec: {
    health: {
      disabled: false,
      periodSeconds: 3,
      failureThreshold: 5,
      initialDelaySeconds: 10
    }
  },
  status: {
    state: 'Ready',
    address: '127.0.0.1',
    portsList: [ [Object] ],
    players: undefined
  }
}
{"duration":{"seconds":5},"message":"Reserve request has been received!","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.07671884Z"}
{"message":"Health Ping Received!","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.076763512Z"}
{"message":"Gameserver update received","sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.07695493Z"}
watch {
  objectMeta: {
    name: 'local',
    namespace: 'default',
    uid: '5229266051501086426',
    resourceVersion: 'v1',
    generation: 1,
    creationTimestamp: 1598956249,
    deletionTimestamp: 0,
    annotationsMap: [ [Array], [Array] ],
    labelsMap: [ [Array], [Array] ]
  },
  spec: {
    health: {
      disabled: false,
      periodSeconds: 3,
      failureThreshold: 5,
      initialDelaySeconds: 10
    }
  },
  status: {
    state: 'Reserved',
    address: '127.0.0.1',
    portsList: [ [Object] ],
    players: undefined
  }
}
{"message":"shutting down sdk server","severity":"info","source":"main","time":"2020-09-01T10:31:19.455406386Z"}
{"message":"Could not find a request which was expected","request":"allocate","sdkName":"node","severity":"error","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.45549336Z"}
{"expected":["ready","allocate","setlabel","setannotation","gameserver","health","shutdown","watch","reserve"],"message":"Testing Failed","received":["ready","watch","gameserver","setannotation","setlabel","health","reserve"],"sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-01T10:31:19.455509623Z"}
/go/src/agones.dev/agones/sdks/nodejs/src/agonesSDK.js:130
            throw error;
            ^

Error: 14 UNAVAILABLE: Connection dropped
    at Object.callErrorFromStatus (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/client.js:327:49)
    at Object.onReceiveStatus (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:305:181)
    at Http2CallStream.outputStatus (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/call-stream.js:117:74)
    at Http2CallStream.maybeOutputStatus (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/call-stream.js:156:22)
    at Http2CallStream.endCall (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/call-stream.js:142:18)
    at Http2CallStream.disconnectListener (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/call-stream.js:104:18)
    at Socket.<anonymous> (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/subchannel.js:371:25)
    at Object.onceWrapper (events.js:422:26)
    at Socket.emit (events.js:327:22) {
  code: 14,
  details: 'Connection dropped',
  metadata: Metadata { internalRepr: Map {}, options: {} }
}
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! @ testSDK: `node ./testSDKClient.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the @ testSDK script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2020-09-01T10_31_19_506Z-debug.log
make[2]: *** [run-sdk-conformance-no-build] Error 1
includes/sdk.mk:145: recipe for target 'run-sdk-conformance-no-build' failed
make[1]: *** [run-sdk-conformance-test] Error 2
includes/sdk.mk:152: recipe for target 'run-sdk-conformance-test' failed
make: *** [run-sdk-conformance-test-node] Error 2
make: *** Waiting for unfinished jobs....
includes/sdk.mk:159: recipe for target 'run-sdk-conformance-test-node' failed
includes/sdk.mk:88: recipe for target 'run-sdk-command' failed
make[3]: *** [run-sdk-command] Error 1
markmandel commented 4 years ago

https://console.cloud.google.com/cloud-build/builds/68a4c348-6b8f-40f8-ad6a-eaadb9d85c1d?project=258182270954

make[3]: Leaving directory '/workspace/build' /usr/bin/make run-sdk-command COMMAND=build-sdk-test Error: Failed to connect before the deadline at checkState (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/client.js:70:26) at Timeout._onTimeout (/go/src/agones.dev/agones/sdks/nodejs/node_modules/@grpc/grpc-js/build/src/channel.js:297:17) at listOnTimeout (internal/timers.js:549:17) at processTimers (internal/timers.js:492:7)

markmandel commented 4 years ago

https://console.cloud.google.com/cloud-build/builds/e2f2d946-e3fc-4a3b-8fb6-fd1b994a8ab7?project=258182270954

{"message":"shutting down sdk server","severity":"info","source":"main","time":"2020-09-08T21:43:45.966680175Z"}
{"message":"Could not find a request which was expected","request":"ready","sdkName":"go","severity":"error","source":"*sdkserver.LocalSDKServer","time":"2020-09-08T21:43:45.966802274Z"}
{"expected":["ready","allocate","setlabel","setannotation","gameserver","health","shutdown","watch","reserve","getplayercapacity","setplayercapacity","playerconnect","playerdisconnect","getplayercount","isplayerconnected","getconnectedplayers"],"message":"Testing Failed","received":[],"sdkName":"go","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-08T21:43:45.966838567Z"}
make[1]: *** [run-sdk-conformance-no-build] Error 1
includes/sdk.mk:145: recipe for target 'run-sdk-conformance-no-build' failed
make: *** [run-sdk-conformance-test-go] Error 2
make: *** Waiting for unfinished jobs....
includes/sdk.mk:162: recipe for target 'run-sdk-conformance-test-go' failed
aLekSer commented 4 years ago

https://console.cloud.google.com/cloud-build/builds/3884cc38-5a5b-4a59-adaf-0b9ef7fcf736;step=23?project=agones-images This one is different and I see that make runs was out of sync at all SDK server listener already has closed when node ./testSDKClient.js was in attempting to connect stage.

docker run -p 9002:9002 -p 9102:9102 -e "FEATURE_GATES=" -e "ADDRESS=" -e "TEST=ready,allocate,setlabel,setannotation,gameserver,health,shutdown,watch,reserve" -e "SDK_NAME=node" -e "TIMEOUT=30" -e "DELAY=1" \
--net=host gcr.io/agones-images/agones-sdk:1.9.0-3317115 --grpc-port 9002 --http-port 9102
WARNING: Published ports are discarded when using host network mode
{"ctlConf":{"Address":"","IsLocal":false,"LocalFile":"","Delay":1,"Timeout":30,"Test":"ready,allocate,setlabel,setannotation,gameserver,health,shutdown,watch,reserve","TestSdkName":"node","GRPCPort":9002,"HTTPPort":9102},"featureGates":"ContainerPortAllocation=true\u0026Example=true\u0026PlayerTracking=false\u0026SDKWatchSendOnExecute=false","message":"Starting sdk sidecar","severity":"info","source":"main","time":"2020-09-09T14:59:37.647849379Z","version":"1.9.0-3317115"}
{"message":"Waiting 1 seconds before starting","severity":"info","source":"main","time":"2020-09-09T14:59:37.648074846Z"}
{"grpcEndpoint":":9002","message":"Starting SDKServer grpc service...","severity":"info","source":"main","time":"2020-09-09T14:59:38.649733702Z"}
{"httpEndpoint":":9102","message":"Starting SDKServer grpc-gateway...","severity":"info","source":"main","time":"2020-09-09T14:59:39.650910324Z"}
cd build-sdk-images; \
if [ "node" != "tool" ] && [ -f node/sdktest.sh ] ; then \
    cd - ; \
    /usr/bin/make ensure-build-sdk-image SDK_FOLDER=node ; \
    docker run --rm -v /workspace/build//.config/gcloud:/root/.config/gcloud -v ~/.kube/:/root/.kube -v ~/.config/helm:/root/.config/helm -v ~/.cache/helm:/root/.cache/helm -v /workspace:/go/src/agones.dev/agones -v /workspace/build//.gomod:/go/pkg/mod -v /workspace/build//.gocache:/root/.cache/go-build -e "VERSION=1.9.0-3317115" \
        --net host -e AGONES_SDK_GRPC_PORT=9002 -e AGONES_SDK_HTTP_PORT=9102 -e FEATURE_GATES=  agones-build-sdk-node:2ca282f4dc sdktest ; \
else \
    echo "Command sdktest not found - nothing to execute" ; \
fi
/workspace/build
/usr/bin/make ensure-image IMAGE_TAG=gcr.io/agones-images/agones-sdk:1.9.0-3317115 BUILD_TARGET=build-agones-sdk-server-image
{"message":"shutting down sdk server","severity":"info","source":"main","time":"2020-09-09T14:59:48.078413197Z"}
{"message":"Received requests match expected list. Test run was successful","sdkName":"go","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-09T14:59:48.07854491Z"}
{"error":"http: Server closed","message":"http server closed","severity":"info","source":"main","time":"2020-09-09T14:59:48.078693847Z"}
# run with feature flags enabled
/usr/bin/make run-sdk-conformance-no-build SDK_FOLDER=go GRPC_PORT=9001 HTTP_PORT=9101 FEATURE_GATES=PlayerTracking=true TESTS=ready,allocate,setlabel,setannotation,gameserver,health,shutdown,watch,reserve,getplayercapacity,setplayercapacity,playerconnect,playerdisconnect,getplayercount,isplayerconnected,getconnectedplayers
/usr/bin/make ensure-image IMAGE_TAG=agones-build-sdk-node:2ca282f4dc BUILD_TARGET=build-build-sdk-image SDK_FOLDER=node
/usr/bin/make ensure-image IMAGE_TAG=agones-build-sdk-go:6a1f217343 BUILD_TARGET=build-build-sdk-image SDK_FOLDER=go
/usr/bin/make ensure-image IMAGE_TAG=gcr.io/agones-images/agones-sdk:1.9.0-3317115 BUILD_TARGET=build-agones-sdk-server-image
{"message":"shutting down sdk server","severity":"info","source":"main","time":"2020-09-09T15:00:08.649479539Z"}
{"message":"Could not find a request which was expected","request":"ready","sdkName":"node","severity":"error","source":"*sdkserver.LocalSDKServer","time":"2020-09-09T15:00:08.649615765Z"}
{"expected":["ready","allocate","setlabel","setannotation","gameserver","health","shutdown","watch","reserve"],"message":"Testing Failed","received":[],"sdkName":"node","severity":"info","source":"*sdkserver.LocalSDKServer","time":"2020-09-09T15:00:08.649638801Z"}
make[2]: *** [run-sdk-conformance-no-build] Error 1
includes/sdk.mk:145: recipe for target 'run-sdk-conformance-no-build' failed
make[1]: *** [run-sdk-conformance-test] Error 2
includes/sdk.mk:152: recipe for target 'run-sdk-conformance-test' failed
includes/sdk.mk:159: recipe for target 'run-sdk-conformance-test-node' failed
make: *** [run-sdk-conformance-test-node] Error 2
make: *** Waiting for unfinished jobs....
DOCKER_RUN_ARGS="--net host -e AGONES_SDK_GRPC_PORT=9001 -e AGONES_SDK_HTTP_PORT=9101 -e FEATURE_GATES=PlayerTracking=true " COMMAND=sdktest /usr/bin/make run-sdk-command & \
docker run -p 9001:9001 -p 9101:9101 -e "FEATURE_GATES=PlayerTracking=true" -e "ADDRESS=" -e "TEST=ready,allocate,setlabel,setannotation,gameserver,health,shutdown,watch,reserve,getplayercapacity,setplayercapacity,playerconnect,playerdisconnect,getplayercount,isplayerconnected,getconnectedplayers" -e "SDK_NAME=go" -e "TIMEOUT=30" -e "DELAY=1" \
--net=host gcr.io/agones-images/agones-sdk:1.9.0-3317115 --grpc-port 9001 --http-port 9101
WARNING: Published ports are discarded when using host network mode
+ /root/sdktest.sh
+ cd /go/src/agones.dev/agones/test/sdk/nodejs
+ npm run testSDK
{"ctlConf":{"Address":"","IsLocal":false,"LocalFile":"","Delay":1,"Timeout":30,"Test":"ready,allocate,setlabel,setannotation,gameserver,health,shutdown,watch,reserve,getplayercapacity,setplayercapacity,playerconnect,playerdisconnect,getplayercount,isplayerconnected,getconnectedplayers","TestSdkName":"go","GRPCPort":9001,"HTTPPort":9101},"featureGates":"ContainerPortAllocation=true\u0026Example=true\u0026PlayerTracking=true\u0026SDKWatchSendOnExecute=false","message":"Starting sdk sidecar","severity":"info","source":"main","time":"2020-09-09T15:00:19.400098007Z","version":"1.9.0-3317115"}
{"message":"Waiting 1 seconds before starting","severity":"info","source":"main","time":"2020-09-09T15:00:19.400242234Z"}

> @ testSDK /go/src/agones.dev/agones/test/sdk/nodejs
> node ./testSDKClient.js

attempting to connect
aLekSer commented 4 years ago

TODO to make debugging much easier using logs only: we should add timestamps into all log lines, not just inside SDK server Go binary:

--net=host gcr.io/agones-images/agones-sdk:1.9.0-3317115 --grpc-port 9002 --http-port 9102
WARNING: Published ports are discarded when using host network mode
{"ctlConf":{"Address":"","IsLocal":false,"LocalFile":"","Delay":1,"Timeout":30,"Test":"ready,allocate,setlabel,setannotation,gameserver,health,shutdown,watch,reserve","TestSdkName":"node","GRPCPort":9002,"HTTPPort":9102},"featureGates":"ContainerPortAllocation=true\u0026Example=true\u0026PlayerTracking=false\u0026SDKWatchSendOnExecute=false","message":"Starting sdk sidecar","severity":"info","source":"main","time":"2020-09-09T14:59:37.647849379Z","version":"1.9.0-3317115"}

but before these lines > node ./testSDKClient.js also.

aLekSer commented 4 years ago

I found the similar thing with Golang sdk when tests step overlaps with sdk-conformance in cloud build:

"tests":    docker run --rm -v /workspace/build//.config/gcloud:/root/.config/gcloud -v ~/.kube/:/root/.kube -v ~/.config/helm:/root/.config/helm -v ~/.cache/helm:/root/.cache/helm -v /workspace:/go/src/agones.dev/agones -v /workspace/build//.gomod:/go/pkg/mod -v /workspace/build//.gocache:/root/.cache/go-build -e "VERSION=1.9.0-b63b5f7" \
Step #17 - "tests":          agones-build-sdk-go:6a1f217343 test ; \
Step #17 - "tests": else \
Step #17 - "tests":     echo "Command test not found - nothing to execute" ; \
Step #17 - "tests": fi
Step #17 - "tests": /workspace/build
Step #23 - "sdk-conformance": + /root/sdktest.sh
Step #23 - "sdk-conformance": + GO111MODULE=off
Step #23 - "sdk-conformance": + cd /go/src/agones.dev/agones/test/sdk/go
Step #23 - "sdk-conformance": + go run sdk-client-test.go

make tests would run test.sh, while SDK conformance would run sdktest.sh but them should not be invoked simultaneously.

markmandel commented 4 years ago

@aLekSer seems like we can close this now? Looks like you solved the problem!

aLekSer commented 4 years ago

Yes, I left this ticket in an Open state just to see the rarity of failures in at least 10 runs.