kyma-project / compass-manager

Manager for the compass module
Apache License 2.0

Compass Manager - Investigate and fix problem with retry during unregistration of Kyma runtimes on Prod #154

Closed. koala7659 closed this issue 4 months ago

koala7659 commented 5 months ago

After the rollout to Prod, I noticed that Compass Manager always fails on the first attempt to unregister a Runtime from Compass. The second attempt unregisters the Runtime successfully.

Additional info:

- The same component on stage works fine
- Compass-Manager version: 1.0.6
- Image version: v20240402-46db0573

Logs:

time="2024-04-09T12:15:45Z" level=info msg="Reconciliation triggered for Kyma Resource b3f3270f-f17c-4f29-8c0a-3aa06ab68f20"
time="2024-04-09T12:15:45Z" level=info msg="Updated Compass Mapping Status for b3f3270f-f17c-4f29-8c0a-3aa06ab68f20: registered=true, configured=false, state=Processing"
time="2024-04-09T12:15:50Z" level=info msg="Reconciliation triggered for Kyma Resource b3f3270f-f17c-4f29-8c0a-3aa06ab68f20"
time="2024-04-09T12:15:50Z" level=info msg="Attempting to configure Compass Runtime Agent for Runtime 8bce5bed-a6bf-4bac-a416-22f3cbdce25b"
time="2024-04-09T12:15:50.600328435Z" level=info msg="Received OneTimeToken for Runtime 8bce5bed-a6bf-4bac-a416-22f3cbdce25b in Director for Global Account 8a200117-40d9-414a-bef2-b9a7ab9d3643" component="director/directorclient.go:126:director.(*directorClient).GetConnectionToken"
time="2024-04-09T12:15:50Z" level=info msg="Compass Runtime Agent for Runtime 8bce5bed-a6bf-4bac-a416-22f3cbdce25b configured."
time="2024-04-09T12:15:50Z" level=info msg="Updated Compass Mapping Status for b3f3270f-f17c-4f29-8c0a-3aa06ab68f20: registered=true, configured=true, state=Ready"
time="2024-04-09T12:45:43Z" level=info msg="Reconciliation triggered for Kyma Resource b3f3270f-f17c-4f29-8c0a-3aa06ab68f20"
time="2024-04-09T12:45:43Z" level=info msg="Runtime deregistration in Compass for Kyma Resource b3f3270f-f17c-4f29-8c0a-3aa06ab68f20"
time="2024-04-09T12:46:13.034341031Z" level=info msg=">> variables: map[]" component="graphql/client.go:81:graphql.(*client).Do"
time="2024-04-09T12:46:13.034412034Z" level=info msg=">> query: mutation {\n\tresult: unregisterRuntime(id: \"8bce5bed-a6bf-4bac-a416-22f3cbdce25b\") {\n\t\tid\n}}" component="graphql/client.go:81:graphql.(*client).Do"
time="2024-04-09T12:46:13.034462167Z" level=info msg=">> headers: map[Accept:[application/json; charset=utf-8] Authorization:[*****************************************************************************************************] Content-Type:[application/json; charset=utf-8] Tenant:[8a200117-40d9-414a-bef2-b9a7ab9d3643]]" component="graphql/client.go:81:graphql.(*client).Do"
time="2024-04-09T12:46:13.034521147Z" level=error msg="Error while unregistering runtime in Director: Failed to unregister runtime 8bce5bed-a6bf-4bac-a416-22f3cbdce25b in Director, Failed to execute GraphQL request to Director: Post \"https://compass-gateway-auth-oauth.mps.kyma.cloud.sap/director/graphql\": context deadline exceeded" component="util/retry.go:17:util.RetryOnError"
time="2024-04-09T12:46:23Z" level=info msg="Runtime b3f3270f-f17c-4f29-8c0a-3aa06ab68f20 deregistered from Compass"
time="2024-04-09T12:46:23.966718685Z" level=info msg="Successfully unregistered Runtime 8bce5bed-a6bf-4bac-a416-22f3cbdce25b in Director for tenant 8a200117-40d9-414a-bef2-b9a7ab9d3643" component="director/directorclient.go:152:director.(*directorClient).DeleteRuntime"

tobiscr commented 5 months ago

Relates to #112

VOID404 commented 4 months ago

This doesn't seem to happen anymore. I cannot reproduce it, and recent Compass Manager logs show no errors. Metrics do show a spike in retries on the day of the issue, but since then everything seems to work as expected.