Failed to update server instance heartbeat to the database

yanmxa commented 3 months ago

When the maestro server goes down for a while and then comes back up, it might fail to sync its heartbeat to the database, producing the following errors:

E0612 09:28:05.379536   72292 logger.go:129]   failed to mark transaction for rollback: could not retrieve transaction from context 
E0612 09:28:05.380320   72292 logger.go:129]   Unable to upsert maestro instance: pq: duplicate key value violates unique constraint "server_instances_pkey"

So the behavior will be:

The server is up and updates its heartbeat.
The server goes down; the liveness goroutine marks it as deleted once its heartbeat has been missing for a specific duration.
The server comes up again; it updates its heartbeat in the database and resets deleted_at to null.

Reference: https://github.com/openshift-online/maestro/issues/109

Signed-off-by: myan <myan@redhat.com>

yanmxa commented 3 months ago

/assign @morvencao

clyang82 commented 3 months ago

/ok-to-test

Failed to update server instance heartbeat to the database #125