openshift-online / maestro

Maestro Service Repo
Apache License 2.0

Failed to update server instance heartbeat to the database #125

Closed yanmxa closed 3 months ago

yanmxa commented 3 months ago

When the maestro server goes down for a while and then comes back up, it might fail to sync its heartbeat to the database, producing the following errors:

E0612 09:28:05.379536   72292 logger.go:129]   failed to mark transaction for rollback: could not retrieve transaction from context 
E0612 09:28:05.380320   72292 logger.go:129]   Unable to upsert maestro instance: pq: duplicate key value violates unique constraint "server_instances_pkey" 

So the expected behavior is:

  1. The server is up and updates its heartbeat.
  2. The server goes down; the liveness goroutine marks the server as deleted once the downtime reaches a specific duration.
  3. The server comes up again; it updates its heartbeat in the database and then resets deleted_at to null.

Reference: https://github.com/openshift-online/maestro/issues/109

Signed-off-by: myan <myan@redhat.com>

yanmxa commented 3 months ago

/assign @morvencao

clyang82 commented 3 months ago

/ok-to-test