hashgraph / hedera-mirror-node

Hedera Mirror Node archives data from consensus nodes and serves it via an API
Apache License 2.0

Replicas In Stackgres Cluster Randomly Stop Replicating #6931

Closed. jnels124 closed this issue 4 months ago.

jnels124 commented 1 year ago

Description

On several occasions, the replicas in the cluster have stopped replicating. It reliably happens when the replicas are left down while inserts/updates are performed and are then brought back up. I have also seen the same error occur at least once when the replicas were never taken down.

We may be able to fix this issue by configuring wal_keep_segments, but there may be additional issues at the StackGres/Patroni layer.
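As a rough sizing aid for the wal_keep_segments idea, here is a Python sketch that estimates how many segments the primary would have to keep to cover a replica outage. It assumes a roughly constant WAL generation rate and PostgreSQL's default 16 MiB segment size; the function names and the workload numbers are hypothetical. On PostgreSQL 13 and later, wal_keep_segments was replaced by wal_keep_size (in megabytes), so the sketch also converts the segment count.

```python
import math

# Assumption: PostgreSQL's default WAL segment size (16 MiB), fixed at initdb time.
DEFAULT_WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def min_wal_keep_segments(wal_bytes_per_sec: float,
                          outage_seconds: float,
                          segment_bytes: int = DEFAULT_WAL_SEGMENT_BYTES) -> int:
    """Hypothetical helper: segments the primary must keep so that a replica
    that was down for `outage_seconds` can still find every segment it needs.

    Assumes a constant WAL generation rate, which is a simplification.
    """
    wal_bytes = wal_bytes_per_sec * outage_seconds
    return math.ceil(wal_bytes / segment_bytes)


def wal_keep_size_mb(segments: int,
                     segment_bytes: int = DEFAULT_WAL_SEGMENT_BYTES) -> int:
    """Convert a segment count to the equivalent wal_keep_size (PostgreSQL 13+), in MB."""
    return segments * segment_bytes // (1024 * 1024)


# Hypothetical workload: 2 MiB of WAL per second, replicas down for one hour.
segs = min_wal_keep_segments(2 * 1024 * 1024, 3600)
print(segs, wal_keep_size_mb(segs))  # → 450 7200
```

Physical replication slots are the other standard PostgreSQL option: the primary retains WAL until the slot's consumer has received it, which avoids this failure but lets pg_wal grow without bound while a replica is down, unless max_slot_wal_keep_size (PostgreSQL 13+) caps it.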

Steps to reproduce

  1. Bring the replicas down and perform inserts.
  2. After a reasonable amount of time, bring the replicas back up.
  3. Notice the errors in the Patroni container (screenshot from 2023-09-21, 10:30 AM).
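The reproduction steps can be modelled with a toy Python simulation. This is not PostgreSQL's actual checkpoint and segment-recycling logic: the hypothetical ToyPrimary simply keeps its newest keep_segments segments, and a returning replica asks for the segment it stopped in. In this model, whether the replica recovers depends only on how much WAL was written while it was down.

```python
from collections import deque


class ToyPrimary:
    """Hypothetical toy primary that retains only its newest `keep_segments` segments.

    Real PostgreSQL recycles segments at checkpoints and treats wal_keep_* as a
    minimum; this model only captures "old segments eventually disappear".
    """

    def __init__(self, keep_segments: int) -> None:
        if keep_segments < 1:
            raise ValueError("keep_segments must be at least 1")
        self.keep_segments = keep_segments
        self.current = 0            # segment currently being written
        self.on_disk = deque([0])   # segment numbers still available

    def write_segments(self, n: int) -> None:
        """Simulate inserts/updates that fill and switch `n` segments."""
        for _ in range(n):
            self.current += 1
            self.on_disk.append(self.current)
            while len(self.on_disk) > self.keep_segments:
                self.on_disk.popleft()  # the oldest segment is removed

    def stream_from(self, segment: int) -> list[int]:
        """Return the segments a replica would receive, or fail if `segment` is gone."""
        if segment < self.on_disk[0]:
            raise LookupError(f"toy model: segment {segment} is no longer on the primary")
        return [s for s in self.on_disk if s >= segment]


def replica_resumes(keep_segments: int, segments_written_while_down: int) -> bool:
    """Run steps 1-3 against the toy primary and report whether streaming resumes."""
    primary = ToyPrimary(keep_segments)
    stopped_at = primary.current                          # step 1: replicas go down
    primary.write_segments(segments_written_while_down)   # step 1: inserts continue
    try:
        primary.stream_from(stopped_at)                   # steps 2-3: replicas come back
        return True
    except LookupError:
        return False


print(replica_resumes(8, 5), replica_resumes(8, 20))  # → True False
```

Once more segments are written than the primary retains, the segment the replica needs is gone and streaming cannot resume.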

Hedera network

other

Version

0.90-SNAPSHOT

Operating system

None

jnels124 commented 4 months ago

This is not expected to present a problem. The issue only occurs if coordinator replicas are taken offline long enough that the WAL files they still need from the replicated timeline have been removed. We should expect to have to rebuild replicas if we ever take them down for an extended period of time.
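The condition described in the closing comment can be checked mechanically: a replica can only resume streaming if the WAL segment containing its resume point still exists on the primary. This Python sketch derives a segment file name from an LSN using PostgreSQL's naming scheme (timeline, then the high and low parts of the segment number, each as 8 hex digits) and compares it with the oldest segment still on the primary. It assumes the default 16 MiB segment size and that both sides are on the same timeline; the function names are hypothetical.

```python
# Assumption: default 16 MiB WAL segments (set at initdb; check with SHOW wal_segment_size).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN string such as '0/3000060' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: str, segment_bytes: int = WAL_SEGMENT_BYTES) -> str:
    """Name of the segment file containing `lsn`: 8 upper-case hex digits each
    for the timeline, the high part, and the low part of the segment number."""
    segno = parse_lsn(lsn) // segment_bytes
    segments_per_id = 0x1_0000_0000 // segment_bytes
    return f"{timeline:08X}{segno // segments_per_id:08X}{segno % segments_per_id:08X}"


def replica_needs_rebuild(replica_lsn: str, timeline: int, oldest_on_primary: str) -> bool:
    """Hypothetical check: True if the segment the replica needs is older than the
    oldest segment the primary still has. On a single timeline, the fixed-width
    hex names sort in WAL order, so string comparison is enough."""
    return wal_file_name(timeline, replica_lsn) < oldest_on_primary


print(wal_file_name(1, "0/3000060"))  # → 000000010000000000000003
print(replica_needs_rebuild("0/3000060", 1, "000000010000000000000005"))  # → True
```

On a live cluster the inputs might come from pg_last_wal_replay_lsn() on the replica (only an approximation of where it will resume) and from the smallest 24-character segment name returned by pg_ls_waldir() on the primary, since that function also lists non-segment files such as timeline history files.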