satoshipay / stellar-helm-charts

Helm charts for Stellar applications (Core, Horizon, Friendbot, ...)
Apache License 2.0

Horizon needs to delay starting until Stellar Core is `ready` #10

Open todkap opened 5 years ago

todkap commented 5 years ago

We found in our testing that starting both Helm charts (Core and Horizon) at the same time causes errors, such as Horizon's ingester detecting gaps in the ledger:

time="2019-05-28T13:37:15.695Z" level=info msg="Ingesting ledgers..." first_ledger=556188 last_ledger=556287 pid=1 service=ingest
time="2019-05-28T13:37:15.699Z" level=error msg="Error ingesting ledgers" err="Gap detected in stellar-core database (ledger=556188). More information: https://www.stellar.org/developers/software/known-issues.html#gaps-detected" first_ledger=556188 last_ledger=556287 pid=1 service=ingest
time="2019-05-28T13:37:16.623Z" level=info msg="Ingesting ledgers..." first_ledger=556188 last_ledger=556293 pid=1 service=ingest
time="2019-05-28T13:37:16.625Z" level=error msg="Error ingesting ledgers" err="Gap detected in stellar-core database (ledger=556188). More information: https://www.stellar.org/developers/software/known-issues.html#gaps-detected" first_ledger=556188 last_ledger=556293 pid=1 service=ingest

Delaying the start of Horizon until Core has passed its readiness probes seems to resolve the issue.

Install script that waits for the readiness check to complete before installing Horizon:

nodeSeed=SCXZBWO7UYZ3TLJFLQG54MICBKRIODW7FV673B4AQINU3VLXOLRISHN7
namespace=stellar-testnet
kubectl config set-context $(kubectl config current-context) --namespace=$namespace

stellarCore=stellar-core
# Delete stellar core and reinstall
helm delete $stellarCore --purge
helm install --namespace $namespace --name $stellarCore --set nodeSeed=$nodeSeed --values stellar-core.testnet.values.yaml stellar-core

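echo "Waiting for $stellarCore pods to be ready"
## Verify every pod reports Ready before taking the next step
## (phase "Running" alone does not mean the readiness probe has passed)
notReady="NOT_STARTED"
while [ -n "$notReady" ] ; do
    sleep 60
    notReady=$(kubectl get pods --namespace "$namespace" -o json \
        | jq -r '.items[] | select([.status.conditions[]? | select(.type == "Ready" and .status == "True")] | length == 0) | .metadata.name')
    [ -n "$notReady" ] && echo "Still starting $stellarCore pods $(date): $notReady"
done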

stellarHorizon=stellar-horizon
helm repo update
helm dependency update stellar-horizon
helm delete $stellarHorizon --purge
helm install \
  --namespace $namespace \
  --name $stellarHorizon \
  --set ingress.enabled=true \
  --set 'ingress.hosts[0].name=stellar-satoshipay.us-south.containers.appdomain.cloud' \
  --set 'ingress.hosts[0].tlsSecret=stellar-satoshipay' \
  --set service.port=8000 \
  --values stellar-horizon.testnet.values.yaml \
  stellar-horizon
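If the cluster runs kubectl 1.11 or later, the polling loop could instead be replaced by `kubectl wait`, which blocks until the pods report the Ready condition. This is a minimal sketch under stated assumptions: the function name `wait_for_core_ready` and the 900s default timeout are hypothetical, and it assumes only the stellar-core release runs in the namespace. Note that `kubectl wait` fails immediately if no matching pods exist yet, and that it only covers the readiness probe, not whether Core has validated a ledger.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): block until every pod in the namespace reports
# the Ready condition, instead of polling .status.phase for "Running".
# Assumes kubectl >= 1.11 and that only stellar-core pods live in this namespace.
wait_for_core_ready() {
  local namespace=$1
  local timeout=${2:-900s}   # kubectl wait defaults to 30s, too short for a core sync
  kubectl wait pod --all \
    --for=condition=Ready \
    --namespace "$namespace" \
    --timeout "$timeout"
}
```

Usage in the install script would be something like `wait_for_core_ready "$namespace" 900s || exit 1` right before the Horizon install.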
todkap commented 5 years ago

After running the test again, we found that we really need to wait for `validated` to be true before starting Horizon.

Taken from the stellar-core pod logs:

Still syncing ("validated":false):

2019-05-28T19:55:13.261 GBN6Y [Herder INFO] Quorum information for 560439 : {"agree":3,"delayed":0,"disagree":0,"fail_at":1,"hash":"1e8826","missing":1,"phase":"EXTERNALIZE","validated":false}

Quorum ("validated":true):

2019-05-28T19:56:47.191 GBN6Y [Herder INFO] Quorum information for 560456 : {"agree":3,"delayed":0,"disagree":0,"fail_at":1,"hash":"1e8826","missing":1,"phase":"EXTERNALIZE","validated":true}
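The extra wait on `validated` could be scripted by scanning the Core logs for the most recent Herder "Quorum information" line. This is a minimal sketch, assuming the log format shown in the excerpts above stays stable; the helper names `core_validated` and `wait_for_core_validated`, the pod name argument, the 200-line tail and the 30-second interval are all hypothetical choices, not part of the charts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads stellar-core log output on stdin and succeeds only
# if the most recent "Quorum information" line contains "validated":true.
# Assumes the Herder log format seen in the pod logs for this issue.
core_validated() {
  local last
  last=$(grep 'Quorum information' | tail -n 1)
  [[ $last == *'"validated":true'* ]]
}

# Hypothetical usage: poll the core pod's recent logs until it has validated.
# The pod name is an assumption; pass the actual stellar-core pod name.
wait_for_core_validated() {
  local namespace=$1 corePod=$2
  until kubectl logs "$corePod" --namespace "$namespace" --tail=200 | core_validated; do
    echo "stellar-core has not validated yet $(date)"
    sleep 30
  done
}
```

Checking only the latest quorum line, rather than any line, avoids treating an old `validated:true` as current if Core later falls out of sync.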