jason-magnetic-io opened this issue 3 years ago
This is a larger log snippet, including info about which cluster node was the source of each line: https://gist.github.com/jason-magnetic-io/096fbca1d797efc7e3680b0d27f76e98
Timestamp 2021/01/23 20:22:47 is when we began to manually restart the nodes.
@jason-magnetic-io Thank you for using NATS Streaming and sorry about your latest troubles.
Just to be clear: is the issue solely about the "timed out due to heartbeats" errors, or are you also saying that new connections cannot be made?
You mentioned that you do not notice any CPU usage increase that would explain this. Have you seen an increase in the number of clients connecting? Could there be a ulimit on the system that causes new connections to fail?
The "timed out on heartbeats" could be the result of applications being stopped without them having a chance to call the connection "Close()" API. As a refresher: the NATS Streaming server is not a server, but a client to core NATS server. Therefore, clients are not directly TCP connecting to the streaming "server". The streaming server knows that client are "connected" based on heartbeats, and so having client send a close protocol for the server to remove the client is required. Without the close, the server relies on heartbeats and after a configurable number of missed responses, it will consider the client connection lost. See: https://docs.nats.io/nats-streaming-concepts/intro and https://docs.nats.io/nats-streaming-concepts/relation-to-nats for a bit more details.
If new clients cannot connect at that same time, it could really be some networking issue, which would explain both types of failures.
Since I see that you are running the server with the monitoring port enabled, one thing we could try, when the server is in this situation of failing clients due to HB timeouts and rejecting new clients, is to capture the stacks of the leader process by hitting the monitoring endpoint: http://<leader host>:8222/stacksz
Note that this will dump the stacks of all the go routines without stopping the server. However, this is more of a debug practice and there is always a risk that it causes the server to fail (say because the result is huge and this causes it to allocate too much memory, etc.). But since your normal course of action in that situation is to restart the server (and clean the file store), I think it is worth the risk.
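For what it's worth, a plain HTTP GET is all that is needed. The sketch below (hypothetical leader host name) grabs the dump a few times in a row so the captures can be compared for goroutines stuck in the same place.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	const url = "http://stan-leader:8222/stacksz" // placeholder host

	// Take three captures a few seconds apart and write each to a file.
	for i := 1; i <= 3; i++ {
		resp, err := http.Get(url)
		if err != nil {
			log.Fatalf("capture %d failed: %v", i, err)
		}
		f, err := os.Create(fmt.Sprintf("stacks-%d.txt", i))
		if err != nil {
			log.Fatal(err)
		}
		if _, err := io.Copy(f, resp.Body); err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
		f.Close()
		time.Sleep(5 * time.Second)
	}
}
```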
Thanks for the http://<leader host>:8222/stacksz debugging tip; that is helpful to know.
To clarify the language, here is the situation at the time of the issue:
We ruled out it being a network issue.
The 6 clients within the same Kubernetes cluster were unaffected.
Connections from within the same data centre and from a remote data centre were able to connect and authorise with NATS but could not connect with the NATS streaming cluster despite the NATS and NATS streaming Kubernetes Pods sharing the same VMs.
Regarding CPU and memory, the CPU load for each NATS streaming instance was between 0.1 and 0.2 vCPU and the memory usage was flat at 28MB per instance.
We have experienced issues in the past with connections not being cleaned up. All the clients take steps to actively disconnect, including intercepting the container shutdown hooks.
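For illustration, here is a minimal sketch of that kind of shutdown hook, assuming a Go client using stan.go; the cluster and client IDs are placeholders. The point is simply that Close() runs before the process exits, so the server does not have to wait for heartbeat failures.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/nats-io/stan.go"
)

func main() {
	sc, err := stan.Connect("test-cluster", "example-client") // placeholder IDs
	if err != nil {
		log.Fatalf("stan connect failed: %v", err)
	}

	// Block until the container is asked to stop (SIGTERM on pod shutdown),
	// then send the close protocol so the server drops this client right away.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)
	<-sigCh

	if err := sc.Close(); err != nil {
		log.Printf("error closing streaming connection: %v", err)
	}
}
```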
We have experience with what happens when the channel, subscription, message and memory limits are exceeded; that was not the behaviour we observed in this case.
> - 6 clients connected to NATS streaming without any issue. These were long-running connections established days earlier.
> - 3 clients connected but were timing out: these clients were able to connect long enough to subscribe to channels but started reporting that the connection was lost within 1-2 minutes of subscribing. These correspond to the timeouts seen in the stan log.
Were those 2 groups of clients located in the same "region" or connecting to the same server?
> - When we were actively debugging, 3 out of 4 connections failed to connect to NATS streaming despite being able to connect to NATS. We ruled out it being a network issue.
Well, it depends. Say you have a NATS core cluster of 3 servers called N1, N2 and N3. Now say that you have a streaming cluster consisting of S1, S2 and S3, which connect to that NATS cluster. They may have connected/reconnected to any NATS server in the cluster; that is, it is not guaranteed that S1 is connected to N1, etc.
When a client connects, it connects to NATS, so it is possible that, say, it connects to N3 while the streaming server leader is S1, which for instance is connected to N1. Your client will connect/authorize fine when connecting to N3, but the streaming connection request has to reach S1 (and be replicated to S2 and S3). So it is not out of the question that you still have network issues even if a client can TCP-connect successfully to its closest NATS core server.
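To make that concrete, here is a minimal sketch assuming the Go stan.go client and hypothetical host names n1-n3: the client may TCP-connect to any core NATS server, while the ConnectWait option bounds how long it waits for the streaming leader to answer the connect request.

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/stan.go"
)

func main() {
	sc, err := stan.Connect(
		"test-cluster",   // cluster ID (placeholder)
		"example-client", // client ID (placeholder)
		// Any of the core NATS servers will do for the TCP connection.
		stan.NatsURL("nats://n1:4222,nats://n2:4222,nats://n3:4222"),
		// How long to wait for the streaming leader to answer the connect
		// request; if the leader never responds, this is where it times out.
		stan.ConnectWait(10*time.Second),
	)
	if err != nil {
		log.Fatalf("streaming connect failed (NATS dial or leader response): %v", err)
	}
	defer sc.Close()
}
```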
> Connections from within the same data centre and from a remote data centre were able to connect and authorise with NATS but could not connect with the NATS streaming cluster despite the NATS and NATS streaming Kubernetes Pods sharing the same VMs.
See above. I am not saying that there is a network issue, but we can't rule that out.
But it could be that somehow the server is locking up or is deadlocked, though then I would expect leadership to be lost, etc. So next time the problem occurs, try capturing the /stacksz output, say 2 or 3 times a few seconds apart (to see if something is deadlocked).
Thanks!
First, I want to say that we've been using NATS streaming for nearly 2 years without issues.
Last week we observed timeouts when new clients tried to connect to stan. Approximately 3 out of 4 connection attempts were unsuccessful.
The existing clients remained connected and were unaffected.
The only way we could resolve the issue was by restarting the cluster with empty file stores and reconnecting all the clients.
Versions: nats-streaming-server v0.19.0; nats-server v2.1.9; nats-account-server v0.8.4
The cluster is running on Kubernetes on Google Cloud GKE using this configuration:
There were no unusual spikes in memory or CPU usage.
These are the logs for stan-0, stan-1 and stan-2 building up to the issue: