hyperledger-labs / fabric-operator

Hyperledger Fabric Kubernetes Operator
Apache License 2.0

Migrated instances cause indefinite restarts of the grpcweb proxy containers on peers/orderers #154

Closed: s7santosh closed this issue 9 months ago

s7santosh commented 9 months ago

On an instance migrated from the SaaS offering to HLF Support, once we upgrade the peer or orderer to the latest Fabric release, the proxy container keeps restarting:

time="2023-11-21T16:15:29Z" level=info msg="using websockets"
time="2023-11-21T16:15:29Z" level=info msg="[core] Subchannel picks a new address \"127.0.0.1:7050\" to connect" system=system
time="2023-11-21T16:15:29Z" level=info msg="listening for http_tls on: [::]:7443"
time="2023-11-21T16:15:29Z" level=info msg="listening for http on: [::]:8080"
time="2023-11-21T16:15:29Z" level=info msg="[core] Channel Connectivity change to CONNECTING" system=system
time="2023-11-21T16:15:49Z" level=warning msg="[core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:7050 127.0.0.1:7050 <nil> 0 <nil>}. Err: connection error: desc = \"transport: authentication handshake failed: context deadline exceeded\". Reconnecting..." system=system
time="2023-11-21T16:15:49Z" level=info msg="[core] Subchannel Connectivity change to TRANSIENT_FAILURE" system=system
time="2023-11-21T16:15:49Z" level=info msg="[core] Channel Connectivity change to TRANSIENT_FAILURE" system=system
time="2023-11-21T16:15:50Z" level=info msg="[core] Subchannel Connectivity change to CONNECTING" system=system
time="2023-11-21T16:15:50Z" level=info msg="[core] Subchannel picks a new address \"127.0.0.1:7050\" to connect" system=system
time="2023-11-21T16:15:50Z" level=info msg="[core] Channel Connectivity change to CONNECTING" system=system
time="2023-11-21T16:16:10Z" level=warning msg="[core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:7050 127.0.0.1:7050 <nil> 0 <nil>}. Err: connection error: desc = \"transport: authentication handshake failed: context deadline exceeded\". Reconnecting..." system=system
time="2023-11-21T16:16:10Z" level=info msg="[core] Subchannel Connectivity change to TRANSIENT_FAILURE" system=system
time="2023-11-21T16:16:10Z" level=info msg="[core] Channel Connectivity change to TRANSIENT_FAILURE" system=system
time="2023-11-21T16:16:12Z" level=info msg="[core] Subchannel Connectivity change to CONNECTING" system=system
time="2023-11-21T16:16:12Z" level=info msg="[core] Subchannel picks a new address \"127.0.0.1:7050\" to connect" system=system
time="2023-11-21T16:16:12Z" level=info msg="[core] Channel Connectivity change to CONNECTING" system=system
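The log has a regular rhythm: each time the subchannel picks 127.0.0.1:7050, the attempt ends about 20 s later with `authentication handshake failed: context deadline exceeded`. A fixed 20 s interval is consistent with grpc-go's default minimum connect timeout, which suggests the TLS handshake with the orderer is stalling rather than being rejected with an explicit certificate error. The following Python sketch measures that interval from a captured log. It assumes the lines use the logrus-style text format shown here (`key=value` and `key="quoted value"` fields); `parse_line` and `handshake_timeouts` are hypothetical helper names, not part of the operator or the proxy.

```python
import re
from datetime import datetime

# Matches key=value or key="value with \"escaped\" quotes" (logrus-style text format, assumed).
FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line: str) -> dict:
    """Turn one proxy log line into a dict of its fields, unquoting quoted values."""
    fields = {}
    for key, raw in FIELD.findall(line):
        if raw.startswith('"'):
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def handshake_timeouts(lines) -> list:
    """Seconds from each 'picks a new address' to the next handshake failure."""
    durations, started = [], None
    for line in lines:
        fields = parse_line(line)
        if "time" not in fields:  # skip blank or non-log lines
            continue
        ts = datetime.strptime(fields["time"], "%Y-%m-%dT%H:%M:%SZ")
        msg = fields.get("msg", "")
        if "picks a new address" in msg:
            started = ts  # a new connection attempt begins
        elif "authentication handshake failed" in msg and started is not None:
            durations.append((ts - started).total_seconds())
            started = None  # wait for the next attempt
    return durations
```

Fed the log lines above (for example as captured with `kubectl logs orderer1node1-7d877c6f5b-82rst -n nfb7685 -c proxy`), it should report `[20.0, 20.0]`; the third attempt, started at 16:16:12, has no failure line yet and is not counted.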

The pod description:

kubectl describe pod orderer1node1-7d877c6f5b-82rst -n nfb7685
Name:         orderer1node1-7d877c6f5b-82rst
Namespace:    nfb7685
Priority:     0
Node:         10.192.249.105/10.192.249.105
Start Time:   Tue, 21 Nov 2023 21:44:55 +0530
Labels:       app=orderer1node1
              app.kubernetes.io/instance=ibm-hlfsupportorderer
              app.kubernetes.io/managed-by=ibm-hlfsupport-operator
              app.kubernetes.io/name=ibm-hlfsupport
              creator=ibm-hlfsupport
              helm.sh/chart=ibm-ibp
              orderingnode=node1
              orderingservice=orderer1
              parent=orderer1
              pod-template-hash=7d877c6f5b
              release=operator
Annotations:  cni.projectcalico.org/containerID: 316c43ac8ca7ea912528e6898ae7a0db2cb33f0f9e258c95f536752a703e0f64
              cni.projectcalico.org/podIP: 172.30.189.114/32
              cni.projectcalico.org/podIPs: 172.30.189.114/32
              kubectl.kubernetes.io/restartedAt: 2023-04-24T15:48:24Z
              productChargedContainers: orderer
              productID: 5d5997a033594f149a534a09802d60f1
              productMetric: VIRTUAL_PROCESSOR_CORE
              productName: IBM Support for Hyperledger Fabric
              productVersion: 1.0.0
Status:       Running
IP:           172.30.189.114
IPs:
  IP:           172.30.189.114
Controlled By:  ReplicaSet/orderer1node1-7d877c6f5b
Init Containers:
  init:
    Container ID:  containerd://66a102d93c952a7efa2590da9d26a456ecc66ba79aef780413f856f8cd66e62f
    Image:         icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-init:1.0.7-20231114-amd64
    Image ID:      icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-init@sha256:01e0a063dd0d6567d2b1b331ec8e98ac064f76776ebbc54b4e9ba0f37e10bbdc
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      chmod -R 775 /ordererdata/ && chown -R -H 7051:7051 /ordererdata/
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 21 Nov 2023 21:44:57 +0530
      Finished:     Tue, 21 Nov 2023 21:45:16 +0530
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  200M
    Requests:
      cpu:     100m
      memory:  200M
    Environment:
      LICENSE:  accept
    Mounts:
      /ordererdata from orderer-data (rw,path="data")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rr6w5 (ro)
Containers:
  orderer:
    Container ID:   containerd://77c4b903c2ef51251508120f5fc81c19f18e20113f7e8c982830b636a27633ef
    Image:          icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-orderer:2.5.5-20231114-amd64
    Image ID:       icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-orderer@sha256:e76c241db6876bf7173d103235d342bc5a311e931a6007b5129ad0e1e49833fd
    Ports:          7050/TCP, 8443/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 21 Nov 2023 21:45:17 +0530
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  8096M
    Requests:
      cpu:      2
      memory:   8096M
    Liveness:   http-get https://:operations/healthz delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:  http-get https://:operations/healthz delay=26s timeout=1s period=5s #success=1 #failure=3
    Environment Variables from:
      orderer1node1-env  ConfigMap  Optional: false
    Environment:
      LICENSE:                                      accept
      FABRIC_CFG_PATH:                              /certs/
      ORDERER_GENERAL_KEEPALIVE_SERVERMININTERVAL:  25s
    Mounts:
      /certs from orderer-config (rw)
      /certs/genesis from orderer-genesis (rw)
      /certs/msp from orderer-config (rw)
      /certs/msp/admincerts from ecert-admincerts (rw)
      /certs/msp/cacerts from ecert-cacerts (rw)
      /certs/msp/keystore from ecert-keystore (rw)
      /certs/msp/signcerts from ecert-signcert (rw)
      /certs/msp/tlscacerts from tls-cacerts (rw)
      /certs/tls/keystore from tls-keystore (rw)
      /certs/tls/signcerts from tls-signcert (rw)
      /ordererdata from orderer-data (rw,path="data")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rr6w5 (ro)
  proxy:
    Container ID:   containerd://1e761d2089b4968ecc77251d91e6e5e6d995d25cf5a01f542b66ccd9c999973a
    Image:          icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-grpcweb:1.0.6-20231010-amd64
    Image ID:       icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-grpcweb@sha256:c4da4e78dba8512bcb442d84b7e1176ef8a95f39b860271a5212a0f6007600a8
    Ports:          8080/TCP, 7443/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 21 Nov 2023 21:54:27 +0530
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 21 Nov 2023 21:52:57 +0530
      Finished:     Tue, 21 Nov 2023 21:54:26 +0530
    Ready:          False
    Restart Count:  6
    Limits:
      cpu:     100m
      memory:  200M
    Requests:
      cpu:      100m
      memory:   200M
    Liveness:   http-get https://:https/settings delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:  http-get https://:https/settings delay=26s timeout=5s period=5s #success=1 #failure=3
    Environment:
      LICENSE:                        accept
      BACKEND_ADDRESS:                127.0.0.1:7050
      SERVER_TLS_CERT_FILE:           /certs/tls/signcerts/cert.pem
      SERVER_TLS_KEY_FILE:            /certs/tls/keystore/key.pem
      SERVER_TLS_CLIENT_CA_FILES:     /certs/msp/tlscacerts/cacert-0.pem
      SERVER_BIND_ADDRESS:            0.0.0.0
      SERVER_HTTP_DEBUG_PORT:         8080
      SERVER_HTTP_TLS_PORT:           7443
      BACKEND_TLS:                    true
      SERVER_HTTP_MAX_WRITE_TIMEOUT:  5m
      SERVER_HTTP_MAX_READ_TIMEOUT:   5m
      USE_WEBSOCKETS:                 true
      EXTERNAL_ADDRESS:               nfb7685-orderer1node1.iks-cluster-mos-875090-1f48c9b2f691f3800a79dbff4ff72526-0000.jp-tok.containers.appdomain.cloud:7050
    Mounts:
      /certs/msp/tlscacerts from tls-cacerts (rw)
      /certs/tls/keystore from tls-keystore (rw)
      /certs/tls/signcerts from tls-signcert (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rr6w5 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  orderer-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  orderer1node1-pvc
    ReadOnly:   false
  ecert-admincerts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ecert-orderer1node1-admincerts
    Optional:    false
  ecert-cacerts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ecert-orderer1node1-cacerts
    Optional:    false
  ecert-keystore:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ecert-orderer1node1-keystore
    Optional:    false
  ecert-signcert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ecert-orderer1node1-signcert
    Optional:    false
  tls-cacerts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  tls-orderer1node1-cacerts
    Optional:    false
  tls-keystore:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  tls-orderer1node1-keystore
    Optional:    false
  tls-signcert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  tls-orderer1node1-signcert
    Optional:    false
  orderer-genesis:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  orderer1node1-genesis
    Optional:    false
  orderer-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      orderer1node1-config
    Optional:  false
  kube-api-access-rr6w5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 600s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 600s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  10m                   default-scheduler  Successfully assigned nfb7685/orderer1node1-7d877c6f5b-82rst to 10.192.249.105
  Normal   Pulling    10m                   kubelet            Pulling image "icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-init:1.0.7-20231114-amd64"
  Normal   Created    10m                   kubelet            Created container init
  Normal   Started    10m                   kubelet            Started container init
  Normal   Pulled     10m                   kubelet            Successfully pulled image "icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-init:1.0.7-20231114-amd64" in 911.153797ms (911.184605ms including waiting)
  Normal   Pulling    9m54s                 kubelet            Pulling image "icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-orderer:2.5.5-20231114-amd64"
  Normal   Pulling    9m53s                 kubelet            Pulling image "icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-grpcweb:1.0.6-20231010-amd64"
  Normal   Pulled     9m53s                 kubelet            Successfully pulled image "icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-orderer:2.5.5-20231114-amd64" in 854.257649ms (854.457747ms including waiting)
  Normal   Created    9m53s                 kubelet            Created container orderer
  Normal   Started    9m53s                 kubelet            Started container orderer
  Normal   Pulled     9m42s                 kubelet            Successfully pulled image "icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-grpcweb:1.0.6-20231010-amd64" in 11.169973854s (11.169997242s including waiting)
  Normal   Created    9m42s                 kubelet            Created container proxy
  Normal   Started    9m41s                 kubelet            Started container proxy
  Warning  Unhealthy  8m34s (x4 over 9m4s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Pulled     5m13s                 kubelet            Successfully pulled image "icr.io/cpopen/ibm-hlfsupport/ibm-hlfsupport-grpcweb:1.0.6-20231010-amd64" in 782.526639ms (782.549764ms including waiting)
  Warning  Unhealthy  9s (x86 over 9m14s)   kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
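The events show the proxy's `/settings` endpoint returning 500 to both probes, and the proxy's last terminated run lasted from 21:52:57 to 21:54:26. One possible reading, not confirmed in this thread, is that the restarts are driven by the liveness probe (`delay=30s period=10s #failure=6`). The Python sketch below checks whether the timing fits, under the simplifying assumption that the kubelet probes first at `delay` seconds and then every `period` seconds; probe jitter, the 5 s probe timeout and shutdown time are ignored. `parse_probe`, `earliest_liveness_restart` and `run_seconds` are hypothetical helper names.

```python
import re
from datetime import datetime

# Timestamp format used by `kubectl describe` for Started/Finished fields.
DESCRIBE_TIME = "%a, %d %b %Y %H:%M:%S %z"


def parse_probe(line: str) -> dict:
    """Extract numeric settings (delay, timeout, period, success, failure) from a probe line."""
    return {key.lstrip("#"): int(value) for key, value in re.findall(r"(#?\w+)=(\d+)s?", line)}


def earliest_liveness_restart(probe: dict) -> int:
    """Seconds after start at which the failure-threshold-th consecutive failure can first occur,
    assuming the first probe fires at `delay` and the rest every `period` seconds."""
    return probe["delay"] + (probe["failure"] - 1) * probe["period"]


def run_seconds(started: str, finished: str) -> float:
    """Length of a terminated container run, from the describe output's timestamps."""
    start = datetime.strptime(started, DESCRIBE_TIME)
    end = datetime.strptime(finished, DESCRIBE_TIME)
    return (end - start).total_seconds()


liveness = parse_probe(
    "http-get https://:https/settings delay=30s timeout=5s period=10s #success=1 #failure=6"
)
print(earliest_liveness_restart(liveness))  # 80
print(run_seconds("Tue, 21 Nov 2023 21:52:57 +0530", "Tue, 21 Nov 2023 21:54:26 +0530"))  # 89.0
```

For the proxy container this gives an earliest liveness-triggered restart at 30 + 5 × 10 = 80 s against an observed run of 89 s, so the numbers are compatible with a probe-driven restart; the ignored factors could plausibly account for the remaining 9 s.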
asararatnakar commented 9 months ago

Thank you, @s7santosh. We added a fix for this issue.