hyperledger / fabric

Hyperledger Fabric is an enterprise-grade permissioned distributed ledger framework for developing solutions and applications. Its modular and versatile design satisfies a broad range of industry use cases. It offers a unique approach to consensus that enables performance at scale while preserving privacy.
https://wiki.hyperledger.org/display/fabric
Apache License 2.0

2.4.6: Issue with RAFT Orderer certificates: "certificate is valid for orderer1, not orderer0" #3678

Open isrand opened 1 year ago

isrand commented 1 year ago

Hello all,

We are currently trying to run a network based on v2.4.6 using three RAFT ordering nodes. We have modified the crypto-config.yaml to generate the certificates for the three orderers:

OrdererOrgs:
  - Name: orderer-org
    Domain: example.com
    EnableNodeOUs: true
    Specs:
      - Hostname: orderer0
        SANS:
          - orderer0-orderer-org
      - Hostname: orderer1
        SANS:
          - orderer1-orderer-org
      - Hostname: orderer2
        SANS:
          - orderer2-orderer-org

And updated the configtx.yaml file to set the orderers as the etcdraft consenters:

EtcdRaft:
        Consenters:
        - Host: orderer0-orderer-org
          Port: 7050
          ClientTLSCert: crypto-config/ordererOrganizations/orderer-org.example.com/orderers/orderer0.orderer-org.example.com/tls/server.crt
          ServerTLSCert: crypto-config/ordererOrganizations/orderer-org.example.com/orderers/orderer0.orderer-org.example.com/tls/server.crt
        - Host: orderer1-orderer-org
          Port: 7050
          ClientTLSCert: crypto-config/ordererOrganizations/orderer-org.example.com/orderers/orderer1.orderer-org.example.com/tls/server.crt
          ServerTLSCert: crypto-config/ordererOrganizations/orderer-org.example.com/orderers/orderer1.orderer-org.example.com/tls/server.crt
        - Host: orderer2-orderer-org
          Port: 7050
          ClientTLSCert: crypto-config/ordererOrganizations/orderer-org.example.com/orderers/orderer2.orderer-org.example.com/tls/server.crt
          ServerTLSCert: crypto-config/ordererOrganizations/orderer-org.example.com/orderers/orderer2.orderer-org.example.com/tls/server.crt
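
One way to sanity-check this pairing is to confirm that every consenter Host appears among the names its certificate is expected to carry. The sketch below is illustrative only. The SAN lists are assumptions modelled on the names reported in the error log further down (fully qualified name, short hostname, then the custom SANS entry), and `assumed_sans` and `find_mismatches` are hypothetical helpers, not part of any Fabric tooling.

```python
# Hypothetical sanity check: does each consenter Host appear in the SAN list
# of the certificate configured for it? The SAN lists are assumptions,
# modelled on the names reported in the orderer error log.

CONSENTERS = [
    {"host": "orderer0-orderer-org", "cert_owner": "orderer0"},
    {"host": "orderer1-orderer-org", "cert_owner": "orderer1"},
    {"host": "orderer2-orderer-org", "cert_owner": "orderer2"},
]


def assumed_sans(hostname, org_domain="orderer-org.example.com", org="orderer-org"):
    """Names a certificate generated for `hostname` is assumed to carry."""
    return [f"{hostname}.{org_domain}", hostname, f"{hostname}-{org}"]


def find_mismatches(consenters):
    """Return the consenter hosts that are not covered by their certificate."""
    return [
        c["host"]
        for c in consenters
        if c["host"] not in assumed_sans(c["cert_owner"])
    ]


if __name__ == "__main__":
    print(find_mismatches(CONSENTERS) or "all consenter hosts are covered")
```

If this check passes, as it does for the configuration shown, the configtx.yaml pairing itself is probably not the source of a hostname mismatch.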

These certificates then get mounted on the Orderer container via a Kubernetes secret:

{{- range $index, $element := until (int .Values.raft.numberOfConsenters) }}
{{ $ordererWithIndex := print "orderer" $index }}
{{ $ordererName := print $ordererWithIndex "-" $.Values.global.organization.name }}
{{ $pathToOrganizationArtifacts := print "artifacts/" $.Values.global.organization.name "/" $ordererName }}
{{ $pathToMSPConfigurationYaml := print $pathToOrganizationArtifacts "/msp/config.yaml" }}
{{ $pathToOrdererCertificate := print $pathToOrganizationArtifacts "/msp/signcerts/cert.pem" }}
{{ $pathToOrdererPrivateKey := print $pathToOrganizationArtifacts "/msp/keystore/key.pem" }}
{{ $pathToCACertificate := print $pathToOrganizationArtifacts "/msp/cacerts/cert.pem" }}
{{ $pathToTLSCACertificate := print $pathToOrganizationArtifacts "/msp/tlscacerts/cert.pem" }}
{{ $pathToOrdererTLSCACertificate := print $pathToOrganizationArtifacts "/tls/ca.crt" }}
{{ $pathToOrdererTLSCertificate := print $pathToOrganizationArtifacts "/tls/server.crt" }}
{{ $pathToOrdererTLSPrivateKey := print $pathToOrganizationArtifacts "/tls/server.key" }}
---
apiVersion: v1
kind: Secret
metadata:
  name: {{ $ordererName }}-crypto-config
  labels:
    {{- include "orderer.labels" $ | nindent 4 }}
  annotations:
    "helm.sh/hook": pre-install, pre-upgrade
    "helm.sh/hook-weight": "1"
    "helm.sh/hook-delete-policy": "before-hook-creation"
type: Opaque
stringData:
  config: |
    {{ $.Files.Get $pathToMSPConfigurationYaml | nindent 4 }}
  orderer-cert: |
    {{ $.Files.Get $pathToOrdererCertificate | nindent 4 }}
  orderer-key: |
    {{ $.Files.Get $pathToOrdererPrivateKey | nindent 4 }}
  ca-cert: |
    {{ $.Files.Get $pathToCACertificate | nindent 4 }}
  tlsca-cert: |
    {{ $.Files.Get $pathToTLSCACertificate | nindent 4 }}
  ca-tls-cert: |
    {{ $.Files.Get $pathToOrdererTLSCACertificate | nindent 4 }}
  orderer-tls-cert: |
    {{ $.Files.Get $pathToOrdererTLSCertificate | nindent 4 }}
  orderer-tls-key: |
    {{ $.Files.Get $pathToOrdererTLSPrivateKey | nindent 4 }}
{{- end }}
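
To make the loop easier to follow, here is a rough sketch of the names the template would expand to. It assumes `.Values.raft.numberOfConsenters` is 3 and `.Values.global.organization.name` is `orderer-org`. Neither value is shown in the chart excerpt, so both are guesses based on the hostnames used elsewhere in this issue, and `expand_orderers` is a hypothetical helper.

```python
# Sketch of what the Helm range loop expands to. The two input values are
# assumptions; the chart's values.yaml is not shown in the issue.

def expand_orderers(number_of_consenters, organization_name):
    """Mirror the template's naming logic for each loop index."""
    expanded = []
    for index in range(number_of_consenters):  # Sprig's `until n` yields 0..n-1
        orderer_name = f"orderer{index}-{organization_name}"
        artifacts = f"artifacts/{organization_name}/{orderer_name}"
        expanded.append({
            "secret": f"{orderer_name}-crypto-config",
            "tls_cert": f"{artifacts}/tls/server.crt",
            "tls_key": f"{artifacts}/tls/server.key",
        })
    return expanded


if __name__ == "__main__":
    for item in expand_orderers(3, "orderer-org"):
        print(item["secret"], "<-", item["tls_cert"])
```

Each index gets its own Secret name and its own artifact directory, so comparing this list against the files actually present in the chart is a quick way to rule out a copy or path mix-up.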

Everything is deployed using Helm. The three Orderers are a single release: we iterate over the manifests once per configured RAFT consenter and deploy the separate components. They are deployed at the same time (or shortly after one another, sequentially), and when a channel is created using peer channel create ... they start communicating with each other to agree on the RAFT leader. After this we start seeing intermittent errors in each Orderer container regarding TLS handshake failures (both ClientHandshake and ServerHandshake):

2022-10-07 11:09:21.015 UTC 0004 ERRO [comm.tls] ClientHandshake -> Client TLS handshake failed after 1.547888ms with error: x509: certificate is valid for orderer1.orderer-org.example.com, orderer1, orderer1-orderer-org, not orderer0-orderer-org remoteaddress=10.99.170.143:7050
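
When these errors are intermittent, it can help to pull the names out of each log line and tabulate which certificate was presented for which requested host. The following is a minimal sketch that assumes the message format shown above. `parse_mismatch` is a hypothetical helper and the regular expression has only been matched against this one sample line.

```python
import re

# Sample taken verbatim from the orderer log above.
SAMPLE = (
    "2022-10-07 11:09:21.015 UTC 0004 ERRO [comm.tls] ClientHandshake -> "
    "Client TLS handshake failed after 1.547888ms with error: x509: "
    "certificate is valid for orderer1.orderer-org.example.com, orderer1, "
    "orderer1-orderer-org, not orderer0-orderer-org "
    "remoteaddress=10.99.170.143:7050"
)

# Matches the hostname-mismatch part of the message and the optional address.
MISMATCH = re.compile(
    r"certificate is valid for (?P<valid>.+?), not (?P<wanted>\S+)"
    r"(?: remoteaddress=(?P<remote>\S+))?"
)


def parse_mismatch(line):
    """Return the presented names, requested host and remote address, or None."""
    match = MISMATCH.search(line)
    if match is None:
        return None
    return {
        "valid_for": [name.strip() for name in match.group("valid").split(",")],
        "wanted": match.group("wanted"),
        "remote": match.group("remote"),
    }


if __name__ == "__main__":
    print(parse_mismatch(SAMPLE))
```

For the sample line, the requested host is orderer0-orderer-org while the certificate presented at 10.99.170.143:7050 belongs to orderer1, so the connection appears to have reached a different Orderer than the one that was addressed.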

This is not a fatal error since eventually the Orderers end up syncing and working.

Later, when the rest of the network is deployed (peers, peer CLI, etc.), those components communicate with the Orderer to perform admin operations (create channel, set anchor peers, ...). Since we communicate with the Orderers over TLS, we provide the correct --cafile to the CLI commands, but the error appears again a couple of times before the commands succeed.

This error is not a dealbreaker since the network eventually works as expected, but it makes us wonder whether we are misconfiguring something. We double-checked the secrets mounted on the Orderer containers and confirmed that the certificates are indeed correct for each and every one of them.

We would really appreciate some help on the matter! Please do reach out if you need more information :)

Kind regards, Isra (isrand) Nebot Dominguez

davidkel commented 1 year ago

Have you tried running the same configuration outside of K8s to prove that what you have generated is correct? K8s networking can be complex, especially around the use of TLS. It's highly recommended that you use one of the Fabric operators to deploy Hyperledger Fabric into K8s rather than attempting it yourself, for example fabric-operator or hlf-operator, both of which can be found in hyperledger-labs. There was also a recent workshop that included scenarios around K8s deployment. It can be found at https://github.com/hyperledgendary/full-stack-asset-transfer-guide and uses fabric-operator.

isrand commented 1 year ago

Hey @davidkel, thanks for your response!

Unfortunately due to project circumstances we can't use either of the operators and we need to deploy the HLF components ourselves. To add more info to the matter, we are using Minikube v1.27.0. I tried deploying the exact same network configuration on Kubernetes v1.21.4 and the same error occurs.

We used to deploy the three Orderers separately (one release per Orderer) and I believe this error was nowhere to be found. Only when we deployed them as one release did these issues start popping up.

Also worth noting we are not using the osnadmin CLI tool to create a channel and join the Orderers to it. The way we do it right now is:

Do you reckon there could be a mistake in the way we deploy our components / bootstrap the network and channels? I am starting to think so.

davidkel commented 1 year ago

@isrand if you aren't using osnadmin, are you still using a system channel? Anyway, I think you should validate your approach outside of K8s first. That will show whether your approach and configuration files are correct and remove the K8s aspect completely, because getting the various K8s network configuration right (such as ingresses) is complicated. From there, any issues deploying to K8s will point to how you are setting up your K8s environment. The only reference Fabric has for deploying Hyperledger Fabric into a K8s environment is test-network-k8s in fabric-samples, but this is not a recommended pattern to follow.

The error you describe, though, implies that the wrong TLS cert is being presented: the SANs in a certificate list the hostnames it is valid for, but they did not match the host that presented the certificate.
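
The check behind that error can be sketched as follows. This is a deliberately simplified model of hostname verification (exact match plus a single leading wildcard label), not the actual Go crypto/x509 implementation, and `hostname_matches` and `verify` are hypothetical names used only for illustration.

```python
def hostname_matches(requested, dns_names):
    """Simplified check: exact match or a single leading wildcard label."""
    requested = requested.lower().rstrip(".")
    for name in dns_names:
        name = name.lower().rstrip(".")
        if name == requested:
            return True
        if name.startswith("*.") and "." in requested:
            # "*.example.com" covers exactly one extra leading label.
            if requested.split(".", 1)[1] == name[2:]:
                return True
    return False


def verify(requested, dns_names):
    """Return "ok" or a message shaped like the one in the orderer log."""
    if hostname_matches(requested, dns_names):
        return "ok"
    return f"certificate is valid for {', '.join(dns_names)}, not {requested}"


if __name__ == "__main__":
    orderer1_sans = ["orderer1.orderer-org.example.com", "orderer1", "orderer1-orderer-org"]
    # A client that dialled orderer0 but was answered with orderer1's certificate.
    print(verify("orderer0-orderer-org", orderer1_sans))
    # The same certificate checked against the host it was issued for.
    print(verify("orderer1-orderer-org", orderer1_sans))
```

Under this model the certificate itself is fine. The failure only occurs when a connection addressed to one hostname is answered by a node holding another node's certificate.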

You might also ask on Discord to see whether anyone there has deployed Fabric using Helm charts. It's not something we have any example for or have invested any time in.