hyperledger-bevel / bevel-operator-fabric

Hyperledger Fabric Kubernetes operator - Hyperledger Fabric operator for Kubernetes (v2.3, v2.4 and v2.5, soon 3.0)
https://hyperledger-bevel.github.io/bevel-operator-fabric/
Apache License 2.0

Close http connections before making api calls to avoid "Unexpected EOF golang http client error" #141

Open rohitrj22 opened 1 year ago

rohitrj22 commented 1 year ago

What happened?

The Istio hosts (for peers, CAs, and orderers) worked fine when the HLF network was created initially; all HTTP API calls and gRPC calls succeeded. A few hours later, however, all the pods were rescheduled onto a new node and restarted, which closed the existing connections. The net/http code in the operator (probably in client.go and httpclient.go) assumes the connection is still open, so the next request that tries to reuse it hits an EOF from the connection that was closed earlier. I faced the following issues because of this:

While registering peers with the CA:

[screenshot: error while registering peers with the CA]

All the FabricFollowerChannel custom resources failed for the following reason:

[screenshot: FabricFollowerChannel failure message]

All these Istio hostnames are valid and working.

I also tried to hit the peer gRPC URL with grpcurl:

[screenshot: grpcurl output]

What did you expect to happen?

I expected API requests to keep succeeding even after the pods were restarted and scheduled onto a new node. I expect the HTTP client to use a new connection (i.e. close the request) before making API calls.
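
Something along these lines, for example (just a sketch of the behaviour I would expect, not the operator's actual code; the Istio host in the example is made up):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newShortLivedClient returns an *http.Client that does not keep idle
// connections around, so a later call never tries to reuse a connection
// that was closed by a pod restart. This is only a sketch of the expected
// behaviour, not the operator's actual implementation.
func newShortLivedClient() *http.Client {
	return &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			// Do not reuse connections between requests; every call gets a
			// fresh TCP/TLS connection. Alternatively, a short IdleConnTimeout
			// would drop idle connections quickly instead.
			DisableKeepAlives: true,
		},
	}
}

func main() {
	client := newShortLivedClient()
	// Hypothetical Istio host for a peer; the real hosts come from the network config.
	resp, err := client.Get("https://peer0-org1.example.com/healthz")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```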

How can we reproduce it (as minimally and precisely as possible)?

Restart the pods or schedule them onto a new node, or do whatever else is required to close the existing HTTP or gRPC connections.

Anything else we need to know?

If not closing the connection really is the issue, then this fix might be the solution:

https://bugz.pythonanywhere.com/golang/Unexpected-EOF-golang-http-client-error

We would have to add req.Close = true before calling httpClient.Do(req).
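
A rough sketch of that suggestion (the CA URL here is made up and this is not the operator's actual request code; the only point is req.Close = true):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	httpClient := &http.Client{}

	// Hypothetical CA endpoint; the real URL comes from the FabricCA resource.
	req, err := http.NewRequest(http.MethodGet, "https://ca-org1.example.com/cainfo", nil)
	if err != nil {
		panic(err)
	}

	// Close the connection after this request instead of returning it to the
	// keep-alive pool, so a later call never reuses a connection that the
	// server (or a pod restart) has already closed.
	req.Close = true

	resp, err := httpClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body), "bytes")
}
```

This trades away connection reuse, but it should guarantee that a later request never picks up a connection that was already closed on the server side.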

Kubernetes version

```console
# kubectl get nodes -o wide
```
dviejokfs commented 1 year ago

Hi @rohitrj22

That means that the hlf-operator pod cannot access Peers/CAs.

Try executing a curl command from the hlf-operator pod to debug connectivity and check whether it's an hlf-operator problem or a connectivity problem.

rohitrj22 commented 1 year ago

Hey @dviejokfs, thanks for replying. I exec'd into the hlf-operator pod, ran curl, and got this error:

[screenshot: curl error output]

dviejokfs commented 1 year ago

> Hey @dviejokfs, thanks for replying. I exec'd into the hlf-operator pod, ran curl, and got this error:
>
> [screenshot: curl error output]

@rohitrj22 Then it definitely seems that the problem is connectivity from the Kubernetes cluster to the CAs.

You need to check with your team why you could reach the CA before and now you can't.

Maybe the DNS domain names, IPs, etc. changed.

rohitrj22 commented 1 year ago

> Then it definitely seems that the problem is connectivity from the Kubernetes cluster to the CAs.
>
> You need to check with your team why you could reach the CA before and now you can't.
>
> Maybe the DNS domain names, IPs, etc. changed.

Hey @dviejokfs, thank you for replying again. The pods might have been scheduled onto a new node, which could change the node IP. But we are using Istio with valid hosts, so a change of IPs should not be the issue, since each Istio host is mapped to the respective peer/CA Kubernetes service. And by the way, all our services are of type ClusterIP, which is what the hlf-operator provides by default.