hyperledger-labs / minifabric

Do fabric network the right and easy way.
Apache License 2.0

help: minifab with k8s #179

Closed itaru2622 closed 3 years ago

itaru2622 commented 3 years ago

I'm trying docs/DeployOntoK8S.md with kubespray as an on-premises k8s deployment tool.

DeployOntoK8S.md says:

Once it is deployed and running, you should get a public IP address which is needed to config your Minifabric spec.yaml file.

Step 4 (Prepare Nginx ingress controller) succeeded, and I got many IP addresses, as below. Which address should I use for endpoint_address in spec.yaml? Please let me know how to choose/get the address for spec.yaml.

$ kubectl wait --namespace ingress-nginx   --for=condition=ready pod   --selector=app.kubernetes.io/component=controller   --timeout=120s
pod/ingress-nginx-controller-7fc74cf778-ts2gk condition met

$ kubectl get pod --namespace ingress-nginx -o wide 
NAME                                        READY   STATUS      RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
ingress-nginx-admission-create-fg456        0/1     Completed   0         64m   10.233.86.2   k8s-n3   <none>           <none>
ingress-nginx-admission-patch-9wks4         0/1     Completed   2        64m   10.233.81.1   k8s-n2   <none>           <none>
ingress-nginx-controller-7fc74cf778-ts2gk   1/1     Running     0          64m   10.233.81.2   k8s-n2   <none>           <none>

$ kubectl get service --namespace ingress-nginx -o wide 
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
ingress-nginx-controller             LoadBalancer   10.233.38.205   <pending>     80:31933/TCP,443:32589/TCP   63m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
ingress-nginx-controller-admission   ClusterIP      10.233.38.77    <none>        443/TCP                      63m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

$kubectl get node -o wide
NAME     STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP
k8s-n1   Ready    control-plane,master   96m   v1.20.5   192.168.1.206   <none>
k8s-n2   Ready    <none>                 95m   v1.20.5   192.168.1.207   <none>
k8s-n3   Ready    <none>                 95m   v1.20.5   192.168.1.208   <none>
itaru2622 commented 3 years ago

I found some pods stuck in Pending status, as below. There is no PV by default on a pure/on-premises k8s, though a cloud vendor's k8s may provide one. Without a PV, the certificates and other data cannot be persisted and shared, so the pods get stuck... orz

$ kubectl get pod -n mysite
NAME                       READY   STATUS    RESTARTS   AGE
ca1-org0-example-com-0     1/1     Running   0          3m8s
peer1-org0-example-com-0   0/1     Pending   0          3m8s
orderer1-example-com-0     0/1     Pending   0          3m8s
:

$kubectl describe pod orderer1-example-com-0  -n mysite
:
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  89s   default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  88s   default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.

$ kubectl get pvc -A
NAMESPACE   NAME                                                     STATUS 
mysite      orderer1-example-com-data-orderer1-example-com-0   Pending
:

$ kubectl get pv -A
No resources found.
litong01 commented 3 years ago

You need to set up a PV. Without it, Minifabric cannot allocate a persistent volume, and thus it will not be able to boot up any node.
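
For example, on a bare-metal cluster with no storage provisioner, one minimal option is to pre-create a static hostPath PersistentVolume per pending claim (the name, size, and path below are placeholders; a dynamic provisioner such as rancher/local-path-provisioner is usually the easier route):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: minifab-pv-0              # placeholder name; create one PV per pending PVC
spec:
  capacity:
    storage: 5Gi                  # must be at least what the PVC requests
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  hostPath:
    path: /data/minifab/pv-0      # directory must already exist on the node

If the PVCs request a specific storageClassName, the PV has to declare the same class, otherwise it will not bind.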

litong01 commented 3 years ago

If your ingress controller is not getting an external IP address, your ingress might not work. That is another issue you need to resolve with your k8s.

itaru2622 commented 3 years ago

Thank you for the reply. I will check my k8s and try again.

itaru2622 commented 3 years ago

@litong01 I got some progress, but still failed on "minifab up".

I got a "Running" state for each pod on "minifab netup", but still got the error "no Raft leader" on "minifab create". It may be caused by gRPCS load balancing on an on-premises (bare-metal) k8s.

So I think it's better to reproduce your known-good case. Please let me know the details of your setup. Mine looks like this:

$ ./minifab create
Using spec file: /opt/minifabric/spec.yaml
Minifab Execution Context:
    FABRIC_RELEASE=2.2.1
    CHANNEL_NAME=mychannel
    PEER_DATABASE_TYPE=golevel
    CHAINCODE_LANGUAGE=go
    CHAINCODE_NAME=simple
    CHAINCODE_VERSION=1.0
    CHAINCODE_INIT_REQUIRED=true
    CHAINCODE_PARAMETERS="init","a","200","b","300"
    CHAINCODE_PRIVATE=false
    CHAINCODE_POLICY=
    TRANSIENT_DATA=
    BLOCK_NUMBER=newest
    EXPOSE_ENDPOINTS=true
    CURRENT_ORG=org0.example.com
    HOST_ADDRESSES=192.168.1.206,10.233.0.3,10.233.31.116,10.233.55.126,10.233.48.188,10.233.26.178,10.233.0.141,10.233.60.239,10.233.27.111,10.233.71.0,169.254.25.10
    WORKING_DIRECTORY: /opt/minifabric
.......
# Run the channel creation script on cli container ************
  non-zero return code
  2021-04-03 23:28:20.219 UTC [common.tools.configtxgen] main -> INFO 001 Loading configuration
  2021-04-03 23:28:20.243 UTC [common.tools.configtxgen.localconfig] Load -> INFO 002 Loaded configuration: /vars/configtx.yaml
  2021-04-03 23:28:20.243 UTC [common.tools.configtxgen] doOutputChannelCreateTx -> INFO 003 Generating new channel configtx
  2021-04-03 23:28:20.247 UTC [common.tools.configtxgen] doOutputChannelCreateTx -> INFO 004 Writing new channel tx
  2021-04-03 23:28:20.332 UTC [channelCmd] InitCmdFactory -> INFO 001 Endorser and orderer connections initialized
  Error: got unexpected status: SERVICE_UNAVAILABLE -- no Raft leader

# STATS *******************************************************
minifab: ok=34  failed=1

$ kubectl get pod -n mysite
NAME                      READY   STATUS    RESTARTS   AGE
ca-org0-example-com-0     1/1     Running   0          25m
orderer1-example-com-0    1/1     Running   2          25m
orderer2-example-com-0    1/1     Running   2          25m
orderer3-example-com-0    1/1     Running   2          25m
peer-org0-example-com-0   2/2     Running   0          25m
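
A few checks that can help narrow this down (assuming the ingress controller runs in the ingress-nginx namespace and the orderer/peer ports are mapped through an nginx tcp-services ConfigMap):

# do the Fabric services have live endpoints behind the ingress?
kubectl get endpoints -n mysite
# is the TCP port mapping for the orderers/peers present on the ingress controller?
kubectl get configmap tcp-services -n ingress-nginx -o yaml
# did the orderers ever elect a leader? look for raft leader election messages
kubectl logs orderer1-example-com-0 -n mysite | grep -i raft | tail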
itaru2622 commented 3 years ago

@litong01 After further investigation, I found some reasons for the failure in the container logs (ingress-nginx-controller and orderers), as below.

I guess it fails when applying vars/run/allservices.yaml (which exposes the port mapping), or when working with it, according to the following logs. When I applied that yaml manually with kubectl, it failed YAML validation: allservices.yaml has no selector, labels, or image in its kind: Deployment, while playbooks/ops/certgen/templates/ingresscontroller.yaml does have those keys (and others) in the same section.

When I added selector and labels to allservices.yaml, the manual apply succeeded. But when I merged that into playbooks/ops/netup/k8stemplates/allservices.j2, I got another error in the ingress controller and the orderers, so I gave up on self-fixing and further investigation.

Could you take a look, please?

# logs in NGINX Ingress controller container

W0406 13:30:34.194192       6 controller.go:384] Service "mysite/peer-org0-example-com" does not have any active Endpoint for TCP port 7051
W0406 13:30:34.194220       6 controller.go:384] Service "mysite/orderer1-example-com" does not have any active Endpoint for TCP port 7050
W0406 13:30:34.194254       6 controller.go:384] Service "mysite/orderer3-example-com" does not have any active Endpoint for TCP port 7050
I0406 13:30:34.194291       6 controller.go:146] "Configuration changes detected, backend reload required"
I0406 13:30:34.251939       6 controller.go:163] "Backend successfully reloaded"
I0406 13:30:34.252686       6 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-6657875575-5n6wp", UID:"a4065ec4-4b9d-43fa-a8f0-9cf7aebe18ea", APIVersion:"v1", ResourceVersion:"5726", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0406 13:30:58.303357       6 controller.go:384] Service "mysite/peer-org0-example-com" does not have any active Endpoint for TCP port 7051
W0406 13:30:58.303415       6 controller.go:384] Service "mysite/orderer3-example-com" does not have any active Endpoint for TCP port 7050
I0406 13:30:58.303459       6 controller.go:146] "Configuration changes detected, backend reload required"
I0406 13:30:58.381489       6 controller.go:163] "Backend successfully reloaded"
I0406 13:30:58.381891       6 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-6657875575-5n6wp", UID:"a4065ec4-4b9d-43fa-a8f0-9cf7aebe18ea", APIVersion:"v1", ResourceVersion:"5726", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
[10.233.81.6] [06/Apr/2021:13:30:58 +0000] TCP 200 0 33 0.001
W0406 13:31:01.636849       6 controller.go:384] Service "mysite/peer-org0-example-com" does not have any active Endpoint for TCP port 7051
I0406 13:31:01.636919       6 controller.go:146] "Configuration changes detected, backend reload required"
I0406 13:31:01.695670       6 controller.go:163] "Backend successfully reloaded"
I0406 13:31:01.695918       6 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-6657875575-5n6wp", UID:"a4065ec4-4b9d-43fa-a8f0-9cf7aebe18ea", APIVersion:"v1", ResourceVersion:"5726", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
# logs in orderer container

2021-04-06 13:30:57.770 UTC [grpc] Infof -> DEBU 496 Subchannel picks a new address "192.168.1.240:7003" to connect
2021-04-06 13:30:57.770 UTC [grpc] UpdateSubConnState -> DEBU 4a6 pickfirstBalancer: HandleSubConnStateChange: 0xc0001dca90, {CONNECTING <nil>}
2021-04-06 13:30:57.771 UTC [grpc] Infof -> DEBU 4aa Channel Connectivity change to CONNECTING
2021-04-06 13:30:57.770 UTC [grpc] Infof -> DEBU 4a7 Subchannel picks a new address "192.168.1.240:7004" to connect
2021-04-06 13:30:57.771 UTC [orderer.consensus.etcdraft] apply -> INFO 4ab Applied config change to add node 1, current nodes in channel: [1 2 3] channel=systemchannel node=1
2021-04-06 13:30:57.771 UTC [orderer.consensus.etcdraft] apply -> INFO 4ac Applied config change to add node 2, current nodes in channel: [1 2 3] channel=systemchannel node=1
2021-04-06 13:30:57.771 UTC [orderer.consensus.etcdraft] apply -> INFO 4ad Applied config change to add node 3, current nodes in channel: [1 2 3] channel=systemchannel node=1
2021-04-06 13:30:58.773 UTC [grpc] Warningf -> DEBU 4af grpc: addrConn.createTransport failed to connect to {192.168.1.240:7003  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 192.168.1.240:7003: connect: connection refused". Reconnecting...
2021-04-06 13:30:58.773 UTC [grpc] Infof -> DEBU 4b0 Subchannel Connectivity change to TRANSIENT_FAILURE
2021-04-06 13:30:58.773 UTC [grpc] Warningf -> DEBU 4ae grpc: addrConn.createTransport failed to connect to {192.168.1.240:7004  <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 192.168.1.240:7004: connect: connection refused". Reconnecting...
2021-04-06 13:30:58.773 UTC [grpc] Infof -> DEBU 4b1 Subchannel Connectivity change to TRANSIENT_FAILURE
2021-04-06 13:30:58.773 UTC [grpc] UpdateSubConnState -> DEBU 4b2 pickfirstBalancer: HandleSubConnStateChange: 0xc0001dca90, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp 192.168.1.240:7004: connect: connection refused"}
2021-04-06 13:30:58.773 UTC [grpc] Infof -> DEBU 4b3 Channel Connectivity change to TRANSIENT_FAILURE
# logs from manually applying allservices.yaml, which was generated by minifabric
kubectl apply -f minifabric/vars/run/allservices.yaml 
configmap/tcp-services created
error: error validating "minifabric/vars/run/allservices.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false

kubectl apply -f minifabric/vars/run/allservices.yaml  --validate=false
configmap/tcp-services unchanged
service/ingress-nginx-controller created
The Deployment "ingress-nginx-controller" is invalid: 
* spec.selector: Required value
* spec.template.metadata.labels: Invalid value: map[string]string(nil): `selector` does not match template `labels`
* spec.template.spec.containers[0].image: Required value
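
For reference, an apps/v1 Deployment requires spec.selector to match spec.template.metadata.labels and each container to have an image, which is exactly what the validation errors above report. The minimal required shape looks like this (names and the image tag are just placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx       # must match the template labels below
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
        - name: controller
          image: k8s.gcr.io/ingress-nginx/controller:v0.44.0   # placeholder tag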
itaru2622 commented 3 years ago

@litong01 I tested with a single orderer (solo), and the situation changed; I made more progress but it still failed.

The orderer could start as the solo leader and was ready in the system channel, but channel creation failed since the cli container couldn't communicate with the others. The packets may be blocked by ipvs or iptables rules added by the ingress controller (not sure).

By the way, why does the cli container run as a pure docker container and not as a k8s pod/service?

litong01 commented 3 years ago

cli is a client which in theory can run anywhere to communicate with the fabric network. You probably do not have ingress set up correctly, so your cli cannot communicate with the network.
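
One quick way to verify that from the machine where the cli runs, using the orderer endpoint that shows up in the orderer logs above (192.168.1.240 and port 7004 are taken from those logs; substitute your own endpoint_address and exposed ports):

# is the ingress-exposed orderer port reachable at all from the cli host?
nc -vz 192.168.1.240 7004
# if TLS is enabled, check which certificate answers (expect the orderer's, not a default nginx one)
openssl s_client -connect 192.168.1.240:7004 </dev/null 2>/dev/null | openssl x509 -noout -subject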

itaru2622 commented 3 years ago

@litong01 thank you for your reply.

I tested the following three k8s systems but still failed to set up Fabric on k8s with minifabric. Please let me know about your known-good case, e.g. how you set up your k8s environment.

My tests:

a) https://github.com/kubernetes-sigs/kubespray — as described above, I had some trouble around the LoadBalancer when I tested with calico + metalLB + ingress controller.

b) Amazon EKS, set up with the official eksctl — the minifabric playbooks may not be compatible with EKS. It failed at the first kube operation (cannot create a namespace, etc.); this may be because EKS requires Amazon's original kubectl setup.

c) https://github.com/kubernetes-sigs/kind/ — I tested with kind + metalLB + ingress controller. It passed the tests described in https://kind.sigs.k8s.io/docs/user/loadbalancer/ and https://kind.sigs.k8s.io/docs/user/ingress/, but minifabric failed to set up Fabric on k8s, in the Raft leader election and elsewhere, maybe due to an ingress-controller issue.

litong01 commented 3 years ago

This has been tested on Docker Desktop Kubernetes and GKE. I have not used kind to test this, nor EKS. Before you try this against any other k8s env, it is important to make sure that you use the vendor client apps, such as gcloud, ibmcloud, or the aws client, to set up the kubeconfig files and credentials. Then verify with the command kubectl get nodes; if you cannot do that, your env is not ready to do a minifabric deployment onto the target k8s system. You have to make sure that you can do that and that the command returns the expected results, something like this:

ubuntu@u2004:~$ kubectl get nodes
NAME                                      STATUS   ROLES    AGE    VERSION
gke-tongedge-default-pool-ed6ee5f7-glz2   Ready    <none>   112m   v1.19.9-gke.100
gke-tongedge-default-pool-ed6ee5f7-xknl   Ready    <none>   112m   v1.19.9-gke.100
gke-tongedge-default-pool-ed6ee5f7-zsj5   Ready    <none>   112m   v1.19.9-gke.100

Also, it requires persistent volumes, the ingress controller should provide an externally accessible IP address which must be used in the spec.yaml file, and the kubeconfig and certificate files must be in the right place.
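
In other words, the EXTERNAL-IP of the ingress-nginx-controller LoadBalancer service is what goes into spec.yaml, roughly like this (only the relevant key is shown; its exact position follows the spec.yaml shipped with minifabric):

# kubectl get svc -n ingress-nginx ingress-nginx-controller   -> take the EXTERNAL-IP column
endpoint_address: 203.0.113.10        # example value only; use your cluster's external IP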

itaru2622 commented 3 years ago

@litong01 thank you for your reply.

I'm replaying your known-good cases. I'd like to confirm the URL of the nginx ingress-controller deployment used in your cases. Did you use the same URL in both cases, as described in DeployOntoK8S.md? If not, please let me know which URL you used.

When I tested with "Docker Desktop for Windows" with the built-in Kubernetes and built-in load balancer, minifab failed to set up Fabric on k8s because the built-in load balancer assigned 'localhost' as the external IP for the ingress controller.

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.44.0/deploy/static/provider/cloud/deploy.yaml

kubectl get svc -A
NAMESPACE       NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx   ingress-nginx-controller             LoadBalancer   10.96.21.110     localhost     80:30019/TCP,443:32635/TCP   41m
ingress-nginx   ingress-nginx-controller-admission   ClusterIP      10.104.189.171   <none>        443/TCP                      41m
kube-system     kube-dns                             ClusterIP      10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP       78m
default         kubernetes                           ClusterIP      10.96.0.1        <none>        443/TCP                      78m

kubectl get node -A -o wide
NAME             STATUS   ROLES    AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                   CONTAINER-RUNTIME
docker-desktop   Ready    master   140m   v1.19.7   192.168.65.4   <none>        Docker Desktop   5.4.72-microsoft-standard-WSL2   docker://20.10.5

docker -v
Docker version 20.10.5, build 55c4c88
litong01 commented 3 years ago

You should be able to use the latest ingress controller deployment.

itaru2622 commented 3 years ago

It failed for the same reason, even with the latest ingress controller deployment, as below:

# test with docker desktop built-in kubernetes on windows

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.45.0/deploy/static/provider/cloud/deploy.yaml

kubectl get svc -A
NAMESPACE       NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx   ingress-nginx-controller             LoadBalancer   10.98.9.219     localhost     80:31128/TCP,443:32616/TCP   5s
ingress-nginx   ingress-nginx-controller-admission   ClusterIP      10.109.248.99   <none>        443/TCP                      5s
kube-system     kube-dns                             ClusterIP      10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP       16h
default         kubernetes                           ClusterIP      10.96.0.1       <none>        443/TCP                      16h
itaru2622 commented 3 years ago

I made progress setting up HLF on Google-managed k8s (GKE) with PEER_DATABASE_TYPE=golevel.

Test OK: solo orderer with two peer organizations.
Failed: multiple orderers; the error message said 'no Raft leader', but the reason is unknown.
Failed: PEER_DATABASE_TYPE=couchdb, even with a solo orderer (no couchdb instance was created; it was probably still running with golevel).

My test-OK case required a few modifications to the minifabric container:

- FROM line in the Dockerfile: pure alpine => google/cloud-sdk:alpine
- docker run parameters in minifab: pass CLOUDSDK_CONFIG and the credentials, as described in https://hub.docker.com/r/google/cloud-sdk
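
A rough sketch of those two changes (paths and the in-container mount point are assumptions; CLOUDSDK_CONFIG and the credential mount follow the usage notes on the google/cloud-sdk Docker Hub page):

# Dockerfile: swap the base image so gcloud is available inside the minifabric container
#   FROM google/cloud-sdk:alpine        # instead of: FROM alpine
#
# minifab: extra docker run options to hand the host's gcloud credentials to the container
docker run \
  -e CLOUDSDK_CONFIG=/gcloud \
  -v "$HOME/.config/gcloud:/gcloud" \
  ...                                   # the options minifab already passes stay unchanged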


I also tested minifabric with Azure AKS, and the result was the same: I could set up a solo orderer with multiple peer organizations with PEER_DATABASE_TYPE=golevel, without any modification to the minifabric container, but the other cases failed.

litong01 commented 3 years ago

@itaru2622 should this still be open? It seems that you have made a lot of progress here, and also with your PR, so it seems this can be closed.

itaru2622 commented 3 years ago

@litong01 Yes, I will close this issue for now, since most of the issues described above have been fixed.

I still cannot set up HLF on bare-metal (on-premises) k8s. It may be caused by a k8s load-balancer issue rather than a minifabric issue, since minifabric works fine with cloud k8s.

jiahuigeng commented 3 years ago

> (quotes the original question above about which IP address to use for endpoint_address in spec.yaml)

Hi, what's the correct IP address? I have the same question.

raza-sikander commented 3 years ago

Update: I am trying this on a personal laptop, so I used metallb to get an external endpoint IP. Now, after setting the external endpoint IP in spec.yaml and running it, I'm getting an error on minifab up -e true (screenshot attached).

Hello, I'm kind of stuck here: my ingress is not getting an IP assigned and stays in the Pending state (logs attached as screenshots).

I'm using kubernetes version 1.18.0 and ingress version 0.44 with the weave network.

itaru2622 commented 3 years ago

@raza-sikander I think something is wrong with your metallb. The alternative load balancer "purelb" is much easier to use: https://purelb.gitlab.io/docs/install/

Please make sure you have a dynamic volume provisioner in your kubernetes.
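
A quick way to confirm both points before rerunning minifab (standard kubectl commands; the ingress-nginx namespace matches the earlier outputs):

# is a dynamic provisioner installed, and is one StorageClass marked (default)?
kubectl get storageclass
# did the load balancer actually hand the ingress controller an external IP?
kubectl get svc -n ingress-nginx ingress-nginx-controller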

raza-sikander commented 3 years ago

> @raza-sikander I think something is wrong with your metallb. The alternative load balancer "purelb" is much easier to use: https://purelb.gitlab.io/docs/install/
>
> Please make sure you have a dynamic volume provisioner in your kubernetes.

Thank you, I had the load balancer issue; I sorted it out and it is working now.