Closed: itaru2622 closed this issue 3 years ago.
I found some pods stuck in Pending status, as shown below. A pure/on-premises k8s cluster has no PV by default, while a cloud vendor's k8s may provide one. Without a PV, certificates and other data cannot be shared, and the pods get stuck.
$ kubectl get pod -n mysite
NAME                       READY   STATUS    RESTARTS   AGE
ca1-org0-example-com-0     1/1     Running   0          3m8s
peer1-org0-example-com-0   0/1     Pending   0          3m8s
orderer1-example-com-0     0/1     Pending   0          3m8s
:
$ kubectl describe pod orderer1-example-com-0 -n mysite
:
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  89s   default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  88s   default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
$ kubectl get pvc -A
NAMESPACE   NAME                                               STATUS
mysite      orderer1-example-com-data-orderer1-example-com-0   Pending
:
$ kubectl get pv -A
No resources found.
You need to set up a PV; without it, minifabric cannot allocate persistent volumes, so it will not be able to boot up any node.
If your ingress controller is not giving you an external IP address, your ingress might not work. That is a separate issue you need to resolve with your k8s.
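For an on-premises cluster, one common way to satisfy those PVCs is to install a dynamic provisioner and mark its StorageClass as the default. The sketch below assumes the rancher local-path-provisioner; the manifest URL, version, and the class name local-path are assumptions about your environment, not something minifabric itself requires:
# install a simple hostPath-based dynamic provisioner (URL/version may differ for your setup)
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
# mark its StorageClass as the cluster default so unbound PVCs get provisioned automatically
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
# the minifabric PVCs should then move from Pending to Bound
kubectl get pvc -A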
Thank you for the reply. I will check my k8s and try again.
@litong01 I made some progress, but "minifab up" still fails.
Every pod reaches the Running state on "minifab netup", but I still get the error "no Raft leader" on "minifab create". It may be caused by gRPCS load balancing on on-premises (bare-metal) k8s.
So I think it is better for me to reproduce your known-good case; please let me know the details of your setup. For reference, my failing run looks like this:
$ ./minifab create
Using spec file: /opt/minifabric/spec.yaml
Minifab Execution Context:
FABRIC_RELEASE=2.2.1
CHANNEL_NAME=mychannel
PEER_DATABASE_TYPE=golevel
CHAINCODE_LANGUAGE=go
CHAINCODE_NAME=simple
CHAINCODE_VERSION=1.0
CHAINCODE_INIT_REQUIRED=true
CHAINCODE_PARAMETERS="init","a","200","b","300"
CHAINCODE_PRIVATE=false
CHAINCODE_POLICY=
TRANSIENT_DATA=
BLOCK_NUMBER=newest
EXPOSE_ENDPOINTS=true
CURRENT_ORG=org0.example.com
HOST_ADDRESSES=192.168.1.206,10.233.0.3,10.233.31.116,10.233.55.126,10.233.48.188,10.233.26.178,10.233.0.141,10.233.60.239,10.233.27.111,10.233.71.0,169.254.25.10
WORKING_DIRECTORY: /opt/minifabric
.......
# Run the channel creation script on cli container ************
non-zero return code
2021-04-03 23:28:20.219 UTC [common.tools.configtxgen] main -> INFO 001 Loading configuration
2021-04-03 23:28:20.243 UTC [common.tools.configtxgen.localconfig] Load -> INFO 002 Loaded configuration: /vars/configtx.yaml
2021-04-03 23:28:20.243 UTC [common.tools.configtxgen] doOutputChannelCreateTx -> INFO 003 Generating new channel configtx
2021-04-03 23:28:20.247 UTC [common.tools.configtxgen] doOutputChannelCreateTx -> INFO 004 Writing new channel tx
2021-04-03 23:28:20.332 UTC [channelCmd] InitCmdFactory -> INFO 001 Endorser and orderer connections initialized
Error: got unexpected status: SERVICE_UNAVAILABLE -- no Raft leader
# STATS *******************************************************
minifab: ok=34 failed=1
$ kubectl get pod -n mysite
NAME                      READY   STATUS    RESTARTS   AGE
ca-org0-example-com-0     1/1     Running   0          25m
orderer1-example-com-0    1/1     Running   2          25m
orderer2-example-com-0    1/1     Running   2          25m
orderer3-example-com-0    1/1     Running   2          25m
peer-org0-example-com-0   2/2     Running   0          25m
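(If you hit the same "no Raft leader" error, the quickest way to see why the election fails is usually to read the orderer logs directly; the namespace and pod names below are the ones from the output above:)
kubectl logs -n mysite orderer1-example-com-0 --tail=100 | grep -i -e raft -e error
kubectl logs -n mysite orderer2-example-com-0 --tail=100 | grep -i -e raft -e error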
@litong01 After further investigation, I found some clues about the failure in the container logs (ingress-nginx-controller and the orderers), shown below.
Judging from those logs, I suspect it fails while applying vars/run/allservices.yaml (which exposes the port mappings), or while working with it. When I applied that yaml manually with kubectl, it failed validation: allservices.yaml has no selector, labels, or image in its kind: Deployment, whereas playbooks/ops/certgen/templates/ingresscontroller.yaml does have those keys (and others) in the same section.
When I added selector and labels to allservices.yaml, the manual apply succeeded, but when I merged the change into playbooks/ops/netup/k8stemplates/allservices.j2 I got other errors in the ingress controller and the orderers, so I gave up on fixing it myself and on further investigation.
Could you take a look, please?
# logs in NGINX Ingress controller container
W0406 13:30:34.194192 6 controller.go:384] Service "mysite/peer-org0-example-com" does not have any active Endpoint for TCP port 7051
W0406 13:30:34.194220 6 controller.go:384] Service "mysite/orderer1-example-com" does not have any active Endpoint for TCP port 7050
W0406 13:30:34.194254 6 controller.go:384] Service "mysite/orderer3-example-com" does not have any active Endpoint for TCP port 7050
I0406 13:30:34.194291 6 controller.go:146] "Configuration changes detected, backend reload required"
I0406 13:30:34.251939 6 controller.go:163] "Backend successfully reloaded"
I0406 13:30:34.252686 6 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-6657875575-5n6wp", UID:"a4065ec4-4b9d-43fa-a8f0-9cf7aebe18ea", APIVersion:"v1", ResourceVersion:"5726", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0406 13:30:58.303357 6 controller.go:384] Service "mysite/peer-org0-example-com" does not have any active Endpoint for TCP port 7051
W0406 13:30:58.303415 6 controller.go:384] Service "mysite/orderer3-example-com" does not have any active Endpoint for TCP port 7050
I0406 13:30:58.303459 6 controller.go:146] "Configuration changes detected, backend reload required"
I0406 13:30:58.381489 6 controller.go:163] "Backend successfully reloaded"
I0406 13:30:58.381891 6 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-6657875575-5n6wp", UID:"a4065ec4-4b9d-43fa-a8f0-9cf7aebe18ea", APIVersion:"v1", ResourceVersion:"5726", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
[10.233.81.6] [06/Apr/2021:13:30:58 +0000] TCP 200 0 33 0.001
W0406 13:31:01.636849 6 controller.go:384] Service "mysite/peer-org0-example-com" does not have any active Endpoint for TCP port 7051
I0406 13:31:01.636919 6 controller.go:146] "Configuration changes detected, backend reload required"
I0406 13:31:01.695670 6 controller.go:163] "Backend successfully reloaded"
I0406 13:31:01.695918 6 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-6657875575-5n6wp", UID:"a4065ec4-4b9d-43fa-a8f0-9cf7aebe18ea", APIVersion:"v1", ResourceVersion:"5726", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload trig
# logs in orderer container
2021-04-06 13:30:57.770 UTC [grpc] Infof -> DEBU 496 Subchannel picks a new address "192.168.1.240:7003" to connect
2021-04-06 13:30:57.770 UTC [grpc] UpdateSubConnState -> DEBU 4a6 pickfirstBalancer: HandleSubConnStateChange: 0xc0001dca90, {CONNECTING <nil>}
2021-04-06 13:30:57.771 UTC [grpc] Infof -> DEBU 4aa Channel Connectivity change to CONNECTING
2021-04-06 13:30:57.770 UTC [grpc] Infof -> DEBU 4a7 Subchannel picks a new address "192.168.1.240:7004" to connect
2021-04-06 13:30:57.771 UTC [orderer.consensus.etcdraft] apply -> INFO 4ab Applied config change to add node 1, current nodes in channel: [1 2 3] channel=systemchannel node=1
2021-04-06 13:30:57.771 UTC [orderer.consensus.etcdraft] apply -> INFO 4ac Applied config change to add node 2, current nodes in channel: [1 2 3] channel=systemchannel node=1
2021-04-06 13:30:57.771 UTC [orderer.consensus.etcdraft] apply -> INFO 4ad Applied config change to add node 3, current nodes in channel: [1 2 3] channel=systemchannel node=1
2021-04-06 13:30:58.773 UTC [grpc] Warningf -> DEBU 4af grpc: addrConn.createTransport failed to connect to {192.168.1.240:7003 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 192.168.1.240:7003: connect: connection refused". Reconnecting...
2021-04-06 13:30:58.773 UTC [grpc] Infof -> DEBU 4b0 Subchannel Connectivity change to TRANSIENT_FAILURE
2021-04-06 13:30:58.773 UTC [grpc] Warningf -> DEBU 4ae grpc: addrConn.createTransport failed to connect to {192.168.1.240:7004 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 192.168.1.240:7004: connect: connection refused". Reconnecting...
2021-04-06 13:30:58.773 UTC [grpc] Infof -> DEBU 4b1 Subchannel Connectivity change to TRANSIENT_FAILURE
2021-04-06 13:30:58.773 UTC [grpc] UpdateSubConnState -> DEBU 4b2 pickfirstBalancer: HandleSubConnStateChange: 0xc0001dca90, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp 192.168.1.240:7004: connect: connection refused"}
2021-04-06 13:30:58.773 UTC [grpc] Infof -> DEBU 4b3 Channel Connectivity change to TRANSIENT_FAILURE
# logs from manually applying allservices.yaml, which was generated by minifabric
kubectl apply -f minifabric/vars/run/allservices.yaml
configmap/tcp-services created
error: error validating "minifabric/vars/run/allservices.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false
kubectl apply -f minifabric/vars/run/allservices.yaml --validate=false
configmap/tcp-services unchanged
service/ingress-nginx-controller created
The Deployment "ingress-nginx-controller" is invalid:
* spec.selector: Required value
* spec.template.metadata.labels: Invalid value: map[string]string(nil): `selector` does not match template `labels`
* spec.template.spec.containers[0].image: Required value
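(For reference, the fields the validator complains about are the standard Deployment requirements: spec.selector must be present and match spec.template.metadata.labels, and each container needs an image. A minimal, purely illustrative skeleton, checked with a client-side dry run, looks like this; the names and image tag are examples, not what minifabric generates:)
kubectl apply --dry-run=client -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller        # illustrative name
spec:
  selector:
    matchLabels:
      app: ingress-nginx                # required, must match the template labels
  template:
    metadata:
      labels:
        app: ingress-nginx              # must match spec.selector.matchLabels
    spec:
      containers:
      - name: controller
        image: k8s.gcr.io/ingress-nginx/controller:v0.44.0   # image is required
EOF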
@litong01 I tested with a single orderer (solo) and the situation changed: more progress, but it still fails.
The orderer starts as the solo leader and becomes ready in the system channel, but channel creation fails because the cli container cannot communicate with the others. The packets may be blocked by ipvs or iptables rules added by the ingress controller (not sure).
By the way, why does the cli container run as a plain docker container rather than as a k8s pod/service?
cli is a client which, in theory, can run anywhere and communicate with the fabric network. You probably did not have ingress set up correctly, so your cli cannot communicate with the network.
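(One quick sanity check from the machine running the cli container is to probe the TCP ports that the ingress is supposed to pass through. The address and ports below are simply the ones that appear in the orderer logs above; substitute your own ingress external IP and ports:)
nc -vz 192.168.1.240 7003
nc -vz 192.168.1.240 7004
# "connection refused" here matches the failure the orderer logs show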
@litong01 thank you for your reply.
I tested the following three k8s systems but still failed to set up fabric on k8s with minifabric. Please let me know your known-good case and how you set up the k8s environment, such as:
My tests:
a) https://github.com/kubernetes-sigs/kubespray: as described above, I had some trouble around the LoadBalancer when I tested with calico + MetalLB + ingress controller.
b) Amazon EKS, set up with the official eksctl: the minifabric playbooks may not be compatible with EKS. It failed at the first kube operation (cannot create a namespace, etc.), possibly because EKS requires Amazon's own kubectl setup.
c) https://github.com/kubernetes-sigs/kind/: I tested kind + MetalLB + ingress controller. It passed the tests described in https://kind.sigs.k8s.io/docs/user/loadbalancer/ and https://kind.sigs.k8s.io/docs/user/ingress/, but minifabric failed to set up fabric on k8s, in the Raft leader election and elsewhere, maybe caused by an ingress controller issue.
This has been tested on Docker Desktop Kubernetes and GKE; it has not been tested with kind or EKS. Before you try this against any other k8s env, it is important to make sure that you use the vendor's client apps, such as gcloud, ibmcloud, or the aws client, to set up the kubeconfig files and credentials. Then you can verify with the command kubectl get nodes.
If you cannot do that, it means your env is not ready to do a minifabric deployment onto the target k8s system. You have to make sure that you can do that and that the command returns the expected results, something like this:
ubuntu@u2004:~$ kubectl get nodes
NAME                                      STATUS   ROLES    AGE    VERSION
gke-tongedge-default-pool-ed6ee5f7-glz2   Ready    <none>   112m   v1.19.9-gke.100
gke-tongedge-default-pool-ed6ee5f7-xknl   Ready    <none>   112m   v1.19.9-gke.100
gke-tongedge-default-pool-ed6ee5f7-zsj5   Ready    <none>   112m   v1.19.9-gke.100
Also, minifabric requires a persistent volume, the ingress controller must provide an externally accessible IP address (which goes into the spec.yaml file), and the kubeconfig and certificate files must be in the right place.
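(A short pre-flight check along those lines might look like the following; the exact spec.yaml key name for the endpoint address is the one discussed later in this thread, so treat the last comment as illustrative:)
kubectl get nodes
kubectl get pvc -A     # a test PVC should become Bound, not stay Pending
kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# put the printed EXTERNAL-IP into spec.yaml, e.g. endpoint_address: <that IP>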
@litong01 thank you for your reply.
I am replaying your known-good cases. I would like to confirm the URL of the nginx ingress-controller deployment used in your cases. Did you use the same URL in both cases, as described in DeployOntoK8S.md? If not, please let me know which URL you used.
When I tested "Docker Desktop for Windows" with its built-in Kubernetes and built-in load balancer, minifab failed to set up fabric on k8s because the built-in load balancer assigned 'localhost' as the external IP for the ingress controller.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.44.0/deploy/static/provider/cloud/deploy.yaml
kubectl get svc -A
NAMESPACE       NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx   ingress-nginx-controller             LoadBalancer   10.96.21.110     localhost     80:30019/TCP,443:32635/TCP   41m
ingress-nginx   ingress-nginx-controller-admission   ClusterIP      10.104.189.171   <none>        443/TCP                      41m
kube-system     kube-dns                             ClusterIP      10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP       78m
default         kubernetes                           ClusterIP      10.96.0.1        <none>        443/TCP                      78m
kubectl get node -A -o wide
NAME             STATUS   ROLES    AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                   CONTAINER-RUNTIME
docker-desktop   Ready    master   140m   v1.19.7   192.168.65.4   <none>        Docker Desktop   5.4.72-microsoft-standard-WSL2   docker://20.10.5
docker -v
Docker version 20.10.5, build 55c4c88
you should be able to use the latest ingress controller deployment
It failed for the same reason even with the latest ingress controller deployment, as below:
# test with docker desktop built-in kubernetes on windows
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.45.0/deploy/static/provider/cloud/deploy.yaml
kubectl get svc -A
NAMESPACE       NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx   ingress-nginx-controller             LoadBalancer   10.98.9.219     localhost     80:31128/TCP,443:32616/TCP   5s
ingress-nginx   ingress-nginx-controller-admission   ClusterIP      10.109.248.99   <none>        443/TCP                      5s
kube-system     kube-dns                             ClusterIP      10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP       16h
default         kubernetes                           ClusterIP      10.96.0.1       <none>        443/TCP                      16h
I made progress setting up HLF on Google-managed k8s (GKE) with PEER_DATABASE_TYPE=golevel:
- test OK: solo orderer with two peer organizations
- failed: multiple-orderer case; the error message says 'no Raft leader', reason unknown
- failed: PEER_DATABASE_TYPE=couchdb, even with a solo orderer (no couchdb instance is created; it probably still runs with golevel)
My test-OK case required a few modifications to the minifabric container:
- FROM line in the Dockerfile: plain alpine => google/cloud-sdk:alpine
- docker run parameters in minifab: pass CLOUDSDK_CONFIG and the credentials, as described in https://hub.docker.com/r/google/cloud-sdk
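(A sketch of those two changes, assuming you mount your local gcloud configuration into the container; the paths below are illustrative and this is not a complete minifab invocation:)
# Dockerfile: swap the base image so gcloud and its kubectl auth support are available
#   FROM google/cloud-sdk:alpine
# when the minifab script starts its container, pass the gcloud config through:
docker run --rm \
  -e CLOUDSDK_CONFIG=/gcloud/config \
  -v $HOME/.config/gcloud:/gcloud/config \
  <your rebuilt minifab image> ...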
I also tested minifabric with Azure AKS and the result was the same: I could set up a solo orderer with multiple peer organizations with PEER_DATABASE_TYPE=golevel, without any modification to the minifabric container, but the other cases failed.
@itaru2622 should this still be open? It seems you have made a lot of progress here and also with your PR; it seems this should be closed.
@litong01 Yes, I will close this issue for now, since most of the issues described above have been fixed.
I still cannot set up HLF on bare-metal (on-premises) k8s. It is probably a k8s load balancer issue rather than a minifabric issue, since minifabric works fine with cloud k8s.
I am trying docs/DeployOntoK8S.md with kubespray as an on-premises k8s deployment tool.
DeployOntoK8S.md says:
Once deployed and running, you should get a public IP address, which is needed to configure the Minifabric spec.yaml file.
I succeeded at step 4 (Prepare Nginx ingress controller) and got many IP addresses, as below. Which address should I use for endpoint_address in spec.yaml? Please let me know how to choose/get the address for spec.yaml.
$ kubectl wait --namespace ingress-nginx --for=condition=ready pod --selector=app.kubernetes.io/component=controller --timeout=120s
pod/ingress-nginx-controller-7fc74cf778-ts2gk condition met
$ kubectl get pod --namespace ingress-nginx -o wide
NAME                                        READY   STATUS      RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
ingress-nginx-admission-create-fg456        0/1     Completed   0          64m   10.233.86.2   k8s-n3   <none>           <none>
ingress-nginx-admission-patch-9wks4         0/1     Completed   2          64m   10.233.81.1   k8s-n2   <none>           <none>
ingress-nginx-controller-7fc74cf778-ts2gk   1/1     Running     0          64m   10.233.81.2   k8s-n2   <none>           <none>
$ kubectl get service --namespace ingress-nginx -o wide
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
ingress-nginx-controller             LoadBalancer   10.233.38.205   <pending>     80:31933/TCP,443:32589/TCP   63m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
ingress-nginx-controller-admission   ClusterIP      10.233.38.77    <none>        443/TCP                      63m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
$ kubectl get node -o wide
NAME     STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP
k8s-n1   Ready    control-plane,master   96m   v1.20.5   192.168.1.206   <none>
k8s-n2   Ready    <none>                 95m   v1.20.5   192.168.1.207   <none>
k8s-n3   Ready    <none>                 95m   v1.20.20.2                <none>
Hi, what's the correct IP address? I have the same question.
Update: I am trying this on my personal laptop, so I used MetalLB to get an external endpoint IP. After setting the external endpoint IP in spec.yaml and running it, I get this error on "minifabric up -e true".
Hello, I'm kind of stuck here: my ingress is not getting an IP assigned; it is in the pending state.
I'm using kubernetes version 1.18.0 and ingress version 0.44 with the weave network.
@raza-sikander I think something is wrong in your metallb. The alternative load balancer "purelb" is much easier to use: https://purelb.gitlab.io/docs/install/
Also, please make sure you have a dynamic volume provisioner in your kubernetes.
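(A quick way to confirm the dynamic provisioner is in place: one StorageClass should be marked "(default)", and newly created PVCs should become Bound rather than stay Pending:)
kubectl get storageclass
kubectl get pvc -A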
Thank you, I had the load balancer issue. Sorted it out and it is working now.