smart-edge-open / converged-edge-experience-kits

Source code for experience kits with Ansible-based deployment.

Error while deploying openness release 20.12 for edge node. #77

Open pushpraj527 opened 3 years ago

pushpraj527 commented 3 years ago

Hi, we tried deploying openness-experience-kits (v20.12). We were able to deploy the controller successfully, but while deploying the node we faced the issue below. I am also attaching the deployment log ( 2020-12-21_14-31-23_ansible.log ). Please let me know if anything else is required.

2020-12-21 14:53:50,037 p=17432 u=root n=ansible | TASK [openness/node : wait for Kafka CA and User secrets] **
2020-12-21 14:53:50,037 p=17432 u=root n=ansible | task path: /home/centos/aman/openness-experience-kits/roles/openness/node/tasks/prebuild/kafka_certs.yml:23
2020-12-21 14:53:50,892 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (60 retries left).
2020-12-21 14:54:52,117 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (59 retries left).
2020-12-21 14:55:53,300 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (58 retries left).
2020-12-21 14:56:54,469 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (57 retries left).
2020-12-21 14:57:58,617 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (56 retries left).
2020-12-21 14:59:00,383 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (55 retries left).
2020-12-21 15:00:01,794 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (54 retries left).
2020-12-21 15:01:03,138 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (53 retries left).
2020-12-21 15:02:04,407 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (52 retries left).
2020-12-21 15:03:05,760 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (51 retries left).
2020-12-21 15:04:07,640 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (50 retries left).
2020-12-21 15:05:09,128 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (49 retries left).
2020-12-21 15:06:10,436 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (48 retries left).
2020-12-21 15:07:11,940 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (47 retries left).
2020-12-21 15:08:15,510 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (46 retries left).
2020-12-21 15:09:16,786 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (45 retries left).
2020-12-21 15:10:18,205 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (44 retries left).
2020-12-21 15:11:19,646 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (43 retries left).
2020-12-21 15:12:20,947 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (42 retries left).
2020-12-21 15:13:22,296 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (41 retries left).
2020-12-21 15:14:23,920 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (40 retries left).
2020-12-21 15:15:25,327 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (39 retries left).
2020-12-21 15:16:26,751 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (38 retries left).
2020-12-21 15:17:28,155 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (37 retries left).
2020-12-21 15:18:32,401 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (36 retries left).
2020-12-21 15:19:33,872 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (35 retries left).
2020-12-21 15:20:36,335 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (34 retries left).
2020-12-21 15:21:37,702 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (33 retries left).
2020-12-21 15:22:39,036 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (32 retries left).
2020-12-21 15:23:40,375 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (31 retries left).
2020-12-21 15:24:41,715 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (30 retries left).
2020-12-21 15:25:43,159 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (29 retries left).
2020-12-21 15:26:45,648 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (28 retries left).
2020-12-21 15:27:47,039 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (27 retries left).
2020-12-21 15:28:50,458 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (26 retries left).
2020-12-21 15:29:52,203 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (25 retries left).
2020-12-21 15:30:53,609 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (24 retries left).
2020-12-21 15:31:54,969 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (23 retries left).
2020-12-21 15:32:56,339 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (22 retries left).
2020-12-21 15:33:57,995 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (21 retries left).
2020-12-21 15:34:59,529 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (20 retries left).
2020-12-21 15:36:00,889 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (19 retries left).
2020-12-21 15:37:02,278 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (18 retries left).
2020-12-21 15:38:03,674 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (17 retries left).
2020-12-21 15:39:10,494 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (16 retries left).
2020-12-21 15:40:11,852 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (15 retries left).
2020-12-21 15:41:14,020 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (14 retries left).
2020-12-21 15:42:15,470 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (13 retries left).
2020-12-21 15:43:16,845 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (12 retries left).
2020-12-21 15:44:18,169 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (11 retries left).
2020-12-21 15:45:19,598 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (10 retries left).
2020-12-21 15:46:20,973 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (9 retries left).
2020-12-21 15:47:22,383 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (8 retries left).
2020-12-21 15:48:23,708 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (7 retries left).
2020-12-21 15:49:28,033 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (6 retries left).
2020-12-21 15:50:29,519 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (5 retries left).
2020-12-21 15:51:31,098 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (4 retries left).
2020-12-21 15:52:32,647 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (3 retries left).
2020-12-21 15:53:34,163 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (2 retries left).
2020-12-21 15:54:35,627 p=17432 u=root n=ansible | FAILED - RETRYING: wait for Kafka CA and User secrets (1 retries left).
2020-12-21 15:55:37,504 p=17432 u=root n=ansible | fatal: [node01 -> 192.168.0.16]: FAILED! => {
    "attempts": 60,
    "changed": false,
    "cmd": "restartCounts=kubectl get pods -n kafka -o json | jq -r '.items[] | [.status.containerStatuses[].restartCount] | @sh'\nfor restartCount in $restartCounts; do\n if [ $((restartCount + 0)) -gt 10 ]; then\n exit -1\n fi\ndone\nkubectl get secret cluster-cluster-ca-cert -n kafka && kubectl get secret eaa-kafka -n kafka\n",
    "delta": "0:00:00.713457",
    "end": "2020-12-21 15:55:37.251058",
    "rc": 1,
    "start": "2020-12-21 15:55:36.537601"
}

STDOUT:

NAME                      TYPE     DATA   AGE
cluster-cluster-ca-cert   Opaque   3      56m

STDERR:

jq: error (at <stdin>:1756): Cannot iterate over null (null)
Error from server (NotFound): secrets "eaa-kafka" not found

MSG:

non-zero return code
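For reference, the failing wait task is just polling the cluster until both Kafka secrets exist, so the same check can be repeated by hand from the controller. A minimal sketch, assuming kubectl is configured against the cluster (the secret names come from the log above):

kubectl get pods -n kafka -o json | jq -r '.items[] | [.status.containerStatuses[].restartCount] | @sh'   # the task also bails out if any Kafka pod restarted more than 10 times
kubectl get secret cluster-cluster-ca-cert -n kafka   # present in the STDOUT above
kubectl get secret eaa-kafka -n kafka                 # this is the lookup that returns NotFound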

amr-mokhtar commented 3 years ago

Hi @pushpraj527! This might be happening due to slow download speed. Can you give it another try and see if the issue still persists? If it persists, please include the status of all the pods.
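If it keeps failing, a minimal sketch of commands to collect that status while the wait task is still retrying, assuming kubectl is configured on the controller (the cluster-kafka-0 pod name is taken from the listing further down):

kubectl get pods -A -o wide                            # status of all pods and the nodes they run on
kubectl get events -n kafka --sort-by=.lastTimestamp   # image pull / scheduling messages for the Kafka pods
kubectl describe pod -n kafka cluster-kafka-0          # per-pod events for the Kafka broker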

pushpraj527 commented 3 years ago

Hi @amr-mokhtar, I tried twice and got this issue both times. The pod status and secrets are listed below.

podStatus.txt

NAMESPACE NAME READY STATUS RESTARTS AGE
cdi cdi-apiserver-85ff78c47c-mkwfx 1/1 Running 0 15h
cdi cdi-deployment-5c947c965f-pqb26 1/1 Running 0 15h
cdi cdi-operator-7466c8c6b-vgkhw 1/1 Running 2 18h
cdi cdi-uploadproxy-6887998f8d-kll8g 1/1 Running 0 15h
harbor harbor-app-harbor-chartmuseum-6cbd5c5bbb-vqxsr 1/1 Running 0 18h
harbor harbor-app-harbor-clair-779df4555b-jx4tz 2/2 Running 0 18h
harbor harbor-app-harbor-core-7cd94df459-p6xk2 1/1 Running 0 18h
harbor harbor-app-harbor-database-0 1/1 Running 0 18h
harbor harbor-app-harbor-jobservice-864f675bfc-zcqdx 1/1 Running 0 18h
harbor harbor-app-harbor-nginx-7dcd9fbc86-8frtx 1/1 Running 0 18h
harbor harbor-app-harbor-notary-server-7945945b9d-tq65h 1/1 Running 0 18h
harbor harbor-app-harbor-notary-signer-7556c8b697-zkm8l 1/1 Running 0 18h
harbor harbor-app-harbor-portal-fd5ff4bc9-8zzf7 1/1 Running 0 18h
harbor harbor-app-harbor-redis-0 1/1 Running 0 18h
harbor harbor-app-harbor-registry-6c66d95768-pt26k 2/2 Running 0 18h
harbor harbor-app-harbor-trivy-0 1/1 Running 0 18h
kafka cluster-kafka-0 0/2 ContainerCreating 0 6m44s
kafka cluster-zookeeper-0 1/1 Running 0 15h
kafka strimzi-cluster-operator-68b6d59f74-229tw 0/1 Running 19 18h
kube-system coredns-f9fd979d6-c6mrm 1/1 Running 0 18h
kube-system coredns-f9fd979d6-x2dc7 1/1 Running 0 18h
kube-system descheduler-cronjob-1608563400-zccqz 0/1 Completed 0 14h
kube-system descheduler-cronjob-1608563520-5pm6p 0/1 RunContainerError 0 14h
kube-system descheduler-cronjob-1608563520-8qj94 0/1 RunContainerError 0 14h
kube-system descheduler-cronjob-1608563520-crlv2 0/1 Completed 0 14h
kube-system descheduler-cronjob-1608568320-j84f8 0/1 Completed 0 13h
kube-system descheduler-cronjob-1608576720-nmbwr 0/1 RunContainerError 0 11h
kube-system descheduler-cronjob-1608576720-nqzrt 0/1 ContainerCreating 0 7h9m
kube-system etcd-edgecontroller2 1/1 Running 0 18h
kube-system kube-apiserver-edgecontroller2 1/1 Running 0 18h
kube-system kube-controller-manager-edgecontroller2 1/1 Running 2 18h
kube-system kube-ovn-cni-fq65t 1/1 Running 0 18h
kube-system kube-ovn-cni-rt85w 1/1 Running 2 15h
kube-system kube-ovn-controller-76d6bd7c8d-tzgk6 1/1 Running 0 18h
kube-system kube-ovn-pinger-6f897 1/1 Running 0 18h
kube-system kube-ovn-pinger-gdcqc 1/1 Running 0 15h
kube-system kube-proxy-8gz8g 1/1 Running 0 18h
kube-system kube-proxy-b2l79 1/1 Running 0 15h
kube-system kube-scheduler-edgecontroller2 1/1 Running 0 18h
kube-system ovn-central-5845ddffb5-rcnv7 1/1 Running 0 18h
kube-system ovs-ovn-7v8xg 1/1 Running 1 18h
kube-system ovs-ovn-zkp7j 1/1 Running 3 15h
kubevirt virt-api-666f7455c8-4nvln 1/1 Running 0 15h
kubevirt virt-api-666f7455c8-9vc24 1/1 Running 0 15h
kubevirt virt-controller-9c85d9794-n2x9l 1/1 Running 19 15h
kubevirt virt-controller-9c85d9794-qb45t 1/1 Running 18 15h
kubevirt virt-handler-hpx4z 1/1 Running 0 15h
kubevirt virt-operator-6699ff65f4-45zmc 1/1 Running 13 18h
kubevirt virt-operator-6699ff65f4-m6fnb 1/1 Running 11 18h
openness certsigner-6cb79468b5-t5qpm 0/1 ErrImageNeverPull 0 18h
openness eaa-69c7bb7b5d-h2mtf 0/1 Init:0/2 0 18h
openness edgedns-mrhz7 0/1 Init:ErrImageNeverPull 0 15h
openness interfaceservice-f5vhg 0/1 Init:ErrImageNeverPull 0 15h
openness nfd-release-node-feature-discovery-master-6d978d7668-4flfg 1/1 Running 0 18h
openness nfd-release-node-feature-discovery-worker-djn4q 1/1 Running 3 15h
telemetry cadvisor-lhhln 2/2 Running 0 15h
telemetry collectd-sxv46 2/2 Running 0 15h
telemetry custom-metrics-apiserver-55bdf684ff-xhchz 1/1 Running 0 18h
telemetry grafana-76867c586-h7hlm 2/2 Running 0 17h
telemetry otel-collector-f9b9d494-d5s8q 2/2 Running 25 18h
telemetry prometheus-node-exporter-4sg9m 1/1 Running 0 15h
telemetry prometheus-server-8656f6bf98-p84f9 3/3 Running 0 18h
telemetry telemetry-aware-scheduling-554db589c4-2xqdv 2/2 Running 0 18h
telemetry telemetry-collector-certs-gzggr 0/1 Completed 0 18h
telemetry telemetry-node-certs-8frdk 1/1 Running 0 15h
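The entries that stand out here are cluster-kafka-0 stuck in ContainerCreating, the strimzi-cluster-operator at 0/1 with 19 restarts, and the openness pods in ErrImageNeverPull (that status means imagePullPolicy is Never, so the image must already be present on the node). A rough sketch of how to dig into them, assuming kubectl access from the controller and docker on the node hosting the failing pods (image names are guessed from the pod names):

kubectl describe pod -n kafka cluster-kafka-0                    # events section explains why it never leaves ContainerCreating
kubectl describe pod -n openness certsigner-6cb79468b5-t5qpm     # same for one of the ErrImageNeverPull pods
docker images | grep -E 'certsigner|edgedns|interfaceservice'    # run on the node where those pods are scheduled: are the locally built images there?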

secrets.txt

NAMESPACE NAME TYPE DATA AGE
cdi cdi-api-signing-key Opaque 2 15h
cdi cdi-apiserver-server-cert Opaque 2 15h
cdi cdi-apiserver-signer Opaque 2 15h
cdi cdi-apiserver-token-lj95d kubernetes.io/service-account-token 3 15h
cdi cdi-operator-token-n8t27 kubernetes.io/service-account-token 3 18h
cdi cdi-sa-token-rnrrm kubernetes.io/service-account-token 3 15h
cdi cdi-uploadproxy-server-cert Opaque 2 15h
cdi cdi-uploadproxy-signer Opaque 2 15h
cdi cdi-uploadproxy-token-txlwj kubernetes.io/service-account-token 3 15h
cdi cdi-uploadserver-client-cert Opaque 2 15h
cdi cdi-uploadserver-client-signer Opaque 2 15h
cdi cdi-uploadserver-signer Opaque 2 15h
cdi default-token-jfzz9 kubernetes.io/service-account-token 3 18h
default ca-certrequester Opaque 1 18h
default default-token-mwslf kubernetes.io/service-account-token 3 18h
default sh.helm.release.v1.prometheus-adapter.v1 helm.sh/release.v1 1 18h
harbor default-token-pwlj6 kubernetes.io/service-account-token 3 18h
harbor harbor-app-harbor-chartmuseum Opaque 1 18h
harbor harbor-app-harbor-clair Opaque 3 18h
harbor harbor-app-harbor-core Opaque 8 18h
harbor harbor-app-harbor-database Opaque 1 18h
harbor harbor-app-harbor-jobservice Opaque 2 18h
harbor harbor-app-harbor-nginx Opaque 3 18h
harbor harbor-app-harbor-notary-server Opaque 5 18h
harbor harbor-app-harbor-registry Opaque 3 18h
harbor harbor-app-harbor-trivy Opaque 2 18h
harbor sh.helm.release.v1.harbor-app.v1 helm.sh/release.v1 1 18h
kafka cluster-clients-ca Opaque 1 15h
kafka cluster-clients-ca-cert Opaque 3 15h
kafka cluster-cluster-ca Opaque 1 15h
kafka cluster-cluster-ca-cert Opaque 3 15h
kafka cluster-cluster-operator-certs Opaque 4 15h
kafka cluster-kafka-brokers Opaque 4 14h
kafka cluster-kafka-token-8g8zq kubernetes.io/service-account-token 3 14h
kafka cluster-zookeeper-nodes Opaque 4 15h
kafka cluster-zookeeper-token-rv9lv kubernetes.io/service-account-token 3 15h
kafka default-token-8pg7g kubernetes.io/service-account-token 3 18h
kafka sh.helm.release.v1.strimzi.v1 helm.sh/release.v1 1 18h
kafka strimzi-cluster-operator-token-2j82w kubernetes.io/service-account-token 3 18h
kube-node-lease default-token-c47zh kubernetes.io/service-account-token 3 18h
kube-public default-token-k44np kubernetes.io/service-account-token 3 18h
kube-system attachdetach-controller-token-rrc8z kubernetes.io/service-account-token 3 18h
kube-system bootstrap-signer-token-jnvmq kubernetes.io/service-account-token 3 18h
kube-system bootstrap-token-bcg7hx bootstrap.kubernetes.io/token 6 18h
kube-system certificate-controller-token-65h2t kubernetes.io/service-account-token 3 18h
kube-system clusterrole-aggregation-controller-token-rcmfg kubernetes.io/service-account-token 3 18h
kube-system coredns-token-4dkp7 kubernetes.io/service-account-token 3 18h
kube-system cronjob-controller-token-tn82f kubernetes.io/service-account-token 3 18h
kube-system daemon-set-controller-token-wd6jw kubernetes.io/service-account-token 3 18h
kube-system default-token-f79hc kubernetes.io/service-account-token 3 18h
kube-system deployment-controller-token-brvf6 kubernetes.io/service-account-token 3 18h
kube-system descheduler-sa-token-glrdj kubernetes.io/service-account-token 3 18h
kube-system disruption-controller-token-n59hd kubernetes.io/service-account-token 3 18h
kube-system endpoint-controller-token-tdkfs kubernetes.io/service-account-token 3 18h
kube-system endpointslice-controller-token-5kpjj kubernetes.io/service-account-token 3 18h
kube-system endpointslicemirroring-controller-token-wrp9c kubernetes.io/service-account-token 3 18h
kube-system expand-controller-token-wq44z kubernetes.io/service-account-token 3 18h
kube-system generic-garbage-collector-token-gxt5p kubernetes.io/service-account-token 3 18h
kube-system horizontal-pod-autoscaler-token-xqpx9 kubernetes.io/service-account-token 3 18h
kube-system job-controller-token-qh9p2 kubernetes.io/service-account-token 3 18h
kube-system kube-proxy-token-jm4dv kubernetes.io/service-account-token 3 18h
kube-system namespace-controller-token-r6bxp kubernetes.io/service-account-token 3 18h
kube-system node-controller-token-b5fwt kubernetes.io/service-account-token 3 18h
kube-system ovn-token-f9drj kubernetes.io/service-account-token 3 18h
kube-system persistent-volume-binder-token-8ztw4 kubernetes.io/service-account-token 3 18h
kube-system pod-garbage-collector-token-bph5c kubernetes.io/service-account-token 3 18h
kube-system pv-protection-controller-token-qv755 kubernetes.io/service-account-token 3 18h
kube-system pvc-protection-controller-token-6wm4j kubernetes.io/service-account-token 3 18h
kube-system replicaset-controller-token-xkf89 kubernetes.io/service-account-token 3 18h
kube-system replication-controller-token-8vwkz kubernetes.io/service-account-token 3 18h
kube-system resourcequota-controller-token-989jw kubernetes.io/service-account-token 3 18h
kube-system service-account-controller-token-8cbgj kubernetes.io/service-account-token 3 18h
kube-system service-controller-token-xj9bt kubernetes.io/service-account-token 3 18h
kube-system statefulset-controller-token-v9q8r kubernetes.io/service-account-token 3 18h
kube-system token-cleaner-token-bgbq4 kubernetes.io/service-account-token 3 18h
kube-system ttl-controller-token-wz6bw kubernetes.io/service-account-token 3 18h
kubevirt default-token-zwcml kubernetes.io/service-account-token 3 18h
kubevirt kubevirt-apiserver-token-k5rzv kubernetes.io/service-account-token 3 15h
kubevirt kubevirt-controller-token-9qvc4 kubernetes.io/service-account-token 3 15h
kubevirt kubevirt-handler-token-42khh kubernetes.io/service-account-token 3 15h
kubevirt kubevirt-operator-certs Opaque 3 15h
kubevirt kubevirt-operator-token-sfrb7 kubernetes.io/service-account-token 3 18h
kubevirt kubevirt-virt-api-certs Opaque 3 15h
kubevirt kubevirt-virt-handler-certs Opaque 3 15h
openness ca-certrequester Opaque 1 18h
openness certgen Opaque 2 18h
openness csr-signer-token-kt9z4 kubernetes.io/service-account-token 3 18h
openness default-token-wr2pv kubernetes.io/service-account-token 3 18h
openness eaa-token-h7fq2 kubernetes.io/service-account-token 3 18h
openness edgedns-token-kqpqh kubernetes.io/service-account-token 3 18h
openness interfaceservice-token-4c4rl kubernetes.io/service-account-token 3 18h
openness nfd-master-token-c2t2z kubernetes.io/service-account-token 3 18h
openness nfd-release-node-feature-discovery-master-cert Opaque 2 18h
openness nfd-release-node-feature-discovery-worker-cert Opaque 2 18h
openness root-ca Opaque 2 18h
openness sh.helm.release.v1.nfd-release.v1 helm.sh/release.v1 1 18h
telemetry certgen Opaque 2 18h
telemetry cm-adapter-serving-certs kubernetes.io/tls 2 18h
telemetry custom-metrics-apiserver-token-kqh9b kubernetes.io/service-account-token 3 18h
telemetry default-token-zzkff kubernetes.io/service-account-token 3 18h
telemetry extender-secret kubernetes.io/tls 2 18h
telemetry grafana Opaque 3 18h
telemetry grafana-test-token-l7977 kubernetes.io/service-account-token 3 18h
telemetry grafana-token-h6zrl kubernetes.io/service-account-token 3 18h
telemetry prometheus-node-exporter-token-zgpqx kubernetes.io/service-account-token 3 18h
telemetry prometheus-server-token-bnk6v kubernetes.io/service-account-token 3 18h
telemetry root-ca Opaque 2 18h
telemetry sh.helm.release.v1.cadvisor.v1 helm.sh/release.v1 1 18h
telemetry sh.helm.release.v1.collectd.v1 helm.sh/release.v1 1 18h
telemetry sh.helm.release.v1.grafana.v1 helm.sh/release.v1 1 18h
telemetry sh.helm.release.v1.otel-collector.v1 helm.sh/release.v1 1 18h
telemetry sh.helm.release.v1.prometheus.v1 helm.sh/release.v1 1 18h
telemetry telemetry-aware-scheduling-service-account-token-v26m7 kubernetes.io/service-account-token 3 18h
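What this listing shows is that cluster-cluster-ca-cert exists but eaa-kafka never appears; that secret is normally produced by the Strimzi operator, which in the pod listing above is 0/1 Ready with 19 restarts. A rough sketch of the next checks, assuming the Strimzi custom resources live in the kafka namespace and the KafkaUser is named after the missing secret:

kubectl -n kafka logs deploy/strimzi-cluster-operator --tail=100   # why the operator keeps restarting
kubectl -n kafka get kafka,kafkauser                               # readiness of the Strimzi custom resources
kubectl -n kafka describe kafkauser eaa-kafka                      # the user whose certificate secret was never created (name inferred from the missing secret)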

tomaszwesolowski commented 3 years ago

Hi @pushpraj527, it looks like you might have an issue with the Docker daemon. Can you check your Docker version and double-check that everything works correctly? Also, always make sure that the machines you work on are clean and were not used before.
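A minimal set of checks for that, run on both the controller and the edge node (a sketch; adjust the time window as needed):

docker version                               # client and daemon versions
docker info                                  # storage driver, proxy settings, warnings
systemctl status docker --no-pager           # is the daemon healthy or restart-looping?
journalctl -u docker --since "1 hour ago"    # recent daemon errors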

pushpraj527 commented 3 years ago

Hi @tomaszwesolowski, I tried it on a fresh machine only; nothing was done there manually.

tomaszwesolowski commented 3 years ago

Hi, could you share the Openness_experience_kit_archive.tar.gz file that was created after deployment?

Jorge-Sasiain commented 3 years ago

Hello @tomaszwesolowski

I have the same problem as the issue starter when deploying the edge node (currently stuck at the same step, at 34 retries left and counting).

Please find my Openness_experience_kit_archive.tar.gz here: 2021_01_13_11_44_59_Openness_experience_kit_archive.tar.gz

In case it's relevant, here's the "describe pod" output of the only pod I found in the kafka namespace: https://pastebin.com/raw/gPZsTYp4. It says that 0/2 nodes are available because of taints that the pod didn't tolerate.

My knowledge of Kubernetes is limited, but according to the output below, my edge node seems to be Ready and has no taints:

[root@openness-controller centos]# kubectl describe node openness-edgenode
(...)
Events:
  Type    Reason                   Age   From        Message
  ----    ------                   ----  ----        -------
  Normal  Starting                 31m   kubelet     Starting kubelet.
  Normal  NodeHasSufficientMemory  31m   kubelet     Node openness-edgenode status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    31m   kubelet     Node openness-edgenode status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     31m   kubelet     Node openness-edgenode status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  31m   kubelet     Updated Node Allocatable limit across pods
  Normal  Starting                 31m   kube-proxy  Starting kube-proxy.
  Normal  NodeReady                29m   kubelet     Node openness-edgenode status is now: NodeReady
[root@openness-controller centos]# kubectl describe node openness-edgenode | grep Taints
Taints:             <none>
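One way to line the scheduler message ("taints that the pod didn't tolerate") up against reality is to compare the taints on both nodes with the tolerations on the Kafka pod; a sketch, assuming kubectl access from the controller:

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints   # taints on the controller and the edge node
kubectl get pods -n kafka -o jsonpath='{.items[*].spec.tolerations}'          # what the Kafka pods actually tolerate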

On the edge node, the docker logs of the only container matching "docker container ls | grep kafka" are empty.

I would appreciate any pointers towards finding a solution for this issue. Thanks in advance.

Edit: it seems to be an underlying networking issue on my side, unrelated to taints and tolerations, where nodes simply can't reach pods on a different node. I'll edit again to confirm once I can fix it.
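A quick way to confirm that kind of cross-node pod connectivity problem with the kube-ovn CNI used here is to check the pinger DaemonSet and try to reach a pod IP that lives on the other node. A sketch, with placeholder names to replace:

kubectl -n kube-system logs ds/kube-ovn-pinger --tail=50   # the pinger continuously reports node-to-node and pod-to-pod reachability
kubectl get pods -A -o wide | grep <edge-node-name>        # pick the IP of any pod scheduled on the other node
ping -c 3 <pod-ip-on-other-node>                           # run from the controller host, then repeat in the reverse direction from the node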

iamtiresome commented 3 years ago

My solution was to run "kubectl taint nodes --all node-role.kubernetes.io/master-" and then run the deploy.py file again.
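For completeness, a sketch of that sequence, assuming deploy.py sits at the root of the experience-kit checkout and is invoked the same way as for the original deployment:

kubectl describe nodes | grep -A1 Taints                    # confirm which nodes still carry the master taint
kubectl taint nodes --all node-role.kubernetes.io/master-   # remove it from every node that has it
python3 deploy.py                                           # re-run the deployment from the kit directory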

NishankK commented 3 years ago

My solution was to run "kubectl taint nodes --all node-role.kubernetes.io/master-" and then run the deploy.py file again.

I am facing the exact same issue... Can you please elaborate on what you did?

  1. Where did you run it?
  2. What did you do after running it (since it is a "click to deploy" kind of thing, right)? Specifically: a. How do we run deploy.py again, and where is it located? b. What are the steps after that?

Details would be very much appreciated.

Thanks, Nishank

Jorge-Sasiain commented 3 years ago

@NishankK In my case the error was caused by the pods on my controller node having no connectivity with the pods on the edge node. I would check whether that's your case too. (I don't remember exactly right now, but I had the controller in an OpenStack VM and I think it was an issue with port security rules.)
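For anyone else with the controller or node in OpenStack VMs, the port security side of that can be checked and relaxed roughly like this (a sketch using the standard openstack CLI; the server name and port ID are placeholders):

openstack port list --server <controller-vm-name>                          # find the Neutron port(s) attached to the VM
openstack port set --no-security-group --disable-port-security <port-id>   # or, instead, add security-group rules that allow the pod/overlay traffic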

If I'm understanding correctly, removing the master taint from all nodes as suggested above would allow the EAA pod (or whatever it is) to be deployed on the controller instead of the edge node, which would place it on the same node as the Kafka pod. But I don't think that's the correct fix (I could be missing something, sorry in advance if that's the case).

NishankK commented 3 years ago

@NishankK In my case the error was caused by the pods on my controller node having no connectivity with the pods on the edge node. I would check whether that's your case too. (I don't remember exactly right now, but I had the controller in an OpenStack VM and I think it was an issue with port security rules.)

If I'm understanding correctly, removing the master taint from all nodes as suggested above would allow the EAA pod (or whatever it is) to be deployed on the controller instead of the edge node, which would place it on the same node as the Kafka pod. But I don't think that's the correct fix (I could be missing something, sorry in advance if that's the case).

Thanks for replying, Jorge. Actually, I am getting this exact error while deploying OpenNESS on Azure, which is why I got confused: in the Azure deployment case, both the edge and controller nodes are supposed to be deployed on Azure with the click of a button.

Okay, I have one other query, if you can help please: is it possible to deploy the OpenNESS controller and edge node on VMs rather than physical servers? If yes, what is the recommended VM configuration (how much memory, how many CPUs, etc.)? I can't find a clear answer to this anywhere.

Many thanks

Jorge-Sasiain commented 3 years ago

I'm no expert by any means, so I'll just link you to this thread on the OpenNESS developer mailing list (question number 2 covers that): https://mail.openness.org/archives/developer/2021-January/000225.html