smart-edge-open / converged-edge-experience-kits

Source code for experience kits with Ansible-based deployment.
Apache License 2.0

Error in running Ansible playbook for the edge node #33

Closed · pavanats closed this 4 years ago

pavanats commented 4 years ago

Hi, I could install the controller on a VM; however, the edge node installation on a physical server has failed with the following error:

TASK [kubernetes/cni/kubeovn/worker : try to get ovs-ovn execution logs] ***
task path: /root/openness-experience-kits/roles/kubernetes/cni/kubeovn/worker/tasks/main.yml:75
fatal: [node01 -> 30.30.30.22]: FAILED! => {
    "changed": false,
    "cmd": "set -o pipefail && kubectl logs -n kube-system $(kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name --field-selector spec.nodeName=node01 | grep ovs-ovn)\n",
    "delta": "0:00:00.444066",
    "end": "2020-07-11 12:03:37.659635",
    "rc": 1,
    "start": "2020-07-11 12:03:37.215569"
}

STDERR:

Error from server: Get https://30.30.30.11:10250/containerLogs/kube-system/ovs-ovn-645b9/openvswitch: dial tcp 30.30.30.11:10250: connect: connection refused

If there is a known solution or a workaround, please let us know.

amr-mokhtar commented 4 years ago

From the error, it seems that the controller is unable to connect back to node01. It could be an improper proxy setting. Is 30.30.30.11 the correct IP for node01? I tested the command below on my local setup and it was able to retrieve the logs. Can you try the same from your shell?

kubectl logs -n kube-system $(kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name --field-selector spec.nodeName=node01 | grep ovs-ovn)
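
If that command fails from your shell as well, it is worth verifying that the controller can actually reach the kubelet on node01 (port 10250) and that no stray proxy variables are in play. A rough check, assuming node01's IP is 30.30.30.11 as in the error above, could look like this:

# on the controller: confirm no proxy variables are set in the shell running kubectl
env | grep -i proxy

# probe the kubelet port on node01 directly; "connection refused" here points at the
# kubelet not listening or a host firewall, not at kubectl itself
curl -vk https://30.30.30.11:10250/healthz

# on node01: confirm the kubelet is up and listening on 10250
systemctl status kubelet
ss -tlnp | grep 10250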
pavanats commented 4 years ago

Hi Amr, I am not using any proxy configuration. Some of the pods are shown as running. Here's the output of the CLI command you tried:

[root@controller ~]# kubectl logs -n kube-system $(kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name --field-selector spec.nodeName=node01 | grep ovs-ovn)
Error from server: Get https://30.30.30.11:10250/containerLogs/kube-system/ovs-ovn-645b9/openvswitch: dial tcp 30.30.30.11:10250: connect: connection refused

Amr, can we have a quick online meeting? We have been stuck for some time on just the deployment. Pavan



amr-mokhtar commented 4 years ago

What is the status of the pods in the cluster?

kubectl get pods -A -o wide
pavanats commented 4 years ago

Here's the output. Please note that I haven't yet been able to deploy the edge node without errors.

[root@controller ~]# kubectl get pods -o wide -A
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cdi cdi-operator-76b6694845-7f4jq 0/1 Terminating 0 2d node01
cdi cdi-operator-76b6694845-q2vqk 0/1 Pending 0 47h
cdi cdi-operator-85b8cfcfd9-22m6d 0/1 Terminating 0 2d node01
cdi cdi-operator-85b8cfcfd9-lxqxs 0/1 Pending 0 47h
kube-system coredns-66bff467f8-rnjvn 1/1 Running 2 2d1h 10.16.0.3 controller
kube-system coredns-66bff467f8-xm4mt 1/1 Running 2 2d1h 10.16.0.2 controller
kube-system descheduler-cronjob-1594466400-fvg2j 0/1 Pending 0 47h
kube-system descheduler-cronjob-1594466400-n6nqv 0/1 Terminating 0 2d node01
kube-system etcd-controller 1/1 Running 2 2d1h 192.168.122.30 controller
kube-system kube-apiserver-controller 1/1 Running 2 2d1h 192.168.122.30 controller
kube-system kube-controller-manager-controller 1/1 Running 2 2d1h 192.168.122.30 controller
kube-system kube-ovn-cni-9k4st 0/1 Running 1 2d 30.30.30.11 node01
kube-system kube-ovn-cni-n4ltv 1/1 Running 2 2d1h 192.168.122.30 controller
kube-system kube-ovn-controller-96f89c68b-8nnp4 0/1 Terminating 0 47h 30.30.30.11 node01
kube-system kube-ovn-controller-96f89c68b-rnm6c 1/1 Running 411 2d1h 192.168.122.30 controller
kube-system kube-ovn-controller-96f89c68b-w5msb 0/1 Pending 0 47h
kube-system kube-proxy-674j8 1/1 Running 0 2d 30.30.30.11 node01
kube-system kube-proxy-n89gg 1/1 Running 2 2d1h 192.168.122.30 controller
kube-system kube-scheduler-controller 1/1 Running 6 45h 192.168.122.30 controller
kube-system ovn-central-74986486f9-kdm2c 1/1 Running 2 2d1h 192.168.122.30 controller
kube-system ovs-ovn-4v78f 1/1 Running 15 2d1h 192.168.122.30 controller
kube-system ovs-ovn-645b9 0/1 CrashLoopBackOff 13 2d 30.30.30.11 node01
kubevirt virt-operator-79c97797-7898j 0/1 Pending 0 47h
kubevirt virt-operator-79c97797-h7xrp 0/1 Pending 0 47h
kubevirt virt-operator-79c97797-hlsz5 0/1 Terminating 0 2d node01
kubevirt virt-operator-79c97797-z2tlk 0/1 Terminating 0 2d node01
openness docker-registry-deployment-54d5bb5c-d689n 1/1 Running 2 45h 192.168.122.30 controller
openness eaa-6f8b94c9d7-5d2v9 0/1 Pending 0 47h
openness eaa-6f8b94c9d7-v2md2 0/1 Terminating 0 2d node01
openness edgedns-4fmbd 0/1 Init:0/1 0 47h node01
openness interfaceservice-pt9cd 0/1 Init:0/1 0 47h node01
openness nfd-release-node-feature-discovery-master-76c548c79-7zf2x 1/1 Running 2 2d 10.16.0.16 controller
openness nfd-release-node-feature-discovery-worker-mz26p 0/1 ContainerCreating 0 47h 30.30.30.11 node01
openness syslog-master-d2sbk 1/1 Running 2 2d 10.16.0.5 controller
openness syslog-ng-fqx4t 0/1 Init:0/1 0 47h node01
telemetry cadvisor-ddwn5 0/2 Init:0/1 0 47h node01
telemetry collectd-wlvx7 0/2 Init:0/1 0 47h 30.30.30.11 node01
telemetry custom-metrics-apiserver-54699b845f-m94vh 1/1 Running 2 2d 10.16.0.13 controller
telemetry grafana-6b79c984b-5th88 2/2 Running 4 2d 10.16.0.17 controller
telemetry otel-collector-7d5b75bbdf-8lvzm 0/2 Terminating 0 2d node01
telemetry otel-collector-7d5b75bbdf-dvmpc 0/2 Pending 0 47h
telemetry prometheus-node-exporter-fvglx 0/1 ContainerCreating 0 47h node01
telemetry prometheus-server-76c96b9497-brlgl 3/3 Running 6 2d 10.16.0.10 controller
telemetry telemetry-aware-scheduling-68467c4ccd-nznbq 2/2 Running 4 2d 10.16.0.14 controller
telemetry telemetry-collector-certs-9z58c 0/1 Pending 0 47h
telemetry telemetry-collector-certs-gc6gm 0/1 Terminating 0 2d node01
telemetry telemetry-node-certs-r2px2 0/1 ContainerCreating 0 47h node01
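
The ovs-ovn-645b9 pod on node01 is in CrashLoopBackOff, and the playbook task fails exactly when it tries to pull that pod's logs through the unreachable kubelet. One way to get at the underlying failure without going through the kubelet API (a sketch, assuming a Docker container runtime on the node) is to check the pod's events from the controller and the container logs directly on node01:

# on the controller: pod events often show why the container keeps restarting
kubectl describe pod -n kube-system ovs-ovn-645b9

# on node01: read the container logs from the local runtime instead of via the kubelet
docker ps -a | grep ovs-ovn
docker logs <container-id>   # <container-id> comes from the previous command
journalctl -u kubelet --since "1 hour ago"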



amr-mokhtar commented 4 years ago

As mentioned earlier, we currently support OpenNESS in bare-metal deployments only. Given that the VM-based setup is not running smoothly, I would suggest that you start by deploying on bare metal to get familiar with the system, and then migrate to a VM-based setup.

amr-mokhtar commented 4 years ago

@pavanats - Are you OK to close this issue for now and open a new one if you run into any issues with the bare-metal installation?

pavanats commented 4 years ago

Hi Amr, I am now trying the setup on 2 bare-metal servers. You can close this ticket. I will get in touch again if the bare-metal deployment fails. Pavan



pavanats commented 4 years ago

Hi Amr, I have tried the edge node deployment on another new server and see the same error again:

fatal: [node01 -> x.x.x.x]: FAILED! => {

"changed": false,
"cmd": "set -o pipefail && kubectl logs -n kube-system $(kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name --field-selector spec.nodeName=node01 | grep ovs-ovn)\n",
"delta": "0:00:00.256773",
"end": "2020-07-15 19:12:14.914509",
"rc": 1,
"start": "2020-07-15 19:12:14.657736"

}

STDERR:

Error from server: Get https://x.x.x.x:10250/containerLogs/kube-system/ovs-ovn-tgfq6/openvswitch: dial tcp x.x.x.x:10250: connect: connection refused

MSG:

non-zero return code ...ignoring

TASK [kubernetes/cni/kubeovn/worker : end the playbook] *****
task path: /root/openness-experience-kits/roles/kubernetes/cni/kubeovn/worker/tasks/main.yml:84
fatal: [node01]: FAILED! => {
    "changed": false
}

MSG:

end the playbook: either ovs-ovn pod did not start or the socket was not created
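
A quick way to tell which of those two conditions is failing is to check the pod from the controller and look for the Open vSwitch socket on node01 itself (the socket path below is an assumption; the exact file the role waits for may differ):

kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=node01 | grep ovs-ovn
# on node01: check whether the OVS database socket was created (path is an assumption)
ls -l /var/run/openvswitch/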

This error isn't really related to using VMs for the setup; it can be seen on both hardware and VMs. The output for the different pods is shown below:

[root@controller ~]# kubectl get pods -o wide -A
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cdi cdi-apiserver-885758cc4-m4dhn 1/1 Terminating 0 3h54m 10.16.0.25 node01
cdi cdi-apiserver-885758cc4-pt9bj 0/1 Pending 0 3h9m
cdi cdi-deployment-5bdcc85d54-cq9qw 1/1 Terminating 0 3h54m 10.16.0.27 node01
cdi cdi-deployment-5bdcc85d54-kqsrp 0/1 Pending 0 3h9m
cdi cdi-operator-76b6694845-gwkp4 0/1 Pending 0 3h9m
cdi cdi-operator-76b6694845-k65tq 1/1 Terminating 0 5h5m 10.16.0.9 node01
cdi cdi-uploadproxy-89cf96777-m4746 0/1 Pending 0 3h9m
cdi cdi-uploadproxy-89cf96777-rl2cp 1/1 Terminating 0 3h54m 10.16.0.24 node01
kube-system coredns-66bff467f8-h5796 1/1 Running 0 6h8m 10.16.0.3 controller
kube-system coredns-66bff467f8-wzrvs 1/1 Running 0 6h8m 10.16.0.2 controller
kube-system descheduler-cronjob-1594823880-hf5f2 0/1 Pending 0 3h9m
kube-system descheduler-cronjob-1594823880-prsfv 0/1 Terminating 0 3h14m node01
kube-system etcd-controller 1/1 Running 0 6h8m 134.119.213.95 controller
kube-system kube-apiserver-controller 1/1 Running 0 6h8m 134.119.213.95 controller
kube-system kube-controller-manager-controller 1/1 Running 0 6h8m 134.119.213.95 controller
kube-system kube-ovn-cni-j99d6 1/1 Running 0 4h46m 146.0.237.30 node01
kube-system kube-ovn-cni-tzmpm 1/1 Running 6 6h7m 134.119.213.95 controller
kube-system kube-ovn-controller-96f89c68b-gf876 0/1 Pending 0 3h9m
kube-system kube-ovn-controller-96f89c68b-jtfvg 1/1 Terminating 0 3h55m 146.0.237.30 node01
kube-system kube-ovn-controller-96f89c68b-zsw95 1/1 Running 0 6h7m 134.119.213.95 controller
kube-system kube-proxy-4gh25 1/1 Running 0 4h46m 146.0.237.30 node01
kube-system kube-proxy-hbb7p 1/1 Running 0 6h8m 134.119.213.95 controller
kube-system kube-scheduler-controller 1/1 Running 0 5h4m 134.119.213.95 controller
kube-system ovn-central-74986486f9-dsntb 1/1 Running 0 6h7m 134.119.213.95 controller
kube-system ovs-ovn-tgfq6 1/1 Running 15 4h46m 146.0.237.30 node01
kube-system ovs-ovn-zrrsj 1/1 Running 14 6h7m 134.119.213.95 controller
kubevirt virt-api-f94f8b959-7s8p7 1/1 Terminating 0 3h53m 10.16.0.28 node01
kubevirt virt-api-f94f8b959-jdkhv 0/1 Pending 0 3h9m
kubevirt virt-api-f94f8b959-t2997 0/1 Pending 0 3h9m
kubevirt virt-api-f94f8b959-z4v8v 1/1 Terminating 0 3h53m 10.16.0.29 node01
kubevirt virt-controller-64766f7cbf-6n9s9 1/1 Terminating 0 3h52m 10.16.0.30 node01
kubevirt virt-controller-64766f7cbf-8hlbg 0/1 Pending 0 3h9m
kubevirt virt-controller-64766f7cbf-8vxht 0/1 Pending 0 3h9m
kubevirt virt-controller-64766f7cbf-th5dt 1/1 Terminating 0 3h52m 10.16.0.31 node01
kubevirt virt-handler-dkmbb 1/1 Running 0 3h52m 10.16.0.32 node01
kubevirt virt-operator-79c97797-2xrk5 1/1 Terminating 0 5h5m 10.16.0.7 node01
kubevirt virt-operator-79c97797-f85xd 1/1 Terminating 0 5h5m 10.16.0.6 node01
kubevirt virt-operator-79c97797-qhpsx 0/1 Pending 0 3h9m
kubevirt virt-operator-79c97797-t6hkl 0/1 Pending 0 3h9m
openness docker-registry-deployment-54d5bb5c-vrs6c 1/1 Running 0 5h8m 134.119.213.95 controller
openness eaa-6f8b94c9d7-khb59 0/1 Terminating 0 5h7m 10.16.0.4 node01
openness eaa-6f8b94c9d7-qrldb 0/1 Pending 0 3h9m
openness edgedns-g2cql 0/1 ErrImageNeverPull 0 3h54m 10.16.0.20 node01
openness interfaceservice-vn9bs 0/1 ErrImageNeverPull 0 3h54m 10.16.0.19 node01
openness nfd-release-node-feature-discovery-master-cdbcfd997-rz4jq 1/1 Running 0 5h2m 10.16.0.16 controller
openness nfd-release-node-feature-discovery-worker-ddqhj 1/1 Running 0 3h54m 146.0.237.30 node01
openness syslog-master-c8585 1/1 Running 0 5h7m 10.16.0.5 controller
openness syslog-ng-bpnjn 1/1 Running 0 3h54m 10.16.0.21 node01
telemetry cadvisor-5jwj6 2/2 Running 0 3h54m 10.16.0.23 node01
telemetry collectd-7cd5k 2/2 Running 0 3h54m 146.0.237.30 node01
telemetry custom-metrics-apiserver-54699b845f-74ww7 1/1 Running 0 5h4m 10.16.0.13 controller
telemetry grafana-6b79c984b-7pb5q 2/2 Running 0 5h1m 10.16.0.17 controller
telemetry otel-collector-7d5b75bbdf-8ppm4 0/2 Pending 0 3h9m
telemetry otel-collector-7d5b75bbdf-zwv6q 2/2 Terminating 0 5h4m 10.16.0.11 node01
telemetry prometheus-node-exporter-v95cb 1/1 Running 2 3h54m 10.16.0.18 node01
telemetry prometheus-server-76c96b9497-84rnp 3/3 Running 0 5h4m 10.16.0.10 controller
telemetry telemetry-aware-scheduling-68467c4ccd-6sr58 2/2 Running 0 5h3m 10.16.0.14 controller
telemetry telemetry-node-certs-w6tft 1/1 Running 0 3h54m 10.16.0.22 node01
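
In both runs the underlying symptom is the same: the controller cannot open a TCP connection to the kubelet on node01's port 10250. If a host firewall turns out to be the cause (an assumption; on a CentOS node firewalld is commonly active), opening the kubelet port on node01 and re-running the log retrieval from the controller would confirm it:

# on node01, only if firewalld is running; adapt for whichever firewall is in use
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --reload

# back on the controller, repeat the check from the failing playbook task
kubectl logs -n kube-system $(kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name --field-selector spec.nodeName=node01 | grep ovs-ovn)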