Closed pavanats closed 4 years ago
Hi @pavanats, Can you clarify what steps you used to bring up the cluster? Also, are you deploying the nodes on bare metal or inside VMs? Did you attempt the installation on a freshly installed OS?
Following are the steps:
I haven't really been able to bring up a stable working setup so far, though I did once get the K8s dashboard up. Generally, some pod or other is in a CrashLoopBackOff state.
Hi Amr, I have shared the information on the community forum. I am presently stuck with the errors and would appreciate it if you could help resolve them. Regards, Pavan
Sorry, I am a little confused. Are you deploying OpenNESS worker nodes as VMs and then deploying VM-based apps within them? Can you try to install without KubeVirt enabled? For this you can try the minimal flavor deployment.
Hi Amr, VM1 is the controller VM. Here I also download the OpenNESS Experience Kit and run ./deploy_ne.sh controller. VM2 is the edge node VM. From VM1, I run ./deploy_ne.sh node to run the Ansible script for the edge node.
My eventual goal is to deploy VM-based VNFs using KubeVirt. I am trying again on a fresh setup and will let you know if I still see the failures. Pavan
I am not sure that running a VM inside a VM is a supported case. You may want to consider changing the VNF into a CNF (Container Network Function); that should work.
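Whether a VM-in-VM setup can work at all depends on nested virtualization being enabled on the physical host. A minimal sketch for checking this, assuming an Intel host (on AMD the module is kvm_amd instead of kvm_intel):

```shell
# Check whether nested virtualization is enabled on the hypervisor host.
# KubeVirt needs this when the worker node is itself a VM.
nested_file=/sys/module/kvm_intel/parameters/nested
if [ -r "$nested_file" ]; then
  nested_state=$(cat "$nested_file")   # "Y" or "1" means enabled
else
  nested_state="unknown (kvm_intel not loaded; on AMD check /sys/module/kvm_amd/parameters/nested)"
fi
echo "nested virtualization: $nested_state"
```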
Hi Amr,
We will try a deployment without KubeVirt to see if that is causing the issue. Eventually, though, we do need KubeVirt support, as we expect some workloads to run as VNFs. For that we can move to bare metal.
Adding a few data points on what the event logs suggest.
The only pods stuck in CrashLoopBackOff are: cdi-operator, virt-operator, kubernetes-dashboard.
It is consistently these three pod types that get into a crash loop. What the events suggest: they all seem to be caused by the same issue. All of these pods need to volume-mount tokens/secrets that are stored as native K8s Secrets, and this is what they are failing to do.
The event logs look like this:
Warning FailedMount 5m17s (x2 over 5m19s) kubelet, node01 MountVolume.SetUp failed for volume "kubevirt-operator-token-9f87j" : failed to sync secret cache: timed out waiting for the condition
Needed by:
Volumes:
kubevirt-operator-token-9f87j:
Type: Secret (a volume populated by a Secret)
SecretName: kubevirt-operator-token-9f87j
Optional: false
So I checked whether these secrets were missing, but found they exist:
[root@controller01 ~]# kubectl get secrets -A | grep -i virt
kubevirt default-token-mck9p kubernetes.io/service-account-token 3 31h
kubevirt kubevirt-operator-token-9f87j kubernetes.io/service-account-token 3 31h
We are willing to move to bare metal, but before that we would prefer to verify whether the cause is really the inability to run VMs within VMs, because right now the event logs point at issues with mounting secrets into the pods as volumes.
Thanks, Surajit
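The "failed to sync secret cache: timed out waiting for the condition" events above typically mean the kubelet could not refresh the Secret from the API server in time, which points at node-to-controller connectivity. A hedged sketch to run from the worker node, where APISERVER is a placeholder to be replaced with the controller's actual address and secure port:

```shell
# Probe API server reachability from the worker node.
# APISERVER is an assumed placeholder; substitute the controller VM's address.
APISERVER=${APISERVER:-https://192.168.122.1:6443}
if command -v curl >/dev/null 2>&1; then
  # An HTTP code of 000 means no TCP connectivity at all.
  status_line=$(curl -sk -o /dev/null -w "%{http_code}" --connect-timeout 5 "$APISERVER/healthz" || true)
  [ -n "$status_line" ] || status_line="no response"
else
  status_line="curl not installed"
fi
echo "apiserver /healthz from this node: $status_line"
```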
On first look this looks like a connectivity issue; for some reason the kubelet seems unable to fetch the secret. Is this setup running behind any proxy? Which CNI is being used?
Hi, We are running the setup on 2 VMs - one for the controller and another for the edge node. There is no proxy involved; both VMs run on the same physical host. Thank you. Pavan
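To answer the CNI question, the deployed CNI can be read off a cluster node's CNI configuration directory; the ovn.kubernetes.io annotations in the pod spec later in this thread suggest kube-ovn is in use. A small sketch:

```shell
# List the CNI configuration files present on this node; the file names
# normally reveal the CNI in use (e.g. kube-ovn, calico, flannel).
cni_confs=$(ls /etc/cni/net.d 2>/dev/null || true)
cni_msg="CNI configs on this node: ${cni_confs:-none found (run this on a cluster node)}"
echo "$cni_msg"
```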
Can you provide the full log for the operator: "kubectl describe pod virt-operator-xxxxxxxxxx -n kubevirt"?
Hi, Please see the attached file for the requested logs. Pavan
Hi @pavanats I cannot find the attachment.
Hi Damian, I have pasted the CLI output below for the different commands. Pavan
[root@controller ~]# kubectl get pods -o wide -A
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cdi cdi-operator-76b6694845-vslzm 0/1 CrashLoopBackOff 222 46h 10.16.0.9 node01
Warning Unhealthy 43m (x593 over 20h) kubelet, node01 Readiness probe failed: Get https://10.16.0.6:8443/metrics: dial tcp 10.16.0.6:8443: connect: connection refused
Warning BackOff 3m5s (x5113 over 20h) kubelet, node01 Back-off restarting failed container
kubectl describe pod virt-operator-79c97797-9rfsh -n kubevirt
Name: virt-operator-79c97797-9rfsh
Namespace: kubevirt
Priority: 0
Node: node01/192.168.122.94
Start Time: Wed, 08 Jul 2020 14:25:59 -0400
Labels: kubevirt.io=virt-operator
pod-template-hash=79c97797
prometheus.kubevirt.io=
Annotations: ovn.kubernetes.io/allocated: true
ovn.kubernetes.io/cidr: 10.16.0.0/16
ovn.kubernetes.io/gateway: 10.16.0.1
ovn.kubernetes.io/ip_address: 10.16.0.7
ovn.kubernetes.io/logical_switch: ovn-default
ovn.kubernetes.io/mac_address: 32:68:5e:10:00:08
scheduler.alpha.kubernetes.io/critical-pod:
scheduler.alpha.kubernetes.io/tolerations: [{"key":"CriticalAddonsOnly","operator":"Exists"}]
Status: Running
IP: 10.16.0.7
IPs:
IP: 10.16.0.7
Controlled By: ReplicaSet/virt-operator-79c97797
Containers:
virt-operator:
Container ID: docker://0fdcb806515c8dd92bbaad4ab51b6d7a8580529815956591e6d0d1c7ab130182
Image: index.docker.io/kubevirt/virt-operator@sha256:4537e45d8f09d52ce202d53b368f34ab6744c06c11519f5219457a339355259e
Image ID: docker-pullable://kubevirt/virt-operator@sha256:4537e45d8f09d52ce202d53b368f34ab6744c06c11519f5219457a339355259e
Ports: 8443/TCP, 8444/TCP
Host Ports: 0/TCP, 0/TCP
Command:
virt-operator
--port
8443
-v
2
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 09 Jul 2020 11:01:07 -0400
Finished: Thu, 09 Jul 2020 11:01:38 -0400
Ready: False
Restart Count: 222
Readiness: http-get https://:8443/metrics delay=5s timeout=10s period=10s #success=1 #failure=3
Environment:
OPERATOR_IMAGE: index.docker.io/kubevirt/virt-operator@sha256:4537e45d8f09d52ce202d53b368f34ab6744c06c11519f5219457a339355259e
WATCH_NAMESPACE: (v1:metadata.annotations['olm.targetNamespaces'])
KUBEVIRT_VERSION: v0.26.0
VIRT_API_SHASUM: sha256:26f1d7c255eefa7fa56dec2923efcdafd522d15a8fee7dff956c9f96f2752f47
VIRT_CONTROLLER_SHASUM: sha256:1ab2afac91c890be4518bbc5cfa3d66526e2f08032648b4557b2abb86eb369a3
VIRT_HANDLER_SHASUM: sha256:0609eb3ea5711ae6290c178275c7d09116685851caa58a8f231277d11224e3d8
VIRT_LAUNCHER_SHASUM: sha256:66d6a5ce83d4340bb1c662198668081b3a1a37f39adc8ae4eb8f6c744fcae0fd
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kubevirt-operator-token-5dr4h (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubevirt-operator-token-5dr4h:
Type: Secret (a volume populated by a Secret)
SecretName: kubevirt-operator-token-5dr4h
Optional: false
QoS Class: BestEffort
Node-Selectors:
Warning BackOff 6m22s (x5099 over 20h) kubelet, node01 Back-off restarting failed container
Warning Unhealthy 88s (x625 over 20h) kubelet, node01 Readiness probe failed: Get https://10.16.0.7:8443/metrics: dial tcp 10.16.0.7:8443: connect: connection refused
Closing this issue as stale. If the problem still occurs, please open a new issue.
Hi, I have created a controller and edge node setup using 2 VMs. I am unable to get a properly running setup due to the following issues:
1. CrashLoopBackOff errors: the following pods are always failing.
[root@controller01 ~]# kubectl get pod -A -o wide| grep Crash
cdi cdi-operator-76b6694845-hvcvw 0/1 CrashLoopBackOff 23 9h 10.16.0.4 node01
kubernetes-dashboard kubernetes-dashboard-7bfbb48676-6g7l4 0/1 CrashLoopBackOff 15 57m 10.16.0.8 node01
kubevirt virt-operator-79c97797-qzctm 0/1 CrashLoopBackOff 11 23m 10.16.0.17 node01
2. Some warning/error messages I see are:
Warning BackOff 14m (x31 over 23m) kubelet, node01 Back-off restarting failed container
Warning Unhealthy 9m22s (x20 over 24m) kubelet, node01 Readiness probe failed: Get https://10.16.0.17:8443/metrics: dial tcp 10.16.0.17:8443: connect: connection refused
Warning FailedMount 5m17s (x2 over 5m19s) kubelet, node01 MountVolume.SetUp failed for volume "kubevirt-operator-token-9f87j" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 12m (x2 over 12m) kubelet, node01 MountVolume.SetUp failed for volume "kubernetes-dashboard-certs" : failed to sync secret cache: timed out waiting for the condition
Warning FailedMount 12m (x2 over 12m) kubelet, node01 MountVolume.SetUp failed for volume "kubernetes-dashboard-token-2jj6c" : failed to sync secret cache: timed out waiting for the condition
I am interested in eventually deploying VMs with OpenNESS, but presently I can't even deploy a SampleApp. I would certainly appreciate any pointers.
Thank you. Pavan