smart-edge-open / converged-edge-experience-kits

Source code for experience kits with Ansible-based deployment.
Apache License 2.0

Error in deploying SampleApp #37

Closed pavanats closed 4 years ago

pavanats commented 4 years ago

Hi, I was trying to deploy a SampleApp and came across the following error in the producer pod:

Reason: UnexpectedAdmissionError

Message: Pod Allocate failed due to failed to write checkpoint file "kubelet_internal_checkpoint": mkdir /var: file exists, which is unexpected

tomaszwesolowski commented 4 years ago

Hi, can you please describe the steps you took when this error occurred? Was this while building the sample app? Thanks

pavanats commented 4 years ago


Yes, this was while deploying the sample app. I followed the instructions provided on the OpenNESS site.

tomaszwesolowski commented 4 years ago

I will try to reproduce this bug and I will get back to you.

tomaszwesolowski commented 4 years ago

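Hi, I wasn't able to reproduce the error you describe. Are you following this guide? https://github.com/open-ness/specs/blob/master/doc/applications-onboard/network-edge-applications-onboarding.md#deploying-consumer-and-producer-sample-application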

pavanats commented 4 years ago

Hi, yes, I have been following the link you shared. I was able to move forward and got to the point of deploying the SampleApp. My issue now is that the producer and consumer pods are stuck in the Pending state. On further investigation, I see that the edge node is not ready. I am still debugging this issue. Let me know if there are any steps I can try to make the edge node ready.

I have already tried restarting the controller and edge nodes, but to no avail. Pavan



tomaszwesolowski commented 4 years ago

Can you provide the output of the command kubectl describe node node_name? You can also delete the node from the cluster and redeploy it with ./deploy_ne.sh node.

pavanats commented 4 years ago

Hi, Here's the output of 'kubectl describe node node01':

[root@controller ~]# kubectl describe node node01
Name:               node01
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.IBPB=true
                    feature.node.kubernetes.io/cpu-cpuid.STIBP=true
                    feature.node.kubernetes.io/cpu-cpuid.VMX=true
                    feature.node.kubernetes.io/cpu-hardware_multithreading=true
                    feature.node.kubernetes.io/iommu-enabled=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ=true
                    feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true
                    feature.node.kubernetes.io/kernel-config.PREEMPT=true
                    feature.node.kubernetes.io/kernel-version.full=3.10.0-1062.12.1.rt56.1042.el7.x86_64
                    feature.node.kubernetes.io/kernel-version.major=3
                    feature.node.kubernetes.io/kernel-version.minor=10
                    feature.node.kubernetes.io/kernel-version.revision=0
                    feature.node.kubernetes.io/memory-numa=true
                    feature.node.kubernetes.io/network-sriov.capable=true
                    feature.node.kubernetes.io/pci-0300_102b.present=true
                    feature.node.kubernetes.io/system-os_release.ID=centos
                    feature.node.kubernetes.io/system-os_release.VERSION_ID=7
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.major=7
                    feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node01
                    kubernetes.io/os=linux
                    kubevirt.io/schedulable=true
                    node-role.kubernetes.io/worker=worker
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    ovn.kubernetes.io/cidr: 100.64.0.0/16
                    ovn.kubernetes.io/gateway: 100.64.0.1
                    ovn.kubernetes.io/ip_address: 100.64.0.3
                    ovn.kubernetes.io/logical_switch: join
                    ovn.kubernetes.io/mac_address: de:fd:f8:40:00:04
                    ovn.kubernetes.io/port_name: node-node01
CreationTimestamp:  Thu, 16 Jul 2020 13:27:22 +0200
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node01
  AcquireTime:
  RenewTime:       Fri, 17 Jul 2020 17:48:06 +0200
Conditions:
  Type            Status   LastHeartbeatTime                LastTransitionTime               Reason             Message
  MemoryPressure  Unknown  Fri, 17 Jul 2020 17:48:06 +0200  Fri, 17 Jul 2020 17:48:46 +0200  NodeStatusUnknown  Kubelet stopped posting node status.
  DiskPressure    Unknown  Fri, 17 Jul 2020 17:48:06 +0200  Fri, 17 Jul 2020 17:48:46 +0200  NodeStatusUnknown  Kubelet stopped posting node status.
  PIDPressure     Unknown  Fri, 17 Jul 2020 17:48:06 +0200  Fri, 17 Jul 2020 17:48:46 +0200  NodeStatusUnknown  Kubelet stopped posting node status.
  Ready           Unknown  Fri, 17 Jul 2020 17:48:06 +0200  Fri, 17 Jul 2020 17:48:46 +0200  NodeStatusUnknown  Kubelet stopped posting node status.
Addresses:
  InternalIP:  134.119.205.185
  Hostname:    node01
Capacity:
  cpu:                            48
  devices.kubevirt.io/kvm:        110
  devices.kubevirt.io/tun:        110
  devices.kubevirt.io/vhost-net:  110
  ephemeral-storage:              51175Mi
  hugepages-1Gi:                  0
  hugepages-2Mi:                  4Gi
  memory:                         131886780Ki
  pods:                           110
Allocatable:
  cpu:                            47
  devices.kubevirt.io/kvm:        110
  devices.kubevirt.io/tun:        110
  devices.kubevirt.io/vhost-net:  110
  ephemeral-storage:              48294789041
  hugepages-1Gi:                  0
  hugepages-2Mi:                  4Gi
  memory:                         127590076Ki
  pods:                           110
System Info:
  Machine ID:                 238da3fd0f5b4a968758a13684c78869
  System UUID:                00000000-0000-0000-0000-002590F53B8A
  Boot ID:                    7090118d-2c09-4262-9674-48cb8fe941b9
  Kernel Version:             3.10.0-1062.12.1.rt56.1042.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://Unknown
  Kubelet Version:            v1.18.4
  Kube-Proxy Version:         v1.18.4
Non-terminated Pods:          (26 in total)
  Namespace    Name                                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  cdi          cdi-apiserver-885758cc4-cr4ss                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  cdi          cdi-deployment-5bdcc85d54-4ns74                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  cdi          cdi-operator-76b6694845-zl925                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d6h
  cdi          cdi-uploadproxy-89cf96777-qbfps                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  kube-system  descheduler-cronjob-1594999440-s7q6h             0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d
  kube-system  kube-ovn-cni-7sms5                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d4h
  kube-system  kube-ovn-controller-96f89c68b-jbnxj              0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  kube-system  kube-proxy-m9d8g                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d4h
  kube-system  ovs-ovn-vbj99                                    200m (0%)     1 (2%)      1Gi (0%)         1Gi (0%)       4d4h
  kubevirt     virt-api-f94f8b959-2cjrk                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  kubevirt     virt-api-f94f8b959-g42xv                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  kubevirt     virt-controller-64766f7cbf-2xztk                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  kubevirt     virt-controller-64766f7cbf-k8t4c                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  kubevirt     virt-handler-6wcvn                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  kubevirt     virt-operator-79c97797-jwdtf                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d6h
  kubevirt     virt-operator-79c97797-wxsxj                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d6h
  openness     eaa-6f8b94c9d7-nvbbs                             100m (0%)     1 (2%)      128Mi (0%)       128Mi (0%)     4d6h
  openness     edgedns-kzslw                                    100m (0%)     1 (2%)      128Mi (0%)       128Mi (0%)     4d3h
  openness     interfaceservice-r42pj                           100m (0%)     1 (2%)      128Mi (0%)       128Mi (0%)     4d3h
  openness     nfd-release-node-feature-discovery-worker-bnx9g  0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  openness     syslog-ng-l289n                                  100m (0%)     500m (1%)   128Mi (0%)       128Mi (0%)     4d3h
  telemetry    cadvisor-bp5ns                                   100m (0%)     1 (2%)      2Gi (1%)         2Gi (1%)       4d3h
  telemetry    collectd-dwttf                                   100m (0%)     1 (2%)      2Gi (1%)         2Gi (1%)       4d3h
  telemetry    otel-collector-7d5b75bbdf-8btkm                  200m (0%)     1 (2%)      400Mi (0%)       2Gi (1%)       4d6h
  telemetry    prometheus-node-exporter-fcgnh                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d3h
  telemetry    telemetry-node-certs-qqdkh                       100m (0%)     100m (0%)   128Mi (0%)       128Mi (0%)     4d3h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests     Limits
  cpu                            1100m (2%)   7600m (16%)
  memory                         6160Mi (4%)  7808Mi (6%)
  ephemeral-storage              0 (0%)       0 (0%)
  hugepages-1Gi                  0 (0%)       0 (0%)
  hugepages-2Mi                  1Gi (25%)    1Gi (25%)
  devices.kubevirt.io/kvm        0            0
  devices.kubevirt.io/tun        0            0
  devices.kubevirt.io/vhost-net  0            0
Events:
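The Conditions block in the output above reports every condition as Unknown with the reason NodeStatusUnknown. As a minimal sketch, the same check could be done programmatically, assuming the node is exported with `kubectl get node node01 -o json`. The helper name `unhealthy_conditions` is hypothetical, and the sample document is hand-written from two of the rows in the output rather than captured from a cluster.

```python
import json
from typing import List, Tuple


def unhealthy_conditions(node_json: str) -> List[Tuple[str, str, str]]:
    """Return (type, status, reason) for every node condition that is not healthy.

    Assumption: node_json has the shape produced by `kubectl get node NAME -o json`,
    where conditions live under status.conditions.
    """
    node = json.loads(node_json)
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        # Ready is healthy when "True"; the pressure conditions are healthy when "False".
        healthy = "True" if cond["type"] == "Ready" else "False"
        if cond["status"] != healthy:
            problems.append((cond["type"], cond["status"], cond.get("reason", "")))
    return problems


# Hand-written sample mirroring two rows of the describe output (not real cluster data).
SAMPLE_NODE = json.dumps({
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "Unknown", "reason": "NodeStatusUnknown"},
            {"type": "Ready", "status": "Unknown", "reason": "NodeStatusUnknown"},
        ]
    }
})

for cond_type, status, reason in unhealthy_conditions(SAMPLE_NODE):
    print(cond_type, status, reason)
```

A reason of NodeStatusUnknown on the Ready condition means the control plane has stopped hearing from the kubelet, so the kubelet service on the edge node is a reasonable first place to look.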



tomaszwesolowski commented 4 years ago

Everything looks fine here. Did you try deleting the node and deploying it again?

pavanats commented 4 years ago

Hi, I am in the process of doing that. Will update you once done. Pavan



pavanats commented 4 years ago

Hi, I finally managed to deploy both the controller and the network edge node on physical machines. For the network edge, I had to retry a few times and came across the following issues. In general, I think it would help to document the steps users can take when they hit common failures.

failed: [controller] (item=ovn-central) => {

"ansible_loop_var": "item",
"changed": false,
"cmd": "set -o pipefail && kubectl logs -n kube-system $(kubectl get pods -n kube-system -o custom-columns=NAME:.metadata.name | grep ovn-central)\n",
"delta": "0:00:00.225693",
"end": "2020-07-21 10:46:18.626494",
"item": "ovn-central",
"rc": 1,
"start": "2020-07-21 10:46:18.400801"

}

STDERR:

Error from server (BadRequest): container "ovn-central" in pod "ovn-central-74986486f9-fvq5z" is waiting to start: ContainerCreating
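The failure above comes from running `kubectl logs` while the ovn-central container is still in ContainerCreating. A minimal sketch of a pre-check follows, assuming the pod is exported with `kubectl get pod <pod> -n kube-system -o json`. The helper name `waiting_containers` is hypothetical, and the sample document is hand-written to match the error message rather than captured from the cluster.

```python
import json
from typing import Dict


def waiting_containers(pod_json: str) -> Dict[str, str]:
    """Map container name to waiting reason for containers that have not started.

    Assumption: pod_json has the shape produced by `kubectl get pod NAME -o json`,
    where per-container state lives under status.containerStatuses[].state.
    """
    pod = json.loads(pod_json)
    waiting = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        state = status.get("state", {})
        if "waiting" in state:
            waiting[status["name"]] = state["waiting"].get("reason", "")
    return waiting


# Hand-written sample matching the error above (not real cluster data).
SAMPLE_POD = json.dumps({
    "metadata": {"name": "ovn-central-74986486f9-fvq5z"},
    "status": {
        "containerStatuses": [
            {"name": "ovn-central", "state": {"waiting": {"reason": "ContainerCreating"}}}
        ]
    },
})

blocked = waiting_containers(SAMPLE_POD)
if blocked:
    print("logs not available yet:", blocked)
```

When the mapping is non-empty, `kubectl describe pod` on the same pod usually shows in its events why the container has not started.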

================================================================================

failed: [controller] (item=node01) => {

"ansible_loop_var": "item",
"attempts": 20,
"changed": true,
"cmd": [
    "ovn-nbctl",
    "--may-exist",
    "lsp-add",
    "node01-local",
    "node01-ovs-phy"
],
"delta": "0:00:00.010396",
"end": "2020-07-21 11:07:41.912971",
"item": "node01",
"rc": 1,
"start": "2020-07-21 11:07:41.902575"

}

STDERR:

ovn-nbctl: node01-local: switch name not found
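The second failure says that the logical switch node01-local did not exist when `lsp-add` ran. As a sketch, the switch could be looked up first, assuming `ovn-nbctl ls-list` prints one `<uuid> (<name>)` line per logical switch. The helper name `switch_names` is hypothetical, and the UUIDs and switch names in the sample are made up for illustration.

```python
import re
from typing import List


def switch_names(ls_list_output: str) -> List[str]:
    """Extract logical switch names from `ovn-nbctl ls-list` output.

    Assumption: each line looks like `<uuid> (<name>)`; other lines are skipped.
    """
    names = []
    for line in ls_list_output.splitlines():
        match = re.match(r"^\s*\S+\s+\((.+)\)\s*$", line)
        if match:
            names.append(match.group(1))
    return names


# Made-up sample output; the UUIDs are placeholders, not values from this cluster.
SAMPLE_LS_LIST = """\
11111111-1111-1111-1111-111111111111 (join)
22222222-2222-2222-2222-222222222222 (ovn-default)
"""

if "node01-local" not in switch_names(SAMPLE_LS_LIST):
    print("node01-local is missing, so lsp-add would fail with 'switch name not found'")
```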