sriramec closed this issue 3 years ago
Hi @sriramec,
The first issue is that the danm-installer pod is in CrashLoopBackOff.
Please also include the output of kubectl describe for the danm-installer-* pod; it may show why the pod cannot run.
Hi emGabriel,
Please find the output of "kubectl describe pod danm-installer-48847 -n kube-system" below.
root@master-node:/etc/cni/net.d# kubectl describe pod danm-installer-48847 -n kube-system
Name: danm-installer-48847
Namespace: kube-system
Priority: 0
Node: worker01/192.168.56.9
Start Time: Thu, 08 Oct 2020 00:15:56 +0530
Labels: controller-uid=635b12fa-9ed0-4f0a-a8c8-02202bbae6a3
job-name=danm-installer
Annotations: cni.projectcalico.org/podIP: 172.17.5.19/32
cni.projectcalico.org/podIPs: 172.17.5.19/32
Status: Running
IP: 172.17.5.19
IPs:
IP: 172.17.5.19
Controlled By: Job/danm-installer
Containers:
danm-installer:
Container ID: docker://188a3615b98c2a6d2aa35de3f3b4779c6041eebe9ba090c3c34523c90ad4b662
Image: danm-installer:latest
Image ID: docker://sha256:76de65eb0ab0a1f5b54da9c4eb6a71b66c36591f1936e9f7109218ec8ce465de
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 09 Oct 2020 10:27:18 +0530
Finished: Fri, 09 Oct 2020 10:27:19 +0530
Ready: False
Restart Count: 404
Environment: <none>
Mounts:
/config from danm-installer-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from danm-installer-token-krvjv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
danm-installer-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: danm-installer-config
Optional: false
danm-installer-token-krvjv:
Type: Secret (a volume populated by a Secret)
SecretName: danm-installer-token-krvjv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 42m (x397 over 34h) kubelet, worker01 Container image "danm-installer:latest" already present on machine
Warning BackOff 2m26s (x9433 over 34h) kubelet, worker01 Back-off restarting failed container
root@master-node:/etc/cni/net.d#
Is there anything that I'm missing here? Let me know if logs are required.
Regards, Sriram
Please also include a 'previous' log:
kubectl logs --previous danm-installer-<genid> -f -n kube-system
There is only a slim chance that it contains useful information, but we shall see.
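An idea came to mind. Modify or rebuild the danm-installer image with the following change (choose whichever option is easier for you).
Option 1: in webhook-create-signed-cert.sh, change the kubectl create -f line to
cat <<EOF | kubectl apply -f -
and rebuild the danm-builder image.
Option 2: patch the existing image in place: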
# Start a container with sed command applying the fix
docker run --name installer-fix --entrypoint sed danmcni/danm-installer -i 's/kubectl create -f/kubectl apply -f/' /integration/manifests/webhook/webhook-create-signed-cert.sh
# Commit the modified container into an image named: 'danm-installer:fix'
docker commit installer-fix danm-installer:fix
# Validate that changed line exists in new image
docker run --rm -t --entrypoint grep danm-installer:fix 'kubectl apply -f' /integration/manifests/webhook/webhook-create-signed-cert.sh
# it should output:
# cat <<EOF | kubectl apply -f -
Reset your setup and remove any leftover DANM Kubernetes objects.
Modify the danm-installer Kubernetes manifest:
https://github.com/nokia/danm/blob/abd3c48d39f5441ce0d66daac63e8f8772c1a348/integration/install/danm-installer.yaml#L18
Change the image from danm-installer:latest to danm-installer:fix.
This modification may resolve the following error:
Error from server (AlreadyExists): error when creating "STDIN": certificatesigningrequests.certificates.k8s.io "danm-webhook-svc.kube-system" already exists
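For the "reset your setup" step above, a minimal cleanup sketch might look like the following. It only covers the two objects that are named in this thread (the CSR from the AlreadyExists error and the danm-installer Job in kube-system); other leftover DANM objects are not listed here and would need to be removed separately. The function name remove_installer_leftovers is hypothetical.

```shell
#!/bin/sh
# CSR name taken from the AlreadyExists error quoted in this thread.
CSR_NAME="danm-webhook-svc.kube-system"
# Job name and namespace taken from the kubectl describe output in this thread.
JOB_NAME="danm-installer"
JOB_NAMESPACE="kube-system"

# Delete the stale CSR and the installer Job if they exist.
# --ignore-not-found makes the function safe to run repeatedly.
remove_installer_leftovers() {
    # CSRs are cluster-scoped, so no namespace flag is needed.
    kubectl delete csr "$CSR_NAME" --ignore-not-found
    kubectl delete job "$JOB_NAME" -n "$JOB_NAMESPACE" --ignore-not-found
}
```

Calling remove_installer_leftovers before re-applying the installer manifest should avoid the AlreadyExists error on the next run, assuming nothing else recreates the CSR in the meantime.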
I think I hit the same problem with k8s 1.19.2. The installer pod generates this weird error.
error: no kind "CertificateSigningRequest" is registered for version "certificates.k8s.io/v1" in scheme "k8s.io/kubectl/pkg/scheme/scheme.go:28"
The pod uses kubectl 1.17.4, which violates the k8s version skew policy. Changing KUBECTL_VERSION in scm/build/Dockerfile.install to 1.18.9 did the trick.
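The skew check described above can be done mechanically. This is a minimal sketch, assuming versions are passed as plain strings such as 1.17.4 or v1.19.2; the helper names minor_of, minor_skew and skew_ok are hypothetical and not part of DANM or kubectl. It encodes the rule that kubectl is supported within one minor version (older or newer) of the API server.

```shell
#!/bin/sh
# Extract the minor component from a version string such as "v1.19.2" or "1.17.4".
minor_of() {
    echo "$1" | sed 's/^v//' | cut -d. -f2
}

# Print the absolute difference between the client and server minor versions.
minor_skew() {
    client_minor=$(minor_of "$1")
    server_minor=$(minor_of "$2")
    diff=$((client_minor - server_minor))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    echo "$diff"
}

# Succeed only if the client is within one minor version of the server.
skew_ok() {
    [ "$(minor_skew "$1" "$2")" -le 1 ]
}
```

With the versions from this thread, skew_ok 1.17.4 1.19.2 fails (a skew of 2), while 1.18.9 and 1.19.1 both pass against 1.19.2.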
If that solves the problem, we might need to bump our dependencies.
Thanks, everyone, for the suggestions. In scm/build/Dockerfile.install I set the kubectl version to 1.19.1, since the k8s version in my setup was 1.19.2. It is working fine now.
Is this a BUG REPORT or FEATURE REQUEST?:
What happened: DANM installation through the installer job does not complete.
What you expected to happen: DANM installation should succeed using the installer job.
How to reproduce it: Modify danm-installer-config.yaml to match the bootstrap CNI (I specified calico). Make sure /etc/cni/net.d/calico.conf is present on all nodes of the cluster. Install DANM using the installer. The installer job crashes.
These are the logs that I see. Since the pod restarts repeatedly, the logs say the resource already exists, but I had cleaned up everything before installing.
I see this CSR present:
It is not getting approved. All the images required for DANM installation are present in the cluster. Is there anything I'm missing? Please suggest.
Anything else we need to know?:
Environment:
DANM version (use danm -version):
Kubernetes version (use kubectl version):
DANM configuration (K8s manifests, kubeconfig files, CNI config file): root@master-node:~/sriram/7-10-2020/danm# cat integration/install/danm-installer-config.yaml
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Others: