nokia / danm

TelCo grade network management in a Kubernetes cluster
BSD 3-Clause "New" or "Revised" License

Danm installation through installer not going through #238

Closed: sriramec closed this issue 3 years ago

sriramec commented 4 years ago

Is this a BUG REPORT or FEATURE REQUEST?: bug

What happened: DANM installation through the installer does not complete; the installer job keeps crashing.

What you expected to happen: DANM installation should succeed using the installer job.

How to reproduce it: Modify danm-installer-config.yaml for the bootstrap CNI (I specified Calico). Make sure /etc/cni/net.d/calico.conf exists on all nodes of the cluster. Install DANM using the installer. The installer job crashes.
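A quick way to double-check that precondition (illustrative only; the node names come from the outputs below, and passwordless SSH to them is assumed):

# Illustrative pre-check: confirm the bootstrap CNI config exists on each node
# before applying integration/install.
for node in master-node worker01; do
  ssh "$node" 'ls /etc/cni/net.d/'
done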

root@master-node:~/sriram/7-10-2020/danm# kubectl apply -f integration/install
serviceaccount/danm-installer created
clusterrole.rbac.authorization.k8s.io/caas:danm-installer created
clusterrolebinding.rbac.authorization.k8s.io/caas:danm-installer created
configmap/danm-installer-config created
job.batch/danm-installer created

root@master-node:~/sriram/7-10-2020/danm# kubectl get pods -n kube-system
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-675b7c9569-mq98v   1/1     Running            0          5h26m
calico-node-48zx2                          1/1     Running            0          5h25m
calico-node-pb552                          1/1     Running            0          5h26m
coredns-f9fd979d6-8qmb9                    1/1     Running            0          5h28m
coredns-f9fd979d6-lmmvn                    1/1     Running            0          5h28m
danm-installer-bk5wn                       0/1     CrashLoopBackOff   7          13m
etcd-master-node                           1/1     Running            0          5h28m

root@master-node:/etc/cni/net.d# pwd
/etc/cni/net.d
root@master-node:/etc/cni/net.d# ls
calico.conf  calico-kubeconfig

root@worker01:/etc/cni/net.d# pwd
/etc/cni/net.d
root@worker01:/etc/cni/net.d# ls
calico.conf  calico-kubeconfig
root@worker01:/etc/cni/net.d#

These are the logs that I see. Since the pod restarts repeatedly, the logs say the resources already exist, but I had cleaned everything up before doing the installation.

root@master-node:~/sriram/7-10-2020/danm/integration/install# kubectl logs danm-installer-bk5wn -f -n kube-system
Not using any image registry prefix
Not using any image tag
Not using any image pull secret

Reading Kubernetes API server certificate

Applying CRDs to extend Kubernetes API...
customresourcedefinition.apiextensions.k8s.io/clusternetworks.danm.k8s.io unchanged
customresourcedefinition.apiextensions.k8s.io/danmeps.danm.k8s.io unchanged
customresourcedefinition.apiextensions.k8s.io/tenantconfigs.danm.k8s.io unchanged
customresourcedefinition.apiextensions.k8s.io/tenantnetworks.danm.k8s.io unchanged

Creating Service Account
Error from server (AlreadyExists): serviceaccounts "danm" already exists
clusterrole.rbac.authorization.k8s.io/caas:danm unchanged
clusterrolebinding.rbac.authorization.k8s.io/caas:danm unchanged

Creating WebHook certificate...
creating certs in tmpdir /tmp/tmp.ooBNpd
Generating RSA private key, 2048 bit long modulus (2 primes)
..............................................+++++
...+++++
e is 65537 (0x010001)
Error from server (AlreadyExists): error when creating "STDIN": certificatesigningrequests.certificates.k8s.io "danm-webhook-svc.kube-system" already exists

I see that this CSR is present:

root@master-node:~/sriram/7-10-2020/danm# kubectl get csr -n kube-system
NAME                           AGE   SIGNERNAME                     REQUESTOR                                          CONDITION
danm-webhook-svc.kube-system   10m   kubernetes.io/legacy-unknown   system:serviceaccount:kube-system:danm-installer   Pending

It's not getting approved. All the images required for the DANM installation are present in the cluster. Is there anything I'm missing? Please suggest.


eMGabriel commented 4 years ago

Hi @sriramec,

The first issue is that the danm-installer pod is in CrashLoopBackOff. Please also include the output of kubectl describe for the danm-installer-* pod; from it we may find out why the pod cannot run.

sriramec commented 4 years ago

Hi @eMGabriel,

Please find the output of "kubectl describe pod danm-installer-48847 -n kube-system":

root@master-node:/etc/cni/net.d# kubectl describe pod danm-installer-48847 -n kube-system
Name:         danm-installer-48847
Namespace:    kube-system
Priority:     0
Node:         worker01/192.168.56.9
Start Time:   Thu, 08 Oct 2020 00:15:56 +0530
Labels:       controller-uid=635b12fa-9ed0-4f0a-a8c8-02202bbae6a3
              job-name=danm-installer
Annotations:  cni.projectcalico.org/podIP: 172.17.5.19/32
              cni.projectcalico.org/podIPs: 172.17.5.19/32
Status:       Running
IP:           172.17.5.19
IPs:
  IP:           172.17.5.19
Controlled By:  Job/danm-installer
Containers:
  danm-installer:
    Container ID:   docker://188a3615b98c2a6d2aa35de3f3b4779c6041eebe9ba090c3c34523c90ad4b662
    Image:          danm-installer:latest
    Image ID:       docker://sha256:76de65eb0ab0a1f5b54da9c4eb6a71b66c36591f1936e9f7109218ec8ce465de
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 09 Oct 2020 10:27:18 +0530
      Finished:     Fri, 09 Oct 2020 10:27:19 +0530
    Ready:          False
    Restart Count:  404
    Environment:    <none>
    Mounts:
      /config from danm-installer-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from danm-installer-token-krvjv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  danm-installer-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      danm-installer-config
    Optional:  false
  danm-installer-token-krvjv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  danm-installer-token-krvjv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                     From               Message
  ----     ------   ----                    ----               -------
  Normal   Pulled   42m (x397 over 34h)     kubelet, worker01  Container image "danm-installer:latest" already present on machine
  Warning  BackOff  2m26s (x9433 over 34h)  kubelet, worker01  Back-off restarting failed container
root@master-node:/etc/cni/net.d#

sriramec commented 4 years ago

Is there anything that I'm missing here? Let me know if more logs are required.

Regards, Sriram

eMGabriel commented 4 years ago

Please also include a 'previous' log: kubectl logs --previous danm-installer-<genid> -f -n kube-system. It has a weak chance of containing useful information; we shall see.

An idea came to mind: modify or rebuild the danm-installer image with the following modification (choose whichever option is easier for you).

This modification may solve the error: Error from server (AlreadyExists): error when creating "STDIN": certificatesigningrequests.certificates.k8s.io "danm-webhook-svc.kube-system" already exists
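A minimal sketch of one such modification (my assumption, since the concrete change is not shown in this thread): clear the stale CSR left over from a previous run so the installer can recreate it, or approve the pending one.

# Hypothetical cleanup before re-running the installer (assumes RBAC rights
# to manage cluster-scoped CSRs); delete the leftover request:
kubectl delete csr danm-webhook-svc.kube-system
# ...or approve the pending one instead:
kubectl certificate approve danm-webhook-svc.kube-system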

toshiiw commented 4 years ago

I think I hit the same problem with k8s 1.19.2. The installer pod generates this weird error.

error: no kind "CertificateSigningRequest" is registered for version "certificates.k8s.io/v1" in scheme "k8s.io/kubectl/pkg/scheme/scheme.go:28"

The pod uses kubectl 1.17.4, which is more than one minor version behind the 1.19.2 API server and thus violates the Kubernetes version skew policy. Changing KUBECTL_VERSION in scm/build/Dockerfile.install to 1.18.9 did the trick.
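A one-line way to make that change (a sketch; it assumes the Dockerfile pins the version in a KUBECTL_VERSION=1.17.4 line, as this comment suggests):

# Bump the pinned kubectl to stay within one minor version of the cluster.
sed -i 's/KUBECTL_VERSION=1.17.4/KUBECTL_VERSION=1.18.9/' scm/build/Dockerfile.install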

Levovar commented 4 years ago

If that solves the problem, we might need to bump our dependencies.

sriramec commented 4 years ago

Thanks everyone for the suggestions. In scm/build/Dockerfile.install I set the kubectl version to 1.19.1, since the Kubernetes version in my setup was 1.19.2. It is working fine now.
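For completeness, the rebuild steps that would follow such an edit (a sketch; the image name danm-installer:latest is taken from the pod description above, and the build context is assumed to be the repository root):

# Rebuild the installer image after editing the Dockerfile, then re-run the job.
docker build -f scm/build/Dockerfile.install -t danm-installer:latest .
kubectl delete job danm-installer -n kube-system
kubectl apply -f integration/install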