This issue could be solved by setting the Pod Priority to system-cluster-critical, since kubenab is essential for the cluster to work correctly.
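A minimal sketch of what that setting looks like in the Pod template of the kubenab Deployment (only the relevant fields are shown; the surrounding spec is assumed):

...
spec:
  template:
    spec:
      # system-cluster-critical is a built-in PriorityClass, so the scheduler
      # treats the kubenab Pod as critical to cluster operation
      priorityClassName: system-cluster-critical
...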
I created a Pull-Request (#18) which targets your problem and solves it.
I wasn't able to reproduce your problem with that patch applied.
Please note that this patch (#18) does not implement fail-over/high-availability – it only sets the scheduling priority to the highest possible value so that the kubenab Pod is scheduled before all other Pods! High availability is targeted by PR #12 (which is currently work in progress).
But you could also simply increase the replica count of the kubenab deployment and then add a nodeSelector to prevent all the pods from being scheduled on the same node.
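A rough sketch of that approach, assuming the master nodes carry a role: master label (the label key/value is taken from a later comment in this thread, not from the chart):

...
spec:
  # run several replicas so at least one webhook Pod survives a node failure
  replicas: 3
  template:
    spec:
      # only schedule onto nodes labelled role=master
      nodeSelector:
        role: master
...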
@l0nax, Thank you for your prompt reply.
I tried the method you provided and found that the problem still exists when a node is abnormally powered off.
I think the root cause is that only the image name in the Pod object (Kind: Pod) is modified and persisted (not in the Deployment or StatefulSet).
K8s version:
[root@node2 01-volume-emptyDir]# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:23:26Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
You are right.
The MutatingAdmissionController only gets called on CREATE operations (see https://github.com/jfrog/kubenab/blob/master/chart/kubenab/templates/mutating-webhook.yaml#L16).
I will push a change so that UPDATE operations are also covered by the Admission Controller.
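For reference, a sketch of what the changed rule in the MutatingWebhookConfiguration could look like (based on the linked mutating-webhook.yaml; the surrounding chart template fields are omitted here):

rules:
- apiGroups: [""]
  apiVersions: ["v1"]
  resources: ["pods"]
  # previously only CREATE was listed here
  operations: ["CREATE", "UPDATE"]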
Thanks for your future tests @lyyao09
EDIT:
Adding the UPDATE operation to the Admission Controller could also prevent issues if a user (e.g.) changes only the image version of a Pod.
@lyyao09 can you re-open the Issue please?
And your issue would be fixed via fail-over (which is targeted by PR #12); this would prevent problems when a node gets abnormally powered off.
Editing Deployments/StatefulSets could be done, but they will create Pods, and you can also create Pods without a Deployment or StatefulSet.
So it would be sub-optimal to change them. A Pod is the "lowest" entity which will be created.
Yes, we should add the UPDATE operation as you say. However, this can't solve my problem.
As said in https://github.com/jfrog/kubenab/issues/17#issuecomment-539941252, fail-over/replication is the only way to fix this problem directly.
And priority classes only set the scheduling priority above all the other Pods. But Kubernetes has a "delay" after which it marks a Node as Unschedulable and NotReady.
@l0nax, I simply increased the replica count of the kubenab deployment and then added a podAntiAffinity to prevent all the pods from being scheduled on the same node.
I repeated the abnormal power-off node test, and it works well.
What does your podAntiAffinity look like?
If you hard-code the node names, you prevent Kubernetes from scheduling the Pods dynamically.
As far as I know, Kubernetes doesn't have a feature where you can say "deploy max 2 pods per node".
You could implement this feature by building an Operator which schedules the Pods dynamically.
I scheduled kubenab to the 3 master nodes, and each node only has one kubenab pod.
...
nodeSelector:
  role: master
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - kubenab
      topologyKey: kubernetes.io/hostname
...
@lyyao09 Sorry for the late answer. I didn't know how to use podAntiAffinity to schedule only one pod per node.
I will make a change to implement that feature - after understanding how it really works.
Thanks!
@l0nax Thank you for your enthusiastic answer.
@l0nax regarding podAntiAffinity, there is nothing to implement; it is already there:
https://github.com/jfrog/kubenab/blob/master/chart/kubenab/values.yaml#L132
https://github.com/jfrog/kubenab/blob/master/chart/kubenab/templates/deployment.yaml#L57
:)
One example of adding it to an override-values.yaml file:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: helm.sh/chart
          operator: In
          values:
          - kubenab
      topologyKey: "kubernetes.io/hostname"
Any of these labels can be used for podAntiAffinity.
@rimusz oh sorry, I hadn't seen that
First, deploying kubenab in the Kubernetes cluster can solve the problem of images in the yaml file not having a private repository prefix, e.g.:
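A hypothetical illustration (the pod name and image come from the description below, but the registry URL in the comment is made up):

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: busybox
    # submitted as "busybox:latest"; kubenab's mutating webhook rewrites it to
    # something like "my-private-registry.example.com/busybox:latest"
    image: busybox:latest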
However, I found that there is a problem when a node is powered off.
Suppose test-pod is first scheduled on node1 (kubenab is also scheduled on node1). If node1 is powered off, test-pod and kubenab will migrate to other nodes after a while.
If test-pod migrates first and kubenab is not ready yet, it will pull the image busybox:latest from the default docker.io registry, and the pod then ends up in ErrImagePull (I have to run kubectl delete -f xxx.yaml and kubectl create -f xxx.yaml after kubenab is ready to avoid this problem).
From kubenab/cmd/kubenab/admissions.go, kubenab only adds the repository to the Pod resource. When a pod is created from a Deployment's or StatefulSet's template, the above problem occurs because no private repository is added to the Deployment or StatefulSet resource.
Please confirm whether there is a problem or whether my usage is wrong? (I used the official example.)