This issue could be solved by setting the Pod Priority to system-cluster-critical, since kubenab is essential for the cluster to work correctly.
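A minimal sketch of what that setting looks like in the Pod template of the kubenab Deployment (only the relevant fields are shown; the surrounding spec is assumed):

...
spec:
  template:
    spec:
      # system-cluster-critical is a built-in PriorityClass, so the scheduler
      # treats the kubenab Pod as critical to cluster operation
      priorityClassName: system-cluster-critical
...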
I created a Pull-Request (#18) which targets your problem and solves it.
I wasn't able to reproduce your problem with that patch applied.
Please note that this patch (#18) does not implement fail-over/high-availability – it only sets the scheduling priority to the highest possible value so that the kubenab Pod is scheduled before all other Pods! High availability is targeted by PR #12 (which is currently work in progress).
But you could also simply increase the replica count of the kubenab deployment and then add a nodeSelector to prevent all the pods from being scheduled on the same node.
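A rough sketch of that approach, assuming the master nodes carry a role: master label (the label key/value is taken from a later comment in this thread, not from the chart):

...
spec:
  # run several replicas so at least one webhook Pod survives a node failure
  replicas: 3
  template:
    spec:
      # only schedule onto nodes labelled role=master
      nodeSelector:
        role: master
...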
@l0nax, Thank you for your prompt reply.
I tried the method you provided and found that the problem still exists when a node is abnormally powered off.
I think the root cause is that only the image name in the Pod object (Kind: Pod) is modified and persisted (not in the Deployment or StatefulSet).
K8s version:
[root@node2 01-volume-emptyDir]# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:23:26Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
You are right.
The MutatingAdmissionController only gets called on CREATE operations (see https://github.com/jfrog/kubenab/blob/master/chart/kubenab/templates/mutating-webhook.yaml#L16).
I will push a change so that UPDATE operations are also covered by the Admission Controller.
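For reference, a sketch of what the changed rule in the MutatingWebhookConfiguration could look like (based on the linked mutating-webhook.yaml; the surrounding chart template fields are omitted here):

rules:
- apiGroups: [""]
  apiVersions: ["v1"]
  resources: ["pods"]
  # previously only CREATE was listed here
  operations: ["CREATE", "UPDATE"]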
Thanks for your future tests @lyyao09
EDIT:
Adding the UPDATE operation to the Admission Controller could also prevent issues if a user (e.g.) changes only the image version of a Pod.
@lyyao09 can you re-open the Issue please?
And your issue would be fixed via fail-over (which is targeted by PR #12); this would prevent problems when a node gets abnormally powered off.
Editing Deployments/StatefulSets could be done, but they will create Pods, and you can also create Pods without a Deployment or StatefulSet.
So it would be sub-optimal to change them. A Pod is the "lowest" entity which will be created.
Yes, we should add the UPDATE operation as you say. However, this can't solve my problem.
As said in https://github.com/jfrog/kubenab/issues/17#issuecomment-539941252, fail-over/replication is the only way to fix this problem directly.
And priority classes only set the scheduling priority above all the other Pods. But Kubernetes has a "delay" after which it marks a Node as Unschedulable and NotReady.
@l0nax, I simply increased the replica count of the kubenab deployment and then added a podAntiAffinity to prevent all the pods from being scheduled on the same node.
I repeated the abnormal power-off node test, and it works well.
What does your podAntiAffinity look like?
If you hard-code the node names, you prevent Kubernetes from scheduling the Pods dynamically.
As far as I know, Kubernetes doesn't have a feature where you can say "deploy max 2 pods per node".
You could implement this feature by building an Operator which schedules the Pods dynamically.
I scheduled kubenab to the 3 master nodes, and each node only has one kubenab pod.
...
nodeSelector:
  role: master
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - kubenab
      topologyKey: kubernetes.io/hostname
...
@lyyao09 Sorry for the late answer. I didn't know how to use podAntiAffinity to schedule only one pod per node.
I will make a change to implement that feature - after understanding how it really works.
Thanks!
@l0nax Thank you for your enthusiastic answer.
@l0nax regarding podAntiAffinity, there is nothing to implement; it is already there:
https://github.com/jfrog/kubenab/blob/master/chart/kubenab/values.yaml#L132
https://github.com/jfrog/kubenab/blob/master/chart/kubenab/templates/deployment.yaml#L57
:)
One example of adding it to an override-values.yaml file:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: helm.sh/chart
          operator: In
          values:
          - kubenab
      topologyKey: "kubernetes.io/hostname"
Any of these labels can be used for podAntiAffinity.
@rimusz oh sorry, I hadn't seen that
First, deploying kubenab in the Kubernetes cluster can solve the problem of images in the yaml file not having a private repository prefix, e.g.:
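A hypothetical illustration (the pod name and image come from the description below, but the registry URL in the comment is made up):

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: busybox
    # submitted as "busybox:latest"; kubenab's mutating webhook rewrites it to
    # something like "my-private-registry.example.com/busybox:latest"
    image: busybox:latest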
However, I found that there is a problem when a node is powered off.
Suppose test-pod is first scheduled on node1 (kubenab is also scheduled on node1). If node1 is powered off, test-pod and kubenab will migrate to other nodes after a while.
If test-pod migrates first and kubenab is not ready yet, it will pull the image busybox:latest from the default docker.io registry, and the pod then ends up in ErrImagePull (I have to run kubectl delete -f xxx.yaml and kubectl create -f xxx.yaml after kubenab is ready to avoid this problem).
From kubenab/cmd/kubenab/admissions.go, kubenab only adds the repository to the Pod resource. When a pod is created from a Deployment's or StatefulSet's template, the above problem occurs because no private repository is added to the Deployment or StatefulSet resource.
Please confirm whether there is a problem or whether my usage is wrong? (I used the official example.)