llhuii / dive-into-k8s


node_lifecycle_controller source code analysis #2

Open llhuii opened 3 years ago

llhuii commented 3 years ago

Official docs for the node controller: https://kubernetes.io/zh/docs/concepts/architecture/nodes/#node-controller. Code: k8s.io/pkg/controller/nodelifecycle/node_lifecycle_controller.go. The main flow starts the following goroutines (a condensed startup sketch follows the list):

  1. 8 goroutines handle add/update/delete events for node resources: doNodeProcessingPassWorker runs every second
  2. 4 goroutines handle pod update events (where pod.Spec.NodeName changed): doPodProcessingWorker runs every second
  3. One goroutine handles eviction: every NodeEvictionPeriod (default 100ms) it calls either doNoExecuteTaintingPass or doEvictionPass, depending on whether runTaintManager is configured
  4. One goroutine handles the heartbeats that kubelets report for their nodes: monitorNodeHealth runs every nodeMonitorPeriod
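
A condensed sketch of how Run wires these up, based on the upstream source I read (informer sync and error handling elided; the constants match the values above, but exact names and signatures may differ across versions):

    // Abridged from (*Controller).Run in node_lifecycle_controller.go.
    for i := 0; i < scheduler.UpdateWorkerSize; i++ { // 8 node workers
        go wait.Until(nc.doNodeProcessingPassWorker, time.Second, stopCh)
    }
    for i := 0; i < podUpdateWorkerSize; i++ { // 4 pod workers
        go wait.Until(nc.doPodProcessingWorker, time.Second, stopCh)
    }
    if nc.runTaintManager {
        // NoExecute taints drive eviction via the taint manager.
        go wait.Until(nc.doNoExecuteTaintingPass, scheduler.NodeEvictionPeriod, stopCh)
    } else {
        go wait.Until(nc.doEvictionPass, scheduler.NodeEvictionPeriod, stopCh)
    }
    // Heartbeat monitoring.
    go wait.Until(func() {
        if err := nc.monitorNodeHealth(); err != nil {
            klog.Errorf("Error monitoring node health: %v", err)
        }
    }, nc.nodeMonitorPeriod, stopCh)
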
llhuii commented 3 years ago

doNodeProcessingPassWorker

Purpose: add or remove the corresponding taints on a node based on its conditions and spec.Unschedulable. It takes a nodeName from nodeUpdateQueue (enqueued on node Add or Update events; the worker loop itself is sketched after this list) and then:

  1. Call doNoScheduleTaintingPass(nodeName), which fetches the node object and reconciles its taints
    • Walk node.Status.Conditions; for any condition matching the map below, add the corresponding taint with effect NoSchedule:
      • Ready is False => node.kubernetes.io/not-ready
      • Ready is Unknown => node.kubernetes.io/unreachable
      • memory/disk/PID pressure => node.kubernetes.io/{memory,disk,pid}-pressure
      • network unavailable => node.kubernetes.io/network-unavailable
        // map {NodeConditionType: {ConditionStatus: TaintKey}}
        // represents which NodeConditionType under which ConditionStatus should be
        // tainted with which TaintKey
        // for certain NodeConditionType, there are multiple {ConditionStatus,TaintKey} pairs
        nodeConditionToTaintKeyStatusMap = map[v1.NodeConditionType]map[v1.ConditionStatus]string{
            v1.NodeReady: {
                v1.ConditionFalse:   v1.TaintNodeNotReady,
                v1.ConditionUnknown: v1.TaintNodeUnreachable,
            },
            v1.NodeMemoryPressure: {
                v1.ConditionTrue: v1.TaintNodeMemoryPressure,
            },
            v1.NodeDiskPressure: {
                v1.ConditionTrue: v1.TaintNodeDiskPressure,
            },
            v1.NodeNetworkUnavailable: {
                v1.ConditionTrue: v1.TaintNodeNetworkUnavailable,
            },
            v1.NodePIDPressure: {
                v1.ConditionTrue: v1.TaintNodePIDPressure,
            },
        }
    • If Spec.Unschedulable is set (e.g., by kubectl cordon), add the node.kubernetes.io/unschedulable taint
    • Compare the taints computed above against the node's existing taints, determine which to add and which to remove, and update node.Spec.Taints through the API (via SwapNodeControllerTaint; this function is more complex than I expected, and I haven't figured out why). Note the API-call optimization below:
      // First we try getting node from the API server cache, as it's cheaper. If it fails
      // we get it from etcd to be sure to have fresh data.
      if firstTry {
          node, err = kubeClient.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{ResourceVersion: "0"})
          firstTry = false
      } else {
          node, err = kubeClient.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
      }
  2. reconcileNodeLabels handles the case where the beta labels beta.kubernetes.io/os / beta.kubernetes.io/arch are inconsistent with the stable kubernetes.io/os / kubernetes.io/arch labels
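
The worker loop wrapping these two steps is short; a sketch following the upstream structure (error handling and requeueing simplified):

    // Sketch of doNodeProcessingPassWorker: drain nodeUpdateQueue and
    // run both passes for each node name (abridged, not verbatim).
    func (nc *Controller) doNodeProcessingPassWorker() {
        for {
            obj, shutdown := nc.nodeUpdateQueue.Get()
            if shutdown {
                return
            }
            nodeName := obj.(string)
            if err := nc.doNoScheduleTaintingPass(nodeName); err != nil {
                klog.Errorf("Failed to taint NoSchedule on node %v: %v", nodeName, err)
            }
            // Step 2 above: keep beta/stable labels consistent.
            nc.reconcileNodeLabels(nodeName)
            nc.nodeUpdateQueue.Done(nodeName)
        }
    }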

So how do items get enqueued into nodeUpdateQueue (a FIFO queue)?
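
For reference, in the version I read the enqueueing happens in the node informer's event handlers registered by NewNodeLifecycleController; a simplified sketch (the Delete handler and taint-manager hooks elided):

    // Sketch: node Add/Update events push the node name onto nodeUpdateQueue.
    nodeInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            node := obj.(*v1.Node)
            nc.nodeUpdateQueue.Add(node.Name)
        },
        UpdateFunc: func(_, newObj interface{}) {
            node := newObj.(*v1.Node)
            nc.nodeUpdateQueue.Add(node.Name)
        },
    })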

llhuii commented 3 years ago

doPodProcessingWorker

Purpose: set a pod's status to not ready if its node is not ready. It takes a pod from podUpdateQueue and calls processPod:

  1. Get the pod's nodeName and look up its nodeHealth in nodeHealthMap; return if there is none
  2. nodeLister.Get(nodeName) and read the node's currentReadyCondition; return if there is none
  3. If nc.runTaintManager is false, call processNoTaintBaseEviction (TODO)
  4. If the node's Ready condition status is not ConditionTrue, mark the pod as not ready (see the sketch below)
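
"Mark not ready" boils down to flipping the PodReady condition and persisting it. A minimal sketch of what the upstream MarkPodsNotReady helper (pkg/controller/util/node) does, with event recording and retry handling elided:

    // Sketch: force the PodReady condition to False and write it back.
    for i := range pods {
        pod := pods[i].DeepCopy()
        for j := range pod.Status.Conditions {
            cond := &pod.Status.Conditions[j]
            if cond.Type != v1.PodReady {
                continue
            }
            cond.Status = v1.ConditionFalse
            cond.LastTransitionTime = metav1.Now()
            if _, err := kubeClient.CoreV1().Pods(pod.Namespace).UpdateStatus(
                context.TODO(), pod, metav1.UpdateOptions{}); err != nil {
                klog.Errorf("Failed to mark pod %s/%s NotReady: %v", pod.Namespace, pod.Name, err)
            }
            break
        }
    }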

So when do items get enqueued into podUpdateQueue (a RateLimitingQueue)?
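
For reference, in the version I read this happens in the pod informer's update handler, which enqueues a pod only when pod.Spec.NodeName actually changed (matching point 2 in the main-flow list); a near-verbatim sketch:

    // Sketch of podUpdated: requeue the pod when it is (re)assigned to a node.
    func (nc *Controller) podUpdated(oldPod, newPod *v1.Pod) {
        if newPod == nil {
            return
        }
        if len(newPod.Spec.NodeName) != 0 && newPod.Spec.NodeName != oldPod.Spec.NodeName {
            podItem := podUpdateItem{newPod.Namespace, newPod.Name}
            nc.podUpdateQueue.Add(podItem)
        }
    }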

llhuii commented 3 years ago

When a node is network-unreachable for longer than podEvictionTimeout (5 minutes by default), all pods on that node are evicted. Concretely, each pod is gracefully deleted through the API (a sketch follows):
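
A hedged sketch of what that graceful deletion looks like at the API level (the upstream helper is DeletePods in pkg/controller/util/node, which also skips DaemonSet pods; the snippet below is illustrative, not verbatim):

    // Sketch: a Delete call without GracePeriodSeconds=0 only sets
    // metadata.deletionTimestamp; the API server keeps the pod object
    // until the kubelet confirms the containers are gone, which never
    // happens while the node stays unreachable.
    err := kubeClient.CoreV1().Pods(pod.Namespace).Delete(
        context.TODO(), pod.Name, metav1.DeleteOptions{})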

Question: note that the pods are only gracefully deleted here; when do they actually get removed?