We usually talk about Readiness and Liveness probes for Pods; the Startup probe was added in Kubernetes 1.16. From the official description, I see the Startup probe as a complement to the Liveness probe. It targets applications that have a Liveness probe configured but take a long time to start, which makes the Liveness initial delay hard to tune. In that case a Startup probe can fill the gap: it only runs during startup, and until it succeeds all other probes are disabled; only after it succeeds do the other probes start running. If the Startup probe still fails after the configured number of attempts, the container is killed and may be restarted depending on the restart policy. Another possible scenario is using a Startup probe to check whether a dependent service is up before the other probes run; that last one is my own speculation, and I'm not sure it's actually advisable :)
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-startup-probe
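A minimal sketch of how the two fit together (the image, port, and thresholds below are placeholders): the startupProbe runs first, and the livenessProbe only takes effect once startup has succeeded.

apiVersion: v1
kind: Pod
metadata:
  name: slow-start-app          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                # placeholder image
    ports:
    - containerPort: 80
    startupProbe:               # runs first; other probes are disabled until it succeeds
      httpGet:
        path: /healthz
        port: 80
      failureThreshold: 30      # allows up to 30 * 10s = 300s for the app to start
      periodSeconds: 10
    livenessProbe:              # only starts once the startup probe has succeeded
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 10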
By default, Kubernetes Secrets are only base64 encoded; the original data can be recovered with nothing more than the --decode option, which strictly speaking is not secure. kube-apiserver can encrypt Secret data written to etcd via the --encryption-provider-config option, so the stored secrets are encrypted at rest. To be fair, what you get from the cluster with kubectl get is still the decrypted, base64-encoded value, which --decode reveals; what is described here is only encryption of the data on the etcd side.
https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
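A minimal sketch of the EncryptionConfiguration file referenced by --encryption-provider-config, assuming AES-CBC is acceptable for your environment (the key below is a placeholder and must be a base64-encoded 32-byte value you generate yourself):

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:                          # encrypt new writes with AES-CBC
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder, generate your own
      - identity: {}                     # still allows reading old, unencrypted data

After enabling it, existing Secrets can be rewritten (and thus encrypted) with kubectl get secrets --all-namespaces -o json | kubectl replace -f -.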
Both are ways of extending Kubernetes: CRDs extend the resource model, while the API Aggregator extends the apiserver itself. metrics-server is an example of the latter:
# kubectl get apiservices | grep metrics-server
v1beta1.metrics.k8s.io kube-system/metrics-server True 15m
Unlike Custom Resource Definitions (CRDs), the Aggregation API involves another server - your Extension apiserver - in addition to the standard Kubernetes apiserver. The Kubernetes apiserver will need to communicate with your extension apiserver, and your extension apiserver will need to communicate with the Kubernetes apiserver. In order for this communication to be secured, the Kubernetes apiserver uses x509 certificates to authenticate itself to the extension apiserver.
To enable this feature, kube-apiserver needs the following options added:
--requestheader-client-ca-file=<path to aggregator CA cert>
--requestheader-allowed-names=front-proxy-client // note: this must match the CN of the proxy-client certificate; it can be left empty, meaning any client signed by the CA is allowed
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=<path to aggregator proxy cert>
--proxy-client-key-file=<path to aggregator proxy key>
Do not reuse the Kubernetes CA here; use a separate one (see https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/#ca-reusage-and-conflicts for the reasons). If your masters do not run kube-proxy, you also need to add:
--enable-aggregator-routing=true
https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/ https://feisky.gitbooks.io/kubernetes/content/plugins/aggregation.html https://blog.51cto.com/wzlinux/2474075
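For reference, this is roughly what the APIService object registered by metrics-server looks like (the values are illustrative, modelled on a typical metrics-server deployment):

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  service:                        # requests for this group/version are proxied to this Service
    name: metrics-server
    namespace: kube-system
  insecureSkipTLSVerify: true     # or provide a caBundle instead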
Pod and container information can be injected into container environment variables through the downward API, using either of these fields:
- fieldRef: Pod fields such as metadata.name, metadata.namespace, metadata.labels, and status.podIP
- resourceFieldRef: container resource requests and limits, such as requests.cpu and limits.memory
https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#capabilities-of-the-downward-api https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
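A minimal sketch of the environment-variable form (the Pod name and image are placeholders); the volume-based form follows in the Deployment below:

apiVersion: v1
kind: Pod
metadata:
  name: downward-env-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                 # placeholder image
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: MY_CPU_REQUEST
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: requests.cpu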
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        volumeMounts:
        - name: pod-info
          mountPath: /pod
          readOnly: true
      volumes:
      - name: pod-info
        downwardAPI:
          items:
          - path: metadata/name
            fieldRef:
              fieldPath: metadata.name
          - path: metadata/namespace
            fieldRef:
              fieldPath: metadata.namespace
          - path: metadata/uid
            fieldRef:
              fieldPath: metadata.uid
          - path: metadata/labels
            fieldRef:
              fieldPath: metadata.labels
          - path: metadata/annotations
            fieldRef:
              fieldPath: metadata.annotations
Philippe Martin, Kubernetes: Preparing for the CKA and CKAD Certifications, Apress (2021)
kubectl logs -p -n <namespace> <podName>
--previous (-p for short) lets you view the logs of the previous container instance, which is useful when a container has crashed and you want to see the error that caused the crash.
tolerations:
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 300
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 300
The tolerationSeconds that the system injects into Pods defaults to 300s; it can be changed with the --default-unreachable-toleration-seconds=60 --default-not-ready-toleration-seconds=60 options (part of the DefaultTolerationSeconds admission plugin on kube-apiserver), which shortens how long Pods take to be rescheduled after a node failure.
One thing to note: "If Taint Based Evictions are present in the pod definition, controller manager will not be able to evict the pod that tolerates the taint. Even if you don't define an eviction policy in your configuration, it gets a default one since Default Toleration Seconds admission controller plugin is enabled by default." In short, tolerationSeconds takes precedence. https://stackoverflow.com/questions/53641252/kubernetes-recreate-pod-if-node-becomes-offline-timeout
That diagram only takes --pod-eviction-timeout into account; tolerationSeconds should be added to it as well.
Today while building Grafana charts I noticed that CPU utilization in some clusters came out negative. The cause is that kube_pod_container_resource_requests_cpu_cores also counts Pods that are not Running, for example evicted Pods or Pods in the Succeeded phase (such as a Job that has finished and exited). When those are included through that metric, the numbers no longer match reality. Adding a Pod-phase filter to the query fixes it; see https://github.com/kubernetes/kube-state-metrics/issues/458 for details.
sum(sum(sum(kube_pod_container_resource_requests_cpu_cores) by (namespace, pod, node) * on (pod) group_left() (sum(kube_pod_status_phase{phase=~"Pending|Running"}) by (pod, namespace) == 1)) by (node))
Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series. Querying the precomputed result will then often be much faster than executing the original expression every time it is needed. This is especially useful for dashboards, which need to query the same expression repeatedly every time they refresh.
The official description of recording rules is quoted above. In short, expensive or frequently used expressions are precomputed by Prometheus and stored as new time series, which greatly speeds up things like Grafana dashboard rendering.
https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
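A sketch of turning the query above into a recording rule (the group and rule names are made up):

groups:
  - name: node-requests.rules               # hypothetical group name
    interval: 1m
    rules:
      - record: node:pod_cpu_requests:sum   # hypothetical rule name
        expr: |
          sum(
            sum(kube_pod_container_resource_requests_cpu_cores) by (namespace, pod, node)
            * on (pod) group_left()
            (sum(kube_pod_status_phase{phase=~"Pending|Running"}) by (pod, namespace) == 1)
          ) by (node)

Dashboards can then query node:pod_cpu_requests:sum directly instead of re-evaluating the full expression on every refresh.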
- system:serviceaccount: (singular) is the prefix for service account usernames.
- system:serviceaccounts: (plural) is the prefix for service account groups.
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#referring-to-subjects
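A sketch of a RoleBinding that uses the plural prefix (the Role, namespace, and binding names are placeholders); it grants a Role to every service account in the dev namespace, while a single service account would normally be referenced with kind: ServiceAccount instead:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods                      # hypothetical name
  namespace: dev                       # hypothetical namespace
subjects:
- kind: Group
  name: system:serviceaccounts:dev     # plural prefix: every service account in "dev"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader                     # hypothetical Role
  apiGroup: rbac.authorization.k8s.io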
fs.inotify.max_user_instances
In one Kubernetes cluster, Prometheus instances launched by the Prometheus Operator failed to start; the root cause was that fs.inotify.max_user_instances was set too low.
- fs.inotify.max_user_instances: specifies an upper limit on the number of inotify instances that can be created per real user ID
- fs.inotify.max_user_watches: specifies an upper limit on the number of watches that can be created per real user ID
Count the inotify instances in use per user on the current system:
find /proc/*/fd/* -type l -lname 'anon_inode:inotify' -print 2>/dev/null | cut -d/ -f3 |xargs -I '{}' -- ps --no-headers -o '%U' -p '{}' | sort | uniq -c | sort -nr
Count the total number of inotify instances in use on the current system:
find /proc/*/fd/ -type l -lname "anon_inode:inotify" -printf "%hinfo/%f\n" | xargs grep -cE "^inotify" | column -t -s: | wc -l
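A sketch of the fix by raising the limits with sysctl (the values are illustrative; size them for your workload):

sysctl -w fs.inotify.max_user_instances=8192     # apply immediately
sysctl -w fs.inotify.max_user_watches=524288
cat <<EOF >> /etc/sysctl.conf                    # persist across reboots
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=524288
EOF
sysctl -p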
- https://k8s.imroc.io/troubleshooting/handle/runnig-out-of-inotify-watches/
- https://docs.pingcap.com/zh/tidb-in-kubernetes/v1.0/prerequisites
- https://unix.stackexchange.com/questions/498393/how-to-get-the-number-of-inotify-watches-in-use
- https://stackoverflow.com/questions/13758877/how-do-i-find-out-what-inotify-watches-have-been-registered
https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/
https://docs.ansible.com/ansible/latest/user_guide/playbooks_strategies.html
The Finalizers field is part of Kubernetes garbage collection. It is a deletion-interception mechanism that lets controllers implement asynchronous pre-delete hooks. It lives in the metadata of every resource object and is declared as []string in the Kubernetes source; the slice holds the names of the finalizers that still have to run.
The first deletion request for an object that carries finalizers sets a value for its metadata.deletionTimestamp but does not actually delete the object. Once that value is set, entries can only be removed from the finalizers list.
While metadata.deletionTimestamp is set, the controllers watching the object pick up the update and run the finalizers they are responsible for. Once all finalizers have been processed, the resource is deleted.
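A sketch of what this looks like on an object (the resource and finalizer names are made up):

apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-cm                         # hypothetical object
  finalizers:
  - example.com/cleanup-external-db     # hypothetical finalizer; a controller must remove it

If the responsible controller is gone and the object hangs in Terminating, the finalizer can be cleared manually, e.g. kubectl patch configmap demo-cm --type=merge -p '{"metadata":{"finalizers":[]}}'.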
https://betterprogramming.pub/understanding-docker-container-exit-codes-5ee79a1d58f6
Work has been full of odds and ends lately; looking back, this has already been on hold for more than a month...
relabelings:
- sourceLabels: [__meta_kubernetes_pod_node_name]
  targetLabel: node
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: xxx
spec:
  endpoints:
  - port: http-metrics
    interval: 30s
    relabelings:
    - sourceLabels: [__meta_kubernetes_pod_node_name]
      targetLabel: node
  namespaceSelector:
    matchNames:
    - xxx
  selector:
    matchLabels:
      service: xxx
https://github.com/prometheus-operator/prometheus-operator/issues/1166
When a client accesses a target server or load balancer and ping shows packet loss or no connectivity, a path-tracing tool such as MTR can be used to test the link and locate where the problem comes from, for example to spot routing loops.
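A sketch of a typical MTR run (the target is a placeholder):

mtr -r -c 100 -b <target-ip-or-domain>    # report mode, 100 probes per hop, show hostnames and IPs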
Looking back again... ah, another 4 months have gone by. What have I even been doing 🐙🐙🐙
Another year is almost over, and it has been a mess~
Boss, two years have passed. Still waiting.
@lework I haven't had dedicated time to update this since changing jobs.
Really looking forward to more updates.
Learning weeklies from past years
Kubernetes tips #10 (early days)
Learning weekly "2018" #19
Learning weekly "2019" #23
Learning weekly "2020" #26
2021-01-11~17
Kubernetes is Removing Docker Support, Kubernetes is Not Removing Docker Support
architectural diagram of Kubernetes using Docker
architectural drawing of Kubernetes using CRI-O natively, without the Dockershim
Kubernetes Deployment CHANGE-CAUSE
CHANGE-CAUSE is copied from the Deployment annotation kubernetes.io/change-cause to its revisions upon creation. You can specify the CHANGE-CAUSE message by setting that annotation on the Deployment.
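A sketch of recording and reading it back (the deployment name and message are placeholders):

kubectl set image deployment/nginx nginx=nginx:1.25
kubectl annotate deployment/nginx kubernetes.io/change-cause="image updated to nginx:1.25"
kubectl rollout history deployment/nginx    # the message shows up in the CHANGE-CAUSE column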
Kubernetes Resource Update Mechanisms
The article describes how the update and patch mechanisms differ and when to use each; the diagram shows the conflict flow when several clients update the same object concurrently.
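A sketch of the difference with kubectl (object names are placeholders): an update (replace) sends the whole object and is rejected with a Conflict error if the resourceVersion is stale, while a patch sends only the changed fields and is merged on the server.

kubectl get deployment nginx -o yaml > nginx.yaml   # read the full object
# ... edit nginx.yaml ...
kubectl replace -f nginx.yaml                        # full write; fails on resourceVersion conflict

kubectl patch deployment nginx --type=merge -p '{"spec":{"replicas":3}}'   # send only the delta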