We usually talk about Readiness and Liveness probes for Pods; the Startup probe was added in Kubernetes 1.16. From the official description, I see the Startup probe as a complement to the Liveness probe. It targets applications that have a Liveness probe configured but take a long time to start, which makes the Liveness initial delay hard to tune. In that case a Startup probe can fill the gap: it only runs during startup, and until it succeeds all other probes are disabled; only after it succeeds do the other probes start running. If the Startup probe still fails after the configured number of attempts, the container is killed and may be restarted depending on the restart policy. Another possible scenario is using a Startup probe to check whether a dependent service is up before the other probes run; that last one is my own speculation, and I'm not sure it's actually advisable :)
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-startup-probe
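A minimal sketch of how the two fit together (the image, port, and thresholds below are placeholders): the startupProbe runs first, and the livenessProbe only takes effect once startup has succeeded.

apiVersion: v1
kind: Pod
metadata:
  name: slow-start-app          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                # placeholder image
    ports:
    - containerPort: 80
    startupProbe:               # runs first; other probes are disabled until it succeeds
      httpGet:
        path: /healthz
        port: 80
      failureThreshold: 30      # allows up to 30 * 10s = 300s for the app to start
      periodSeconds: 10
    livenessProbe:              # only starts once the startup probe has succeeded
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 10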
By default, Kubernetes Secrets are only base64 encoded; the original data can be recovered with nothing more than the --decode option, which strictly speaking is not secure. kube-apiserver can encrypt Secret data written to etcd via the --encryption-provider-config option, so the stored secrets are encrypted at rest. To be fair, what you get from the cluster with kubectl get is still the decrypted, base64-encoded value, which --decode reveals; what is described here is only encryption of the data on the etcd side.
https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
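A minimal sketch of the EncryptionConfiguration file referenced by --encryption-provider-config, assuming AES-CBC is acceptable for your environment (the key below is a placeholder and must be a base64-encoded 32-byte value you generate yourself):

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:                          # encrypt new writes with AES-CBC
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder, generate your own
      - identity: {}                     # still allows reading old, unencrypted data

After enabling it, existing Secrets can be rewritten (and thus encrypted) with kubectl get secrets --all-namespaces -o json | kubectl replace -f -.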
Both are ways of extending Kubernetes: CRDs extend the resource model, while the API Aggregator extends the apiserver itself. metrics-server is an example of the latter:
# kubectl get apiservices | grep metrics-server
v1beta1.metrics.k8s.io kube-system/metrics-server True 15m
Unlike Custom Resource Definitions (CRDs), the Aggregation API involves another server - your Extension apiserver - in addition to the standard Kubernetes apiserver. The Kubernetes apiserver will need to communicate with your extension apiserver, and your extension apiserver will need to communicate with the Kubernetes apiserver. In order for this communication to be secured, the Kubernetes apiserver uses x509 certificates to authenticate itself to the extension apiserver.
To enable this feature, kube-apiserver needs the following options added:
--requestheader-client-ca-file=<path to aggregator CA cert>
--requestheader-allowed-names=front-proxy-client // note: this must match the CN of the proxy-client certificate; it can be left empty, meaning any client signed by the CA is allowed
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=<path to aggregator proxy cert>
--proxy-client-key-file=<path to aggregator proxy key>
Do not reuse the Kubernetes CA here; use a separate one (see https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/#ca-reusage-and-conflicts for the reasons). If your masters do not run kube-proxy, you also need to add:
--enable-aggregator-routing=true
https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/ https://feisky.gitbooks.io/kubernetes/content/plugins/aggregation.html https://blog.51cto.com/wzlinux/2474075
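For reference, this is roughly what the APIService object registered by metrics-server looks like (the values are illustrative, modelled on a typical metrics-server deployment):

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  service:                        # requests for this group/version are proxied to this Service
    name: metrics-server
    namespace: kube-system
  insecureSkipTLSVerify: true     # or provide a caBundle instead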
Pod and container information can be injected into container environment variables through the downward API, using either of these fields:
- fieldRef: Pod fields such as metadata.name, metadata.namespace, metadata.labels, and status.podIP
- resourceFieldRef: container resource requests and limits, such as requests.cpu and limits.memory
https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/#capabilities-of-the-downward-api https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
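A minimal sketch of the environment-variable form (the Pod name and image are placeholders); the volume-based form follows in the Deployment below:

apiVersion: v1
kind: Pod
metadata:
  name: downward-env-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                 # placeholder image
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: MY_CPU_REQUEST
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: requests.cpu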
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        volumeMounts:
        - name: pod-info
          mountPath: /pod
          readOnly: true
      volumes:
      - name: pod-info
        downwardAPI:
          items:
          - path: metadata/name
            fieldRef:
              fieldPath: metadata.name
          - path: metadata/namespace
            fieldRef:
              fieldPath: metadata.namespace
          - path: metadata/uid
            fieldRef:
              fieldPath: metadata.uid
          - path: metadata/labels
            fieldRef:
              fieldPath: metadata.labels
          - path: metadata/annotations
            fieldRef:
              fieldPath: metadata.annotations
Philippe Martin, Kubernetes: Preparing for the CKA and CKAD Certifications, Apress (2021)
kubectl logs -p -n <namespace> <podName>
--previous (-p for short) lets you view the logs of the previous container instance, which is useful when a container has crashed and you want to see the error that caused the crash.
tolerations:
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 300
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 300
The tolerationSeconds that the system injects into Pods defaults to 300s; it can be changed with the --default-unreachable-toleration-seconds=60 --default-not-ready-toleration-seconds=60 options (part of the DefaultTolerationSeconds admission plugin on kube-apiserver), which shortens how long Pods take to be rescheduled after a node failure.
One thing to note: "If Taint Based Evictions are present in the pod definition, controller manager will not be able to evict the pod that tolerates the taint. Even if you don't define an eviction policy in your configuration, it gets a default one since Default Toleration Seconds admission controller plugin is enabled by default." In short, tolerationSeconds takes precedence. https://stackoverflow.com/questions/53641252/kubernetes-recreate-pod-if-node-becomes-offline-timeout
That diagram only takes --pod-eviction-timeout into account; tolerationSeconds should be added to it as well.
Today while building Grafana charts I noticed that CPU utilization in some clusters came out negative. The cause is that kube_pod_container_resource_requests_cpu_cores also counts Pods that are not Running, for example evicted Pods or Pods in the Succeeded phase (such as a Job that has finished and exited). When those are included through that metric, the numbers no longer match reality. Adding a Pod-phase filter to the query fixes it; see https://github.com/kubernetes/kube-state-metrics/issues/458 for details.
sum(sum(sum(kube_pod_container_resource_requests_cpu_cores) by (namespace, pod, node) * on (pod) group_left() (sum(kube_pod_status_phase{phase=~"Pending|Running"}) by (pod, namespace) == 1)) by (node))
Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series. Querying the precomputed result will then often be much faster than executing the original expression every time it is needed. This is especially useful for dashboards, which need to query the same expression repeatedly every time they refresh.
The official description of recording rules is quoted above. In short, expensive or frequently used expressions are precomputed by Prometheus and stored as new time series, which greatly speeds up things like Grafana dashboard rendering.
https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
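A sketch of turning the query above into a recording rule (the group and rule names are made up):

groups:
  - name: node-requests.rules               # hypothetical group name
    interval: 1m
    rules:
      - record: node:pod_cpu_requests:sum   # hypothetical rule name
        expr: |
          sum(
            sum(kube_pod_container_resource_requests_cpu_cores) by (namespace, pod, node)
            * on (pod) group_left()
            (sum(kube_pod_status_phase{phase=~"Pending|Running"}) by (pod, namespace) == 1)
          ) by (node)

Dashboards can then query node:pod_cpu_requests:sum directly instead of re-evaluating the full expression on every refresh.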
- system:serviceaccount: (singular) is the prefix for service account usernames.
- system:serviceaccounts: (plural) is the prefix for service account groups.
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#referring-to-subjects
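A sketch of a RoleBinding that uses the plural prefix (the Role, namespace, and binding names are placeholders); it grants a Role to every service account in the dev namespace, while a single service account would normally be referenced with kind: ServiceAccount instead:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods                      # hypothetical name
  namespace: dev                       # hypothetical namespace
subjects:
- kind: Group
  name: system:serviceaccounts:dev     # plural prefix: every service account in "dev"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader                     # hypothetical Role
  apiGroup: rbac.authorization.k8s.io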
fs.inotify.max_user_instances
In one Kubernetes cluster, Prometheus instances launched by the Prometheus Operator failed to start; the root cause was that fs.inotify.max_user_instances was set too low.
- fs.inotify.max_user_instances: specifies an upper limit on the number of inotify instances that can be created per real user ID
- fs.inotify.max_user_watches: specifies an upper limit on the number of watches that can be created per real user ID
Count the inotify instances in use per user on the current system:
find /proc/*/fd/* -type l -lname 'anon_inode:inotify' -print 2>/dev/null | cut -d/ -f3 |xargs -I '{}' -- ps --no-headers -o '%U' -p '{}' | sort | uniq -c | sort -nr
Count the total number of inotify instances in use on the current system:
find /proc/*/fd/ -type l -lname "anon_inode:inotify" -printf "%hinfo/%f\n" | xargs grep -cE "^inotify" | column -t -s: | wc -l
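A sketch of the fix by raising the limits with sysctl (the values are illustrative; size them for your workload):

sysctl -w fs.inotify.max_user_instances=8192     # apply immediately
sysctl -w fs.inotify.max_user_watches=524288
cat <<EOF >> /etc/sysctl.conf                    # persist across reboots
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=524288
EOF
sysctl -p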
- https://k8s.imroc.io/troubleshooting/handle/runnig-out-of-inotify-watches/
- https://docs.pingcap.com/zh/tidb-in-kubernetes/v1.0/prerequisites
- https://unix.stackexchange.com/questions/498393/how-to-get-the-number-of-inotify-watches-in-use
- https://stackoverflow.com/questions/13758877/how-do-i-find-out-what-inotify-watches-have-been-registered
https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/
https://docs.ansible.com/ansible/latest/user_guide/playbooks_strategies.html
The Finalizers field is part of Kubernetes garbage collection. It is a deletion-interception mechanism that lets controllers implement asynchronous pre-delete hooks. It lives in the metadata of every resource object and is declared as []string in the Kubernetes source; the slice holds the names of the finalizers that still have to run.
The first deletion request for an object that carries finalizers sets a value for its metadata.deletionTimestamp but does not actually delete the object. Once that value is set, entries can only be removed from the finalizers list.
While metadata.deletionTimestamp is set, the controllers watching the object pick up the update and run the finalizers they are responsible for. Once all finalizers have been processed, the resource is deleted.
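A sketch of what this looks like on an object (the resource and finalizer names are made up):

apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-cm                         # hypothetical object
  finalizers:
  - example.com/cleanup-external-db     # hypothetical finalizer; a controller must remove it

If the responsible controller is gone and the object hangs in Terminating, the finalizer can be cleared manually, e.g. kubectl patch configmap demo-cm --type=merge -p '{"metadata":{"finalizers":[]}}'.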
https://betterprogramming.pub/understanding-docker-container-exit-codes-5ee79a1d58f6
Work has been full of odds and ends lately; looking back, this has already been on hold for more than a month...
relabelings:
- sourceLabels: [__meta_kubernetes_pod_node_name]
  targetLabel: node
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: xxx
spec:
  endpoints:
  - port: http-metrics
    interval: 30s
    relabelings:
    - sourceLabels: [__meta_kubernetes_pod_node_name]
      targetLabel: node
  namespaceSelector:
    matchNames:
    - xxx
  selector:
    matchLabels:
      service: xxx
https://github.com/prometheus-operator/prometheus-operator/issues/1166
When a client accesses a target server or load balancer and ping shows packet loss or no connectivity, a path-tracing tool such as MTR can be used to test the link and locate where the problem comes from, for example to spot routing loops.
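A sketch of a typical MTR run (the target is a placeholder):

mtr -r -c 100 -b <target-ip-or-domain>    # report mode, 100 probes per hop, show hostnames and IPs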
Looking back again... ah, another 4 months have gone by. What have I even been doing 🐙🐙🐙
Another year is almost over, and it has been a mess~
Boss, two years have passed. Still waiting.
@lework I haven't had dedicated time to update this since changing jobs.
Really looking forward to more updates.
Learning weeklies from past years
Kubernetes tips #10 (early days)
Learning weekly "2018" #19
Learning weekly "2019" #23
Learning weekly "2020" #26
2021-01-11~17
Kubernetes is Removing Docker Support, Kubernetes is Not Removing Docker Support
architectural diagram of Kubernetes using Docker
architectural drawing of Kubernetes using CRI-O natively, without the Dockershim
Kubernetes Deployment CHANGE-CAUSE
CHANGE-CAUSE is copied from the Deployment annotation kubernetes.io/change-cause to its revisions upon creation. You can specify the CHANGE-CAUSE message by setting that annotation on the Deployment.
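A sketch of recording and reading it back (the deployment name and message are placeholders):

kubectl set image deployment/nginx nginx=nginx:1.25
kubectl annotate deployment/nginx kubernetes.io/change-cause="image updated to nginx:1.25"
kubectl rollout history deployment/nginx    # the message shows up in the CHANGE-CAUSE column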
Kubernetes Resource Update Mechanisms
The article describes how the update and patch mechanisms differ and when to use each; the diagram shows the conflict flow when several clients update the same object concurrently.
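A sketch of the difference with kubectl (object names are placeholders): an update (replace) sends the whole object and is rejected with a Conflict error if the resourceVersion is stale, while a patch sends only the changed fields and is merged on the server.

kubectl get deployment nginx -o yaml > nginx.yaml   # read the full object
# ... edit nginx.yaml ...
kubectl replace -f nginx.yaml                        # full write; fails on resourceVersion conflict

kubectl patch deployment nginx --type=merge -p '{"spec":{"replicas":3}}'   # send only the delta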