Closed shenkonghui closed 1 year ago
QPS is around 20:
rate(rest_client_requests_total{method="GET",endpoint="test"}[1m])
{code="200", endpoint="test", host="10.96.0.1:443", instance="10.244.11.203:8080", job="middleware-controller", method="GET", namespace="middleware-operator", pod="middleware-controller-manager-77bd9d9bc9-9bvvl", service="middleware-controller"} | 20.911111111111108
Median latency is about 0.5 seconds:
histogram_quantile(0.5, rate(rest_client_request_latency_seconds_bucket{verb="GET"}[5m]))
{endpoint="test", instance="10.244.11.203:8080", job="middleware-controller", namespace="middleware-operator", pod="middleware-controller-manager-77bd9d9bc9-9bvvl", service="middleware-controller", url="https://10.96.0.1:443/%7Bprefix%7D", verb="GET"} | 0.512
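For readers interpreting the 0.512 value above, here is a minimal sketch of how `histogram_quantile` estimates a quantile from cumulative buckets. The bucket layout (ten exponential bounds from 0.001 to 0.512 plus `+Inf`) and all counts are assumptions made for illustration, not data from this issue. If the real histogram uses a similar layout, a result of exactly 0.512 may mean the median fell into the `+Inf` bucket, so the true latency could be higher than reported.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by upper bound and end with +Inf,
    mirroring Prometheus `le` buckets. This is a simplified sketch,
    not the Prometheus implementation.
    """
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last being +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[idx]
    if math.isinf(upper):
        # Quantile lies beyond the last finite bound: report that bound.
        return buckets[-2][0]
    lower = buckets[idx - 1][0] if idx > 0 else 0.0
    below = buckets[idx - 1][1] if idx > 0 else 0.0
    in_bucket = count - below
    if in_bucket == 0:
        return lower
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket

# Assumed bucket bounds: 0.001, 0.002, ..., 0.512 seconds.
BOUNDS = [0.001 * 2 ** k for k in range(10)]

def make_buckets(cumulative_counts, total):
    """Pair the assumed bounds with hypothetical cumulative counts."""
    return list(zip(BOUNDS, cumulative_counts)) + [(math.inf, total)]

# Hypothetical case 1: only 40 of 100 requests finished within 0.512 s.
slow = make_buckets([0, 0, 0, 0, 0, 5, 10, 20, 30, 40], 100)
# Hypothetical case 2: the median falls between 0.256 s and 0.512 s.
moderate = make_buckets([0, 0, 0, 0, 0, 5, 10, 20, 30, 70], 100)

if __name__ == "__main__":
    print(histogram_quantile(0.5, slow))
    print(histogram_quantile(0.5, moderate))
```

In the first hypothetical case the median is capped at the highest finite bound; in the second it is interpolated between 0.256 and 0.512. Comparing the `le="0.512"` and `le="+Inf"` bucket counts directly would show which case applies here.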
Strangely, the other controllers behave normally:
rate(rest_client_requests_total{method="GET",endpoint="test"}[1m]) >0
After comparing the two, I found that on newer k8s versions a list-watch is performed against all API groups.
k8s v1.26.3:
Older k8s versions such as v1.21:
This is a bug in controller-runtime; upgrading to the latest version fixes it.
Symptom: middleware resources are created by the operator, one per middleware type, and creation is currently very slow.
On the server:
controller_runtime_active_workers{controller="middleware"} is 0, which means no worker goroutine is currently running,
and workqueue_depth{name="mysqlcluster"} is 7, which means items are stuck in the queue.
Debugging in a local environment:
I found that although controller_runtime_active_workers is not 0, workqueue_depth is still greater than 0.
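The two observations above (server and local) can be read together as a small decision table. The sketch below is only an illustration of that reasoning; the function name `classify_queue` and the state labels are hypothetical and not part of controller-runtime.

```python
def classify_queue(active_workers: int, queue_depth: int) -> str:
    """Interpret controller_runtime_active_workers and workqueue_depth.

    Hypothetical helper: the labels are illustrative, not official terms.
    """
    if queue_depth == 0:
        # Nothing is waiting, so the controller is keeping up (or idle).
        return "keeping up"
    if active_workers == 0:
        # Items are queued but no worker is processing them.
        return "stalled"
    # Workers are busy, yet items still pile up behind them.
    return "backlogged"

if __name__ == "__main__":
    print(classify_queue(0, 7))  # values seen on the server
    print(classify_queue(1, 3))  # shape of the local observation (values assumed)
```

The server readings fall into the "stalled" case, while the local readings fall into the "backlogged" case, which would fit workers spending a long time on each reconcile.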