Open zzzzzzyzz opened 1 year ago
After the single-node cluster is fully configured, many services in Rancher start showing "Deployment does not have minimum availability." after a while. The logs then show "Readiness probe failed" and "Liveness probe failed" errors, for example:
Readiness probe failed: Get http://localhost:9099/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Is this caused by insufficient resources, or by something else?
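I also want to ask: after I added a GPU to the cluster, there were no errors at first, but a few hours later the cattle-cluster-agent, canal, coredns, and kubeflow-prometheus-adapter services began restarting and updating nonstop. From what I can see, the messages fall into roughly these three kinds. How should I resolve this?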
Readiness probe failed: Get http://10.42.0.14:9090/-/ready: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Liveness probe failed: Get http://10.42.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 210.28.18.30 210.28.16.26 210.28.18.26
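
For the third message, here is a rough sketch of how the node's resolv.conf could be checked for extra nameserver entries. It assumes Kubernetes applies at most three nameservers, which matches the three addresses in the log line above. The helper name `split_nameservers` and the fourth address in the sample are hypothetical and only there for illustration; on a real node the text would be read from /etc/resolv.conf instead.

```python
# Assumption: at most three nameservers are applied; the rest are omitted.
MAX_NAMESERVERS = 3


def split_nameservers(resolv_conf_text, limit=MAX_NAMESERVERS):
    """Return (applied, omitted) nameserver lists, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop anything after '#', then split the line into fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers[:limit], servers[limit:]


# Sample input: the first three addresses come from the log line above,
# the fourth is a hypothetical extra entry (documentation address range).
sample = """\
nameserver 210.28.18.30
nameserver 210.28.16.26
nameserver 210.28.18.26
nameserver 192.0.2.1  # hypothetical fourth entry, not from the logs
"""

applied, omitted = split_nameservers(sample)
print("applied:", " ".join(applied))
print("omitted:", " ".join(omitted))
```

If the omitted list is not empty, trimming the node's resolv.conf to three entries should at least make this warning go away. Whether it has anything to do with the probe timeouts is a separate question.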