Open Reamer opened 6 years ago
@openshift/sig-master
Still present with 3.10
oc v3.10.0+0c4577e-1
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://s-cp-lb-01.cloud.example.de:443
openshift v3.10.0+7eee6f8-2
kubernetes v1.10.0+b81c8f8
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
+1 on this. We've disabled this alert on our setup because it's just flapping and not indicating any failures.
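A less drastic option than disabling the alert outright would be to exclude the Watch service from the failure-rate expression. This is only a sketch: it assumes the alert is built on the usual grpc_server_handled_total expression and that your etcd targets are matched by a job label like the one below, which may differ in your etcd3_alert.rules.
100 * sum by (grpc_service, grpc_method) (rate(grpc_server_handled_total{job=~".*etcd.*", grpc_code!="OK", grpc_service!="etcdserverpb.Watch"}[5m]))
  / sum by (grpc_service, grpc_method) (rate(grpc_server_handled_total{job=~".*etcd.*", grpc_service!="etcdserverpb.Watch"}[5m]))
  > 1
That way the alert keeps firing for genuine failures on Put, Range, Txn, etc., while the flapping Watch cancellations are ignored.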
/remove-lifecycle stale
+1 on this. I also see it on the etcd cluster master nodes after adding etcd3_alert.rules.
The alert fires roughly every five minutes, but we can't find anything actually wrong with Kubernetes.
/remove-lifecycle stale
+1. I ran etcd with debug log level and found this error:
etcdserver/api/v3rpc: failed to receive watch request from gRPC stream ("rpc error: code = Unavailable desc = stream error: stream ID 71; CANCEL")
The error shows up roughly once every 5 minutes, each time with a unique stream ID.
Seen on etcd 3.2.24 / 3.2.25 / 3.3.10, monitored with Prometheus (which is where I get this alert).
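For anyone trying to quantify how often these cancelled watch streams end up counted as failed gRPC requests, a query along these lines can help. It is only a sketch and assumes the stock grpc_server_handled_total labels (as shown later in this thread) with no extra relabeling:
sum by (grpc_method, grpc_code) (rate(grpc_server_handled_total{grpc_service="etcdserverpb.Watch", grpc_code!="OK"}[5m]))
If the only series returned is Watch/Unavailable at a low, steady rate, that matches the stream-cancellation pattern in the debug log above rather than a real outage.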
Any updates?
+1, etcd 3.3.10 with Prometheus Operator on Kubernetes 1.11.5.
I have 5 nodes, but only one node raises the alert; the others seem fine.
The etcd cluster itself runs without issues.
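To confirm which member the failed Watch requests are actually attributed to, a per-endpoint breakdown along these lines might help (sketch only; it assumes the default instance label from the etcd scrape config):
sum by (instance) (rate(grpc_server_handled_total{grpc_service="etcdserverpb.Watch", grpc_code="Unavailable"}[5m]))
If only one instance shows a non-zero rate, that is likely the member most watch clients (e.g. the local apiserver) are connected to, not a sign that the member itself is unhealthy.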
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Still reproducible on Origin 3.11
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Still present.
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/lifecycle frozen /remove-lifecycle stale
/assign
Any news about this?
At the moment I am using OKD 4.7 and this bug is still present. Prometheus query:
grpc_server_handled_total{grpc_code="Unavailable",grpc_service="etcdserverpb.Watch"}
Hi, I noticed that every grpc_code for grpc_method "Watch" is "Unavailable" in my OKD cluster. My plan is to monitor the etcd instances with the default Prometheus alerts from the etcd project. Maybe the watch connection is not closed correctly and runs into a timeout.
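One way to check whether literally every handled Watch RPC is counted as Unavailable is to compare the failure rate against the total rate for that service. A sketch, using the same metric and labels as the query above:
sum(rate(grpc_server_handled_total{grpc_service="etcdserverpb.Watch", grpc_code="Unavailable"}[5m]))
  / sum(rate(grpc_server_handled_total{grpc_service="etcdserverpb.Watch"}[5m]))
A ratio close to 1 would mean essentially every completed watch stream is recorded as Unavailable, which supports the theory that this is how cancelled/timed-out watch connections are accounted for rather than a real availability problem.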
Steps To Reproduce
oc project openshift-etcd
oc rsh etcd-master1.mycompany.com
curl -s --cacert "/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt" --cert "/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master1.mycompany.com.crt" --key "/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master1.mycompany.com.key" https://localhost:2379/metrics
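To narrow the raw metrics output down to the counters that feed the alert, the same curl command can be piped through grep, e.g.:
curl -s --cacert "/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt" --cert "/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master1.mycompany.com.crt" --key "/etc/kubernetes/static-pod-certs/secrets/etcd-all-peer/etcd-peer-master1.mycompany.com.key" https://localhost:2379/metrics | grep 'grpc_server_handled_total{grpc_code="Unavailable"'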
Additional Information
If that behavior is already fixed or it's a false positive, let me know.