raintank / legacy-kubernetes-app

Grafana App for Kubernetes
Apache License 2.0

snap-kubestate gets recycled on a medium sized cluster #46

Open romankor opened 6 years ago

romankor commented 6 years ago

We have an issue where the kubestate pod gets recycled every couple of minutes and cluster metrics are not being sent, on a cluster of roughly 30 machines and ~1000 pods.

This is what I see in the log file.

time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=df
time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=iostat
time="2017-10-16T23:20:12Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=load
time="2017-10-16T23:20:12Z" level=warning msg="Ignoring JSON/Yaml file: core.json" _block=start _module=control autodiscoverpath="/opt/snap/tasks_startup"
time="2017-10-16T23:20:14Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:14Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=1 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:24Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:24Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=2 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:34Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:34Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=3 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:44Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:44Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=4 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:54Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:54Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=5 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c

I can't figure out a way to configure the max size of the message. Maybe you can shed some light on that? Thanks

kubectl exec -it snap-kubestate-deployment-3536784749-k0q9s -- /opt/snap/bin/snaptel task list
ID                   NAME                        STATE       HIT     MISS    FAIL    CREATED         LAST FAILURE
6b6dacb3-8b53-458c-9cba-629ade4e7a65     Task-6b6dacb3-8b53-458c-9cba-629ade4e7a65   Running     6   0   6   4:57PM 10-17-2017   rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4283236 vs. 4194304)

We have it running on our dev/qa cluster, which is much smaller, and it works there without any problem.

daniellee commented 6 years ago

There is no way to change the limit, unfortunately. We forked snap to get around this and just hacked in a higher limit.

The proper way to fix it would be to send a PR that fixes this issue: https://github.com/intelsdi-x/snap-plugin-lib-go/issues/43
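For context, the 4194304 in the error is gRPC's default 4 MiB receive limit, and the "hacked in a higher limit" fork just bumps that number wherever the connection to the plugin is dialed. A minimal sketch of what such a change looks like in gRPC-Go (illustrative only, not the actual snap code; the function name, size, and address below are made up):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
)

// maxMsgSize raises the default 4 MiB gRPC receive limit that produces the
// "received message larger than max (... vs. 4194304)" errors shown above.
// 32 MiB is an arbitrary example value.
const maxMsgSize = 32 * 1024 * 1024

// dialPlugin is a hypothetical helper standing in for wherever snap dials
// the collector plugin's gRPC endpoint.
func dialPlugin(addr string) (*grpc.ClientConn, error) {
	// grpc.MaxCallRecvMsgSize overrides the per-call receive limit on the
	// client side; if the plugin's server also enforces limits, it would
	// need matching grpc.MaxRecvMsgSize / grpc.MaxSendMsgSize options.
	return grpc.Dial(addr,
		grpc.WithInsecure(),
		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxMsgSize)),
	)
}

func main() {
	conn, err := dialPlugin("127.0.0.1:8182") // illustrative address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```

Making the limit configurable on the plugin-library side is presumably what the linked snap-plugin-lib-go issue is asking for.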

DanCech commented 6 years ago

There is a PR https://github.com/intelsdi-x/snap-plugin-lib-go/pull/89

romankor commented 6 years ago

@daniellee Can you point me to the forked repository that you hacked? Or is it private?