Here is the relevant information I found:
https://trzepak.pl/viewtopic.php?t=63030
https://www.mail-archive.com/bird-users@network.cz/msg04492.html
Is your routing table particularly large? What do you get for `ip route` and `ip addr`?

```
# ip addr | wc -l
1074
# ip route | wc -l
122
```
Is that too large for a routing table? I modified this parameter and the CPU usage has now returned to normal:

```
scan time 10;  # Scan kernel routing table every 10 seconds
```
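For context, this is roughly where that setting lives in a BIRD 1.x configuration (a generic sketch; Calico renders bird.cfg from its own template, so the surrounding options and filters will differ):

```
protocol kernel {
  learn;           # learn routes added by other daemons
  persist;         # keep routes when BIRD shuts down
  scan time 10;    # scan the kernel routing table every 10 seconds
  import all;
  export all;      # illustrative; Calico's template uses its own export filter
}
```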
Thank you for your reply.
Are you running kube-proxy in IPVS mode? I believe it assigns every service VIP locally.
We have a similar issue (we observe this: https://github.com/projectcalico/calico/issues/2992). 3 masters on a kubespray cluster (kubespray 2.11.0 on Ubuntu 16.04). The cluster runs fine for a number of weeks, then starts failing. Note that this is our CI cluster, so there are a lot of `helm install` and `helm delete` commands.
```
$ ip addr | wc -l
14376
```
That looks like a very big number to me.
Anything to check further?
Currently we just reboot the masters one by one. Is there a cleaner fix/workaround?
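A quick way to see which interface is accumulating those addresses (a sketch using standard iproute2 and awk, nothing cluster-specific):

```bash
# One line per address; column 2 is the interface name.
# A huge count on a single dummy device points at IPVS service VIPs.
ip -o addr | awk '{print $2}' | sort | uniq -c | sort -rn | head
```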
@aligthart are you in kube-proxy IPVS mode? It adds every service IP to a local dummy device (due to a requirement of the IPVS stack).
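If IPVS mode is the cause, the addresses should all sit on kube-proxy's dummy device, named `kube-ipvs0` by default (a sketch to verify):

```bash
# In IPVS mode each service VIP appears as an address on the dummy device
ip addr show dev kube-ipvs0 | grep -c inet
# Compare against the total number of services in the cluster
kubectl get svc --all-namespaces --no-headers | wc -l
```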
@fasaxc yes we are. Below is the kube-proxy ConfigMap:
```yaml
config.conf: |-
  apiVersion: kubeproxy.config.k8s.io/v1alpha1
  bindAddress: 0.0.0.0
  clientConnection:
    acceptContentTypes: ""
    burst: 10
    contentType: application/vnd.kubernetes.protobuf
    kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
    qps: 5
  clusterCIDR: xxxxxxxxxxx
  configSyncPeriod: 15m0s
  conntrack:
    maxPerCore: 32768
    min: 131072
    tcpCloseWaitTimeout: 1h0m0s
    tcpEstablishedTimeout: 24h0m0s
  enableProfiling: false
  healthzBindAddress: 0.0.0.0:10256
  hostnameOverride: xxxxxxxxx
  iptables:
    masqueradeAll: false
    masqueradeBit: 14
    minSyncPeriod: 0s
    syncPeriod: 30s
  ipvs:
    excludeCIDRs: []
    minSyncPeriod: 0s
    scheduler: rr
    strictARP: false
    syncPeriod: 30s
  kind: KubeProxyConfiguration
  metricsBindAddress: 127.0.0.1:10249
  mode: ipvs
  nodePortAddresses: []
  oomScoreAdj: -999
  portRange: ""
  resourceContainer: /kube-proxy
  udpIdleTimeout: 250ms
  winkernel:
    enableDSR: false
    networkName: ""
    sourceVip: ""
```
Should we switch to iptables? Is anything else wrong in our proxy setup?
Switching to iptables mode should help with BIRD CPU, but presumably you're using IPVS mode for a reason (high numbers of services?)
@fasaxc Thanks for the quick replies. Our cluster is relatively small, so we do not benefit from the performance gains of IPVS yet. We will try iptables mode and monitor the situation again. Note that after a reboot of our masters the `ip addr` line count was about 100.
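For anyone else trying this, switching modes looks roughly like the following (a sketch for kubeadm/kubespray-style clusters; your tooling may manage this ConfigMap for you):

```bash
# Set mode: "iptables" in the kube-proxy ConfigMap
kubectl -n kube-system edit configmap kube-proxy
# Restart the kube-proxy pods so they pick up the new mode
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
# Leftover IPVS rules and kube-ipvs0 addresses may still need
# `kube-proxy --cleanup` on each node, or a reboot
```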
@jinnzy are you using IPVS? We fixed an issue related to it in https://github.com/projectcalico/confd/pull/314
If so I think this issue can be closed
Yes, thank you very much.
We hit the same issue today. We don't have too many nodes/routes. We had two pods with the CPU/health issue; I fixed it by restarting the service.
@wd this is a very old issue; if you can reproduce it on up-to-date Calico, please open a new issue.
Hello, our production environment has encountered this problem; any way to solve it would be greatly appreciated.
BIRD's single-core CPU usage is constantly at 100%. Using `perf top` we found that the `if_find_by_name` function is the biggest CPU consumer.
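A sketch of that measurement, assuming perf is installed and the BIRD process is visible from where you run it:

```bash
# Profile the running BIRD process; if_find_by_name dominating the
# report indicates time spent walking the interface/address list
perf top -p "$(pidof bird)"
```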
Expected Behavior
CPU usage reduced to within the normal range.
Your Environment
- BIRD version: v0.3.3+birdv1.6.3
- Operating System and version: CentOS Linux release 7.6.1810 (Core); Linux vlnx010003.foneshare.cn 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux