Open polym opened 5 years ago
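Based on the earlier observation that running sysctl -p by hand does set nf_conntrack_max, I suspected that the sysctl settings were applied during boot but then overwritten when some module started later. As a last resort, I forced a sysctl -p in /etc/rc.local and rebooted.
After the reboot the value was still not set, so I changed the line to sysctl -p > /tmp/sysctl.log 2>&1 to capture the output. After another reboot the log did show a problem: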
/tmp/sysctl.log
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_buckets: No such file or directory
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait: No such file or directory
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_fin_wait: No such file or directory
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait: No such file or directory
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established: No such file or directory
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_udp_timeout: No such file or directory
sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream: No such file or directory
/var/log/messages has more detail:
messages:Feb 19 15:11:44 K8S-ZJ-FUD-59 rc.local: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory
messages:Feb 19 15:11:44 K8S-ZJ-FUD-59 rc.local: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_buckets: No such file or directory
messages:Feb 19 15:11:44 K8S-ZJ-FUD-59 rc.local: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait: No such file or directory
messages:Feb 19 15:11:44 K8S-ZJ-FUD-59 rc.local: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_fin_wait: No such file or directory
messages:Feb 19 15:11:44 K8S-ZJ-FUD-59 rc.local: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait: No such file or directory
messages:Feb 19 15:11:44 K8S-ZJ-FUD-59 rc.local: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established: No such file or directory
messages:Feb 19 15:11:44 K8S-ZJ-FUD-59 rc.local: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_udp_timeout: No such file or directory
messages:Feb 19 15:11:44 K8S-ZJ-FUD-59 rc.local: sysctl: cannot stat /proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream: No such file or directory
messages:Feb 19 15:11:45 K8S-ZJ-FUD-59 kernel: nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
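This shows that nf_conntrack still had not been loaded by the time /etc/rc.local finished; the kernel logs the module load at 15:11:45, one second after the sysctl errors.
messages on the faulty machine: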
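The failure mode above (sysctl keys whose /proc/sys entries do not exist yet) can be detected before calling sysctl -p. The following is a minimal sketch, not part of the original investigation: missing_sysctl_keys and the PROC_ROOT override are hypothetical names introduced here, and it assumes plain "key = value" lines in the config file.

```shell
#!/bin/sh
# Assumption: PROC_ROOT is a hypothetical override so the check can be
# exercised against a fake directory tree; it defaults to the real /proc/sys.
PROC_ROOT="${PROC_ROOT:-/proc/sys}"

# Print every key from a sysctl config file that has no entry under PROC_ROOT.
missing_sysctl_keys() {
    conf="$1"
    # Strip comments, strip "= value", trim leading blanks, drop empty lines.
    sed -e 's/[#;].*$//' \
        -e 's/[[:space:]]*=.*$//' \
        -e 's/^[[:space:]]*//' \
        -e '/^$/d' "$conf" |
    while IFS= read -r key; do
        # net.netfilter.nf_conntrack_max -> net/netfilter/nf_conntrack_max
        path="$PROC_ROOT/$(printf '%s' "$key" | tr . /)"
        [ -e "$path" ] || printf '%s\n' "$key"
    done
}

# Illustrative guard for /etc/rc.local:
#   if [ -z "$(missing_sysctl_keys /etc/sysctl.conf)" ]; then sysctl -p; fi
```

An empty result means every key is already present, so sysctl -p should not hit "cannot stat" for those keys.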
# grep -3 conntrack /var/log/messages
Feb 19 17:21:41 K8S-ZJ-FUD-59 systemd: kdump.service failed.
Feb 19 17:21:41 K8S-ZJ-FUD-59 dockerd: time="2019-02-19T17:21:41.717474971+08:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Feb 19 17:21:41 K8S-ZJ-FUD-59 dockerd: time="2019-02-19T17:21:41.718420537+08:00" level=info msg="Loading containers: start."
Feb 19 17:21:41 K8S-ZJ-FUD-59 kernel: nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
Feb 19 17:21:41 K8S-ZJ-FUD-59 dockerd: time="2019-02-19T17:21:41.752767154+08:00" level=info msg="Firewalld running: false"
Feb 19 17:21:41 K8S-ZJ-FUD-59 kernel: IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
Feb 19 17:21:41 K8S-ZJ-FUD-59 dockerd: time="2019-02-19T17:21:41.859042685+08:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
messages on the healthy machine:
# grep -3 conntrack /var/log/messages-*
/var/log/messages-20190215-Feb 14 16:37:32 DCK-ZJ-FUD-134 kernel: [ 8.771257] XFS (sde1): nobarrier option is deprecated, ignoring.
/var/log/messages-20190215-Feb 14 16:37:32 DCK-ZJ-FUD-134 kernel: [ 8.772489] XFS (sde1): Mounting V4 Filesystem
/var/log/messages-20190215-Feb 14 16:37:32 DCK-ZJ-FUD-134 kernel: [ 8.793512] XFS (sde1): Ending clean mount
/var/log/messages-20190215:Feb 14 16:37:33 DCK-ZJ-FUD-134 kernel: [ 9.925572] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
/var/log/messages-20190215-Feb 14 16:37:33 DCK-ZJ-FUD-134 kernel: [ 10.028952] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
/var/log/messages-20190215-Feb 14 16:37:33 DCK-ZJ-FUD-134 kernel: [ 10.141497] overlayfs: upper fs needs to support d_type.
/var/log/messages-20190215-Feb 14 16:37:35 DCK-ZJ-FUD-134 kernel: [ 11.605013] overlayfs: upper fs needs to support d_type.
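On the healthy machine nf_conntrack appeared to be loaded automatically during boot, whereas on the faulty machine it was loaded by dockerd through the kernel interface.
I then disabled dockerd/kubelet on the healthy machine and rebooted, and the conntrack module was not loaded there either. After re-enabling them and rebooting, the same problem appeared: the nf_conntrack parameters failed to apply.
So the earlier conclusion was wrong: on both machines it is dockerd that loads the module.
Add /etc/modules-load.d/nf_conntrack.conf with the following content: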
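Whether the module is present at a given moment can be checked directly against /proc/modules, which is the same information lsmod prints. A small sketch, assuming the standard /proc/modules layout where the first field is the module name; module_loaded is a hypothetical helper, not something from the original post.

```shell
#!/bin/sh
# Succeed (exit 0) if the named module appears in the modules list.
# The second argument is optional and only exists so a saved copy of
# /proc/modules can be checked instead of the live one.
module_loaded() {
    mod="$1"
    modules_file="${2:-/proc/modules}"
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Illustrative use, e.g. right after boot with dockerd disabled:
#   module_loaded nf_conntrack && echo loaded || echo "not loaded"
```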
nf_conntrack
nf_conntrack_ipv4
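With this in place, systemd-modules-load.service, which systemd-sysctl.service depends on, loads the modules listed in this file automatically.
The exact dependency relationship can be seen in the following two configuration files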
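The file can also be created from a script. This is only a sketch: write_conntrack_modules_conf is a hypothetical helper, and its optional directory argument exists just so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Write nf_conntrack.conf, listing the two modules from the post, into the
# given directory (default: /etc/modules-load.d, which needs root).
write_conntrack_modules_conf() {
    conf_dir="${1:-/etc/modules-load.d}"
    mkdir -p "$conf_dir" &&
    printf '%s\n' nf_conntrack nf_conntrack_ipv4 > "$conf_dir/nf_conntrack.conf"
}

# Illustrative use without a reboot (as root):
#   write_conntrack_modules_conf
#   systemctl restart systemd-modules-load.service
#   sysctl -p
```

One caveat, as far as I know: from kernel 4.19 on, nf_conntrack_ipv4 was merged into nf_conntrack, so on such kernels the second line may fail to load and nf_conntrack alone may be enough.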
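To check the ordering on a particular host, the After= lines of a unit file can be listed. A minimal sketch: unit_after is a hypothetical helper, and the unit path in the usage comment is an assumption based on the usual CentOS 7 layout (systemctl cat systemd-sysctl.service prints the same file without needing the path).

```shell
#!/bin/sh
# Print the units named in the After= lines of a unit file, one per line.
unit_after() {
    sed -n 's/^After=//p' "$1" | tr ' ' '\n' | sed '/^$/d'
}

# Illustrative use (path is an assumption):
#   unit_after /usr/lib/systemd/system/systemd-sysctl.service
# If systemd-modules-load.service is listed, modules from
# /etc/modules-load.d are loaded before the sysctl settings are applied.
```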
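Background
After upgrading the CentOS 7 kernel from 4.10.1 to 4.20, kubelet could no longer deploy Pods and reported an error
I could not find the cause, so I rebooted the machine and noticed that nf_conntrack_max no longer matched the value in sysctl.conf (this had happened a few times recently, so I was alert to it), and that running sysctl -p set it successfully. Something similar had happened in production before, so my first guess was that the conntrack-related modules had not been loaded. I therefore made the following comparison: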