panuhorsmalahti opened 7 years ago
As per https://github.com/weaveworks/weave/issues/2433#issuecomment-285726645, I ran weave-zombie-hunter.sh again after re-downloading the scripts from https://gist.github.com/SpComb/9cc2a7ed46ff708547d09854b78c197f#file-weave-zombie-hunter-sh and after executing the "behead" commands given in the output of the first run:
https://gist.github.com/panuhorsmalahti/20d76053c7bc370b97f18fe476f91a3e
I see ERRO: 2017/03/10 09:24:47.188382 Captured frame from MAC (ba:ad:e6:6a:11:0f) to (02:68:2f:29:fb:27) associated with another peer ba:ad:e6:6a:11:0f(app-dev.itidm.domain.com) in the docker logs weave output, about 1600 times in total.
ip -d link show vethwe-bridge
10: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1410 qdisc noqueue master weave state UP mode DEFAULT qlen 1000
link/ether 76:b7:5a:73:fe:06 brd ff:ff:ff:ff:ff:ff promiscuity 1
veth
bridge_slave addrgenmode eui64
cat weave-log.txt | grep hairpin
ERRO: 2017/03/06 11:41:10.241698 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/06 11:41:10.453088 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/06 11:41:10.665129 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/06 11:41:11.090084 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/06 11:41:11.940130 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/06 11:41:13.640174 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/06 11:41:17.036124 Vetoed installation of hairpin flow FlowSpec{keys: [InPortFlowKey{vport: 1} EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/06 11:41:23.836105 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/06 11:41:37.420122 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/06 11:41:39.424102 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 1e:fa:7a:e2:79:29, dst: de:41:89:b6:e5:3e} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/10 13:43:52.773415 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: e2:7f:bd:e3:db:d3, dst: 8e:83:29:cd:26:e9} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
ERRO: 2017/03/10 13:43:53.710120 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: e2:7f:bd:e3:db:d3, dst: 8e:83:29:cd:26:e9} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
I can see no errors apart from Vetoed installation of hairpin and Captured frame from MAC.
Your weave-zombie-hunter.sh output seems odd, and the specifics of your symptoms are also different from what I had in https://github.com/weaveworks/weave/issues/2433#issuecomment-265993860. You also have multiple MAC addresses responding to ARP queries for the same weave overlay IP address, which is similar. However, you have one set of MAC addresses that are present in the bridge fdb show output of multiple vethwepl* interfaces, which is unexpected.
Can you post the full ip link, weaveexec ps and bridge fdb show br weave output?
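For reference, something like the following should collect that (a sketch only; the exact weaveexec invocation can vary depending on how weave is installed on a Kontena host):
# dump the requested diagnostics into files for attaching here
ip -d link show > ip-link.txt
weaveexec ps > weave-ps.txt
bridge fdb show br weave > bridge-fdb.txt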
The way I read the script, if it prints id=8dfc61bc31d8 ip=10.81.128.55/16 then it should not subsequently print missing for the same veth. Yet most of your veths have both.
Not sure what's happening here either, it should not output both...
Well, I finally had to reboot the server, and after the reboot I didn't see any weave errors. I'll report back if the problem resurfaces.
There's some kind of #2433-ish confusion going on here in any case, because there are 42 master weave veths but only 13 master docker0 veths... there should always be fewer weave veths than Docker veths, so the remainder of those vethwepl* interfaces are presumably leftovers from dead containers?
The way I read the script, if it prints id=8dfc61bc31d8 ip=10.81.128.55/16 then it should not subsequently print missing for the same veth. Yet most of your veths have both.
Not sure what's happening here either, it should not output both...
Right, that was just some Bash subprocess confusion...
However, you have one set of MAC addresses that are present in the bridge fdb show output of multiple vethwepl* interfaces, which is unexpected.
It seems like bridge fdb show ignores any unknown parameters: bridge fdb show br weave brportasdjfio vethwepl10181 just shows all bridge ports... :angry: Maybe your version of bridge doesn't know about brport? That would cause that kind of output...
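One way to check whether the installed bridge tool understands the brport filter (just a sketch; older iproute2 silently ignores unknown show arguments):
# which iproute2 release provides this bridge binary?
bridge -V
# if the usage text doesn't mention brport, the filter is unsupported
bridge fdb help 2>&1 | grep -i brport
# on older versions, filter manually instead of relying on brport
bridge fdb show br weave | grep vethwepl10181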
Updated the gist with fixes for those issues...
I'm not sure how we can make progress on this unless someone is able to recreate it.
We'll leave it open for a week to see if more information becomes available.
I think someone else on the Kontena Slack channel also reported having this issue.
I just had this issue again.
Output of updated weave-zombie-hunter.sh by @SpComb:
[root]# ./weave-zombie-hunter.sh
# missing docker netns vethwepl10913 ifpeer=42
# vethwepl10913: unknown
# zombie veth vethwepl12545: pid=12545
# missing docker netns vethwepl12545 ifpeer=426
# vethwepl12545: zombie
ip link set vethwepl12545 nomaster
# found weaveps vethwepl16371 mac=62:62:1b:81:7b:f6: id=d2fed61d2cab ip=10.81.128.103/16
# missing docker netns vethwepl16371 ifpeer=64
# vethwepl16371: alive
# zombie veth vethwepl17581: pid=17581
# missing docker netns vethwepl17581 ifpeer=386
# vethwepl17581: zombie
ip link set vethwepl17581 nomaster
# missing docker netns vethwepl24590 ifpeer=390
# vethwepl24590: unknown
# missing docker netns vethwepl25256 ifpeer=394
# vethwepl25256: unknown
# missing docker netns vethwepl26808 ifpeer=398
# vethwepl26808: unknown
# missing docker netns vethwepl30575 ifpeer=430
# vethwepl30575: unknown
# zombie veth vethwepl3556: pid=3556
# missing docker netns vethwepl3556 ifpeer=402
# vethwepl3556: zombie
ip link set vethwepl3556 nomaster
# missing docker netns vethwepl4565 ifpeer=406
# vethwepl4565: unknown
# zombie veth vethwepl4858: pid=4858
# missing docker netns vethwepl4858 ifpeer=14
# vethwepl4858: zombie
ip link set vethwepl4858 nomaster
# found weaveps vethwepl5409 mac=ce:e2:2a:e9:f3:27: id=eba49edb9660 ip=10.81.128.22/16
# missing docker netns vethwepl5409 ifpeer=18
# vethwepl5409: alive
# zombie veth vethwepl5861: pid=5861
# missing docker netns vethwepl5861 ifpeer=22
# vethwepl5861: zombie
ip link set vethwepl5861 nomaster
# missing docker netns vethwepl630 ifpeer=438
# vethwepl630: unknown
# missing docker netns vethwepl6572 ifpeer=26
# vethwepl6572: unknown
# zombie veth vethwepl7529: pid=7529
# missing docker netns vethwepl7529 ifpeer=30
# vethwepl7529: zombie
ip link set vethwepl7529 nomaster
# zombie veth vethwepl8217: pid=8217
# missing docker netns vethwepl8217 ifpeer=34
# vethwepl8217: zombie
ip link set vethwepl8217 nomaster
# zombie veth vethwepl9663: pid=9663
# missing docker netns vethwepl9663 ifpeer=38
# vethwepl9663: zombie
ip link set vethwepl9663 nomaster
That seems to be saying missing docker netns for every veth, so I suspect docker-netns-addrs.sh must be broken on your system... it was written and tested on CoreOS. What does it output if you run it directly?
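A rough manual equivalent of what I'd expect a script like that to do (an assumption on my part; the real docker-netns-addrs.sh may well work differently) is to walk the netns mounts Docker keeps under /var/run/docker/netns and list the addresses inside each:
# list addresses in each Docker-managed network namespace
for ns in /var/run/docker/netns/*; do
    echo "== $ns =="
    nsenter --net="$ns" ip -o addr show
done
If that directory is empty or laid out differently on CentOS, that alone would explain the empty output.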
The bridge fdb diagnostics seem to be less broken now... it looks like both of the active MACs it found belonged to legitimately running containers.
It does still look like it's finding actual zombie veths. Unfortunately this diagnosis still doesn't say anything about why those Docker network namespaces are being left behind after being removed.
docker-netns-addrs.sh output is and was empty.
I think someone else on the Kontena Slack channel also reported having this issue.
I've heard of three cases of this happening recently: once on the weave Slack, and twice in the Kontena Slack. Both the weave Slack case and your case were on CentOS/RHEL... I suspect there's something going on with the RHEL 3.10 kernel, and that this doesn't happen with the CoreOS 4.x kernels.
docker-netns-addrs.sh output is and was empty.
I guess the namespaces are somehow different on CentOS/RHEL than on CoreOS...
Hi all, I think I'm the one who reported this over the Kontena Slack channel.
I am using the default Kontena images, so it happened on CoreOS too.
I've been assigned (by my company) to attempt to replicate this issue, ETA 3 days. Let's hope I can get some more info.
Is there anything I should log?
@mudspot would it be possible to get access to your cluster? I'm "martynas" on http://weave-community.slack.com/.
@brb I'll give you access to my dev cluster once I manage to replicate the error state.
When running with Kontena and this is happening, does a docker stop kontena-cadvisor make the zombie veths go away, and allow curl (TCP connections to the service) to start working again?
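In other words, a check sequence along these lines (hypothetical; <service-ip> stands in for the affected service address):
docker stop kontena-cadvisor
# list the vethwepl* interfaces before and after to see whether the zombies disappear
ip -o link show | grep vethwepl
# and see whether TCP connections to the service recover
curl http://<service-ip>/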
This is easy to reproduce when running a cAdvisor container using the default /rootfs and /var/run bind mounts in the default rprivate mount propagation mode. Any Docker containers that are running when the cAdvisor container is started will become zombies once stopped: https://github.com/kontena/kontena/issues/2004.
When (re)starting the cadvisor container, it will pick up the netns mounts for any currently running Docker containers, and those will remain mounted in the cadvisor container's mount namespace after Docker unmounts them, keeping the netns alive even after Docker considers the container to be destroyed.
This isn't that noticeable on CoreOS, since the libnetwork GC will unlink the lingering mountpoint within ~60s, and with the Linux 4.9 kernel this causes the netns to also be unmounted within the cadvisor container. It is much worse on the CentOS 7 Linux 3.10 kernel, where the netns will remain mounted indefinitely within the cadvisor container: https://github.com/kontena/kontena/issues/2004#issuecomment-288004665
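A rough reproduction sketch of that mechanism (the image names here are only examples, not part of the original report):
# start a container that will later become the "zombie" (example image)
docker run -d --name victim nginx
# start a cadvisor-style container with the default rprivate bind mounts;
# it captures copies of the netns mounts that exist at start time
docker run -d --name cadvisor -v /:/rootfs:ro -v /var/run:/var/run:rw google/cadvisor:latest
# stop the victim; Docker unmounts /var/run/docker/netns/<id> on the host,
# but the copy in cadvisor's mount namespace can keep the netns alive
docker stop victim && docker rm victim
# the stale nsfs mount should still be visible from inside the cadvisor container
docker exec cadvisor cat /proc/mounts | grep docker/netns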
@SpComb I tried your suggestion. It does not start working again.
Yes, it looks like there are two distinct cases here... the one described in https://github.com/kontena/kontena/issues/2004 is related to containers that are running when cadvisor starts, and it causes temporary zombies on CoreOS and persistent zombies on CentOS.
Then there is a second issue that also affects CoreOS and leads to persistent zombie netnses which I'm unable to find any references to. This is presumably some race condition, and may or may not be related to cadvisor.
Then there is a second issue that also affects CoreOS and leads to persistent zombie netnses which I'm unable to find any references to. This is presumably some race condition, and may or may not be related to cadvisor.
There's some clue that the CoreOS / Linux 4.9 case might be related to the kontena/openvpn container:
No theory as to how the OpenVPN server container triggers this issue though. It's a normal network-namespaced Docker container, but it has cap NET_ADMIN, with a tun device and iptables NAT rules within the container.
Some intensive tracing work (see kernel-trace.sh) reveals that it's not the openvpn container... it's actually the docker-proxy processes that are leaking the network namespaces, via some kind of netlink sockets: https://gist.github.com/SpComb/f83c53b05aad79c6213185a3da7a7902
A host has a number of zombie weave veths:
# vethwepl1541: pid=1541 ifpeer=108
# zombie pid=1541
# vethwepl1704: pid=1704 ifpeer=44
# zombie pid=1704
# vethwepl1907: pid=1907 ifpeer=50
# zombie pid=1907
# vethwepl1913: pid=1913 ifpeer=52
# zombie pid=1913
# vethwepl1966: pid=1966 ifpeer=62
# zombie pid=1966
# vethwepl2353: pid=2353 ifpeer=124
# zombie pid=2353
# vethwepl5101: pid=5101 ifpeer=128
# zombie pid=5101
Two docker-proxy processes are running, mapping ports for a kontena/lb service:
root 7054 0.0 0.0 51732 1464 ? Sl Mar20 0:00 \_ /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 443 -container-ip 172.17.0.4 -container-port 443
root 7063 0.0 0.0 35340 1328 ? Sl Mar20 0:00 \_ /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 80 -container-ip 172.17.0.4 -container-port 80
The docker container is stopped:
2017-03-23T16:41:11.248008092Z container kill 7ea1fef33d5e745d7e9c7e8b4c5dae3e0a9493518e6cf810a74f83b51fef5a13 (image=kontena/lb:latest, io.kontena.container.deploy_rev=2017-03-20 22:17:00 UTC, io.kontena.container.id=58d054ddec497a0006000393, io.kontena.container.name=kontena-cloud-internet-lb-1, io.kontena.container.overlay_cidr=10.81.128.16/16, io.kontena.container.overlay_network=kontena, io.kontena.container.pod=kontena-cloud-internet-lb-1, io.kontena.container.service_revision=4, io.kontena.container.type=container, io.kontena.grid.name=account, io.kontena.health_check.initial_delay=10, io.kontena.health_check.interval=60, io.kontena.health_check.port=80, io.kontena.health_check.protocol=http, io.kontena.health_check.timeout=10, io.kontena.health_check.uri=/__health, io.kontena.service.id=57ee4e29fb9e240007001437, io.kontena.service.instance_number=1, io.kontena.service.name=kontena-cloud-internet-lb, io.kontena.stack.name=null, name=kontena-cloud-internet-lb-1, signal=15)
2017-03-23T16:41:11.272829538Z container die 7ea1fef33d5e745d7e9c7e8b4c5dae3e0a9493518e6cf810a74f83b51fef5a13 (exitCode=0, image=kontena/lb:latest, io.kontena.container.deploy_rev=2017-03-20 22:17:00 UTC, io.kontena.container.id=58d054ddec497a0006000393, io.kontena.container.name=kontena-cloud-internet-lb-1, io.kontena.container.overlay_cidr=10.81.128.16/16, io.kontena.container.overlay_network=kontena, io.kontena.container.pod=kontena-cloud-internet-lb-1, io.kontena.container.service_revision=4, io.kontena.container.type=container, io.kontena.grid.name=account, io.kontena.health_check.initial_delay=10, io.kontena.health_check.interval=60, io.kontena.health_check.port=80, io.kontena.health_check.protocol=http, io.kontena.health_check.timeout=10, io.kontena.health_check.uri=/__health, io.kontena.service.id=57ee4e29fb9e240007001437, io.kontena.service.instance_number=1, io.kontena.service.name=kontena-cloud-internet-lb, io.kontena.stack.name=null, name=kontena-cloud-internet-lb-1)
2017-03-23T16:41:11.620252349Z container stop 7ea1fef33d5e745d7e9c7e8b4c5dae3e0a9493518e6cf810a74f83b51fef5a13 (image=kontena/lb:latest, io.kontena.container.deploy_rev=2017-03-20 22:17:00 UTC, io.kontena.container.id=58d054ddec497a0006000393, io.kontena.container.name=kontena-cloud-internet-lb-1, io.kontena.container.overlay_cidr=10.81.128.16/16, io.kontena.container.overlay_network=kontena, io.kontena.container.pod=kontena-cloud-internet-lb-1, io.kontena.container.service_revision=4, io.kontena.container.type=container, io.kontena.grid.name=account, io.kontena.health_check.initial_delay=10, io.kontena.health_check.interval=60, io.kontena.health_check.port=80, io.kontena.health_check.protocol=http, io.kontena.health_check.timeout=10, io.kontena.health_check.uri=/__health, io.kontena.service.id=57ee4e29fb9e240007001437, io.kontena.service.instance_number=1, io.kontena.service.name=kontena-cloud-internet-lb, io.kontena.stack.name=null, name=kontena-cloud-internet-lb-1)
All remaining docker-proxy processes exit... and the zombie weave veths disappear:
[LINK]Deleted 109: vethwepl1541@NONE: <BROADCAST,MULTICAST> mtu 1410 qdisc noop state DOWN group default
[LINK]Deleted 45: vethwepl1704@NONE: <BROADCAST,MULTICAST> mtu 1410 qdisc noop state DOWN group default
[LINK]Deleted 51: vethwepl1907@NONE: <BROADCAST,MULTICAST> mtu 1410 qdisc noop state DOWN group default
[LINK]Deleted 53: vethwepl1913@NONE: <BROADCAST,MULTICAST> mtu 1410 qdisc noop state DOWN group default
[LINK]Deleted 63: vethwepl1966@NONE: <BROADCAST,MULTICAST> mtu 1410 qdisc noop state DOWN group default
[LINK]Deleted 125: vethwepl2353@NONE: <BROADCAST,MULTICAST> mtu 1410 qdisc noop state DOWN group default
[LINK]Deleted 129: vethwepl5101@NONE: <BROADCAST,MULTICAST> mtu 1410 qdisc noop state DOWN group default
[LINK]Deleted 134: veth9905923@NONE: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
[LINK]Deleted 135: vethf8cbeef@NONE: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
[LINK]Deleted 137: vethwepl7082@NONE: <BROADCAST,MULTICAST> mtu 1410 qdisc noop state DOWN group default
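Those [LINK]Deleted lines are the kind of output ip monitor produces; to watch for the deletions live, something like:
# watch link events on the host; deleted zombie veths show up as
# "Deleted N: vethweplXXXX@NONE: ..." lines
ip monitor link
# or watch all event classes, which prefixes each line with [LINK], [ADDR], etc.
ip monitor all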
Using some kernel kprobe_events, it looks like one of the docker-proxy processes is holding on to netns references via netlink sockets, which get released once the process exits:
docker-proxy-7109 [000] d... 76633492.785801: p_netlink_release_0: (netlink_release+0x0/0x340)
docker-proxy-7109 [000] d... 76633492.785863: p_netlink_release_0: (netlink_release+0x0/0x340)
docker-proxy-7109 [000] d... 76633492.785888: p_netlink_release_0: (netlink_release+0x0/0x340)
docker-proxy-7109 [000] d... 76633492.785991: p_netlink_release_0: (netlink_release+0x0/0x340)
docker-proxy-7109 [000] d... 76633492.786027: p_netlink_release_0: (netlink_release+0x0/0x340)
docker-proxy-7109 [000] d... 76633492.786046: p_netlink_release_0: (netlink_release+0x0/0x340)
docker-proxy-7109 [000] d... 76633492.786050: p_netlink_release_0: (netlink_release+0x0/0x340)
=> netlink_release
=> sock_close
=> __fput
=> ____fput
=> task_work_run
=> do_exit
=> do_group_exit
=> SyS_exit_group
=> do_syscall_64
=> return_from_SYSCALL_64
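For reference, a minimal sketch of the kind of kprobe_events setup behind traces like these (the real script is kernel-trace.sh in the gist; this assumes debugfs is mounted at /sys/kernel/debug):
cd /sys/kernel/debug/tracing
# register probes on the functions of interest
echo 'p:p_netlink_release_0 netlink_release' >> kprobe_events
echo 'p:p___put_net_0 __put_net' >> kprobe_events
echo 'p:p_cleanup_net_0 cleanup_net' >> kprobe_events
# record a call stack for every probe hit
echo 1 > options/stacktrace
# enable the probes and stream the trace
echo 1 > events/kprobes/enable
cat trace_pipe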
This immediately results in multiple network namespaces being released:
<idle>-0 [000] d.s. 76633492.789396: p___put_net_0: (__put_net+0x0/0x70)
ksoftirqd/0-3 [000] d.s. 76633492.789485: p___put_net_0: (__put_net+0x0/0x70)
ksoftirqd/0-3 [000] d.s. 76633492.789511: p___put_net_0: (__put_net+0x0/0x70)
ksoftirqd/0-3 [000] d.s. 76633492.789521: p___put_net_0: (__put_net+0x0/0x70)
ksoftirqd/0-3 [000] d.s. 76633492.789524: p___put_net_0: (__put_net+0x0/0x70)
ksoftirqd/0-3 [000] d.s. 76633492.789525: p___put_net_0: (__put_net+0x0/0x70)
ksoftirqd/0-3 [000] d.s. 76633492.789527: p___put_net_0: (__put_net+0x0/0x70)
<idle>-0 [000] d.s. 76633495.070476: p___put_net_0: (__put_net+0x0/0x70)
=> __put_net
=> sk_destruct
=> __sk_free
=> sk_free
=> deferred_put_nlk_sk
=> rcu_process_callbacks
=> ... various code paths
These result in cleanup_net calls that destroy the zombie veths:
kworker/u30:1-5984 [000] d... 76633492.789433: p_cleanup_net_0: (cleanup_net+0x0/0x2b0)
kworker/u30:1-5984 [000] d... 76633492.789446: <stack trace>
=> cleanup_net
=> worker_thread
=> kthread
=> ret_from_fork
kworker/u30:1-5984 [000] d... 76633492.792408: p_unregister_netdevice_queue_0: (unregister_netdevice_queue+0x0/0xc0)
kworker/u30:1-5984 [000] d... 76633492.792423: <stack trace>
=> unregister_netdevice_queue
=> default_device_exit_batch
=> ops_exit_list.isra.4
=> cleanup_net
=> [unknown/kretprobe'd]
=> worker_thread
=> kthread
=> ret_from_fork
...
kworker/u30:1-5984 [000] d... 76633492.792443: p_netif_carrier_off_0: (netif_carrier_off+0x0/0x30)
kworker/u30:1-5984 [000] d... 76633492.792444: <stack trace>
=> netif_carrier_off
=> __dev_close_many
=> dev_close_many
=> rollback_registered_many
=> unregister_netdevice_many
=> default_device_exit_batch
=> ops_exit_list.isra.4
=> cleanup_net
=> [unknown/kretprobe'd]
=> worker_thread
=> kthread
=> ret_from_fork
It looks like an ARP timeout can also hold on to a netns reference:
<idle>-0 [000] d.s. 76633495.070476: p___put_net_0: (__put_net+0x0/0x70)
<idle>-0 [000] d.s. 76633495.070495: <stack trace>
=> __put_net
=> sk_destruct
=> __sk_free
=> sk_free
=> tcp_wfree
=> skb_release_head_state
=> skb_release_all
=> kfree_skb
=> arp_error_report
=> neigh_invalidate
=> neigh_timer_handler
=> call_timer_fn
=> run_timer_softirq
=> __do_softirq
=> irq_exit
=> xen_evtchn_do_upcall
=> xen_hvm_callback_vector
=> default_idle
=> arch_cpu_idle
=> default_idle_call
=> cpu_startup_entry
=> rest_init
=> start_kernel
=> x86_64_start_reservations
=> x86_64_start_kernel
Confirmed that the dockerd process opens a netlink socket for each container network sandbox and holds onto it until the container is stopped and the network sandbox is destroyed. When the same dockerd process spawns a new docker-proxy process for a different container, this netlink socket appears to get leaked to the docker-proxy process. When the original container is stopped by Docker, this unrelated docker-proxy process still holds a reference to the original container's network namespace via the leaked netlink socket, and the original container's leaked netns (and any associated interfaces) is not destroyed until that docker-proxy process exits, i.e. when the other container is stopped by Docker.
https://gist.github.com/SpComb/197cf7c4191dcca261fdfeade0be3c54
When starting a new Docker container, the main Docker process appears to execute the libnetwork:osl.GetSandboxForExternalKey function, called via the sandbox_externalkey_unix mechanism: [1]
2489 11:13:36 accept4(11, {sa_family=AF_LOCAL, NULL}, [2], SOCK_CLOEXEC|SOCK_NONBLOCK) = 52
2489 11:13:36 read(52, "{\"ContainerID\":\"bdf2e17d74bcc83a6e28785394ca9744afc214c7dab1ce33ac0401d7619df660\",\"Key\":\"/proc/5428/ns/net\"}", 1280) = 108
This does createNamespaceFile: [2]
2489 11:13:36 stat("/var/run/docker/netns/c55b2525ccb9", 0xc820e40788) = -1 ENOENT (No such file or directory)
2489 11:13:36 openat(AT_FDCWD, "/var/run/docker/netns/c55b2525ccb9", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 61
2489 11:13:36 close(61) = 0
And then mountNetworkNamespace: [3]
2489 11:13:36 mount("/proc/5428/ns/net", "/var/run/docker/netns/c55b2525ccb9", 0xc822434cd8, MS_BIND, NULL) = 0
And then netns.GetFromPath: [4]
2489 11:13:36 openat(AT_FDCWD, "/var/run/docker/netns/c55b2525ccb9", O_RDONLY ) = 61
It then runs netlink.NewHandleAt -> nl.GetSocketAt: [5]
2489 11:13:36 getpid() = 1493
2489 11:13:36 gettid() = 2489
2489 11:13:36 openat(AT_FDCWD, "/proc/1493/task/2489/ns/net", O_RDONLY) = 73
2489 11:13:36 setns(61, CLONE_NEWNET) = 0
2489 11:13:36 socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE) = 84
2489 11:13:36 bind(84, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
2489 11:13:36 setns(73, CLONE_NEWNET ) = 0
2489 11:13:36 close(73) = 0
This leaves the main Docker process with an fd=84 netlink socket open, associated with the container network namespace.
When a Docker container with published ports gets restarted, the main Docker process forks and execs a new docker-proxy process: [6]
1824 11:14:02 clone(child_stack=0, flags=SIGCHLD) = 5965
5965 11:14:02 execve("/usr/bin/docker-proxy", ["/usr/bin/docker-proxy", "-proto", "udp", "-host-ip", "0.0.0.0", "-host-port", "1194", "-container-ip", "172.17.0.1", "-container-port", "1194"], [/* 12 vars */]) = 0
Later, the dockerd process's netlink socket is closed when the original container is stopped: [8]
3640 11:14:22 close(84) = 0
3640 11:14:22 umount("/var/run/docker/netns/c55b2525ccb9", MNT_DETACH) = 0
But the previously forked docker-proxy process has inherited this fd=84 netlink socket: [7]
docker-pr 5965 root 84u sock 0,8 0t0 350413 protocol: NETLINK
The docker-proxy process does not know anything about this netlink socket, but having inherited it from the parent dockerd process, it is enough to keep the network namespace alive. See the previous comment for an analysis of what happens when the docker-proxy process exits.
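A quick way to check for this on an affected host (assumes lsof is available; this is just a diagnostic sketch, not part of the gist above):
# a netlink socket held by a docker-proxy process is suspicious:
# docker-proxy has no reason to own one itself, so it is most likely
# an fd inherited from dockerd that is pinning a container netns
for pid in $(pgrep -f docker-proxy); do
    echo "== docker-proxy pid $pid =="
    lsof -nP -p "$pid" 2>/dev/null | grep NETLINK
done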
This issue is similar to #2433, and especially similar to https://github.com/weaveworks/weave/issues/2433#issuecomment-265993860
Created a new issue as per https://github.com/weaveworks/weave/issues/2433#issuecomment-285688668
I can ping and curl the container IP from the host, and I can ping the container from another container, but curl doesn't work from inside a container.
Version information:
docker: Docker version 1.12.5, build 047e51b/1.12.5
kontena version: 1.1.2
Weave version: 1.8.2
Linux: Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild@x86-030.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Storage driver: devicemapper
OS: Red Hat Enterprise Linux
weave-zombie-hunter.sh output: https://gist.github.com/panuhorsmalahti/302d520353fb23196fc8c179925ce501
arping output: