juju-solutions / layer-flannel

flannel run_bootstrap_daemons runs too often #8

Closed mbruzek closed 8 years ago

mbruzek commented 8 years ago

While working on a layer that consumes flannel (kubernetes), I noticed that the run_bootstrap_daemons method runs on every kubernetes unit each time the reactive framework re-evaluates states.

unit-kubernetes-0[5332]: 2016-05-20 15:13:09 INFO unit.kubernetes/0.juju-log server.go:268 Invoking reactive handler: reactive/flannel.py:35:run_bootstrap_daemons
unit-kubernetes-0[5332]: 2016-05-20 15:13:09 INFO unit.kubernetes/0.update-status logger.go:40 ++ config-get iface
unit-kubernetes-0[5332]: 2016-05-20 15:13:09 INFO unit.kubernetes/0.update-status logger.go:40 + interface=eth0
unit-kubernetes-0[5332]: 2016-05-20 15:13:09 INFO unit.kubernetes/0.update-status logger.go:40 ++ config-get cidr
unit-kubernetes-0[5332]: 2016-05-20 15:13:09 INFO unit.kubernetes/0.update-status logger.go:40 + cidr=10.1.0.0/16
unit-kubernetes-0[5332]: 2016-05-20 15:13:09 INFO unit.kubernetes/0.update-status logger.go:40 + connection_string=http://172.31.12.82:4001
unit-kubernetes-0[5332]: 2016-05-20 15:13:09 INFO unit.kubernetes/0.update-status logger.go:40 + '[' '!' -f /var/run/docker-bootstrap.pid ']'
unit-kubernetes-0[5332]: 2016-05-20 15:13:09 INFO unit.kubernetes/0.update-status logger.go:40 + sleep 1
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 + docker -H unix:///var/run/docker-bootstrap.sock run --net=host --rm gcr.io/google_containers/etcd:2.0.12 etcdctl -C http://172.31.12.82:4001 set /coreos.com/network/config '{ "Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}'
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 { "Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 ++ docker -H unix:///var/run/docker-bootstrap.sock ps -f name=flannel -q
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 + flannelCID=d7ecd74286d8
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 + [[ d7ecd74286d8 == '' ]]
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 + docker -H unix:///var/run/docker-bootstrap.sock cp d7ecd74286d8:/run/flannel/subnet.env .
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 + source subnet.env
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 ++ FLANNEL_NETWORK=10.1.0.0/16
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 ++ FLANNEL_SUBNET=10.1.10.1/24
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 ++ FLANNEL_MTU=8951
unit-kubernetes-0[5332]: 2016-05-20 15:13:10 INFO unit.kubernetes/0.update-status logger.go:40 ++ FLANNEL_IPMASQ=false

To be clear, the method short-circuits, so nothing breaks, but the shell code is still executed on every evaluation. That causes log spam and is not the intended behavior.

The docstring on the method says it is "Not to be run after initial job completion".

I think we should update the method to use @when_not('flannel.sdn.available') and set that state after the first time it is run. I am also happy to do this work.
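A rough sketch of the gating idea proposed above. This is a tiny self-contained stand-in, not the charms.reactive library itself: the `when_not`, `set_state`, and `dispatch` helpers here are hypothetical simplifications written only to show that a handler guarded by 'flannel.sdn.available' stops firing once that state is set. The body of run_bootstrap_daemons is a placeholder for the real shell invocation.

```python
# Stand-in for the reactive gating pattern (NOT the charms.reactive library).
active_states = set()   # states currently set on this unit
handlers = []           # (blocking_state, handler) pairs

def when_not(state):
    """Register a handler that is eligible only while `state` is unset."""
    def decorator(func):
        handlers.append((state, func))
        return func
    return decorator

def set_state(state):
    """Mark a state as set so that handlers gated on it stop running."""
    active_states.add(state)

def dispatch():
    """One evaluation pass: invoke every handler whose blocking state is unset."""
    for state, func in list(handlers):
        if state not in active_states:
            func()

bootstrap_runs = []     # records each invocation, standing in for the log output

@when_not('flannel.sdn.available')
def run_bootstrap_daemons():
    # Placeholder for the real bootstrap shell script.
    bootstrap_runs.append('ran')
    # Setting the state here prevents any further invocations.
    set_state('flannel.sdn.available')

# Simulate the framework re-evaluating states three times.
for _ in range(3):
    dispatch()
```

With the guard in place the handler body runs on the first pass only; later passes skip it, which is what would remove the repeated log output.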

lazypower commented 8 years ago

There's only one problem with that: the bootstrap daemon is not under init's control. That is why it has been left in this state, so that when the juju agent comes back online it fires the bootstrap daemon back up.

I know that's a really terrible reason for why it is this way... but it's the only reason I have. With a proper upstart/systemd job this would be a non-issue.
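For illustration only, one way the two concerns could be reconciled without an init job is to check whether the daemon is still alive before re-running the bootstrap. The sketch below assumes a POSIX system and a pid file such as the /var/run/docker-bootstrap.pid seen in the log above; `daemon_is_running` is a hypothetical helper name, not something from the layer.

```python
import os

def daemon_is_running(pid_file):
    """Return True if pid_file names a process that currently exists (POSIX only)."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file: treat the daemon as down.
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False     # stale pid file, the process is gone
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True
```

A handler could then skip the shell script when `daemon_is_running('/var/run/docker-bootstrap.pid')` is true and only restart the daemon when it is not, though a proper upstart/systemd job remains the cleaner fix.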

lazypower commented 8 years ago

This is fixed with #9