juju-solutions / layer-flannel


run_bootstrap_daemons fails on kubernetes #3

Closed: mbruzek closed this issue 8 years ago

mbruzek commented 8 years ago

This may be related to issue #1, but I have a stack trace for this problem that might help determine a solution.

While deploying the observable-kubernetes bundle, I encountered the following error. The last frame in the Python stack trace is in reactive/flannel.py, but the problem may be related to the etcd interface.

unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40 Traceback (most recent call last):
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40   File "/var/lib/juju/agents/unit-kubernetes-2/charm/hooks/etcd-relation-joined", line 19, in <module>
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40     main()
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40   File "/usr/local/lib/python3.4/dist-packages/charms/reactive/__init__.py", line 73, in main
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40     bus.dispatch()
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40   File "/usr/local/lib/python3.4/dist-packages/charms/reactive/bus.py", line 421, in dispatch
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40     _invoke(other_handlers)
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40   File "/usr/local/lib/python3.4/dist-packages/charms/reactive/bus.py", line 404, in _invoke
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40     handler.invoke()
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40   File "/usr/local/lib/python3.4/dist-packages/charms/reactive/bus.py", line 280, in invoke
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40     self._action(*args)
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40   File "/var/lib/juju/agents/unit-kubernetes-2/charm/reactive/flannel.py", line 38, in run_bootstrap_daemons
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40     check_call(split(cmd))
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40   File "/usr/lib/python3.4/subprocess.py", line 561, in check_call
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40     raise CalledProcessError(retcode, cmd)
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 INFO unit.kubernetes/2.etcd-relation-joined logger.go:40 subprocess.CalledProcessError: Command '['scripts/bootstrap_docker.sh', 'http://172.31.30.20:4001,http://172.31.4.220:4001,http://172.31.24.147:4001']' returned non-zero exit status 4
unit-kubernetes-2[8965]: 2016-05-03 13:36:49 ERROR juju.worker.uniter.operation runhook.go:107 hook "etcd-relation-joined" failed: exit status 1

If this issue is a dupe, or does not help determine the solution, please close it.
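The failing call in the trace is `check_call(split(cmd))` at reactive/flannel.py line 38, where the bootstrap script exits with status 4. A minimal sketch, assuming `split` there is `shlex.split`, of how that non-zero exit surfaces as `CalledProcessError` and how a handler could capture the status instead of letting the hook crash. `run_bootstrap` is a hypothetical helper, not the charm's code, and a child Python process stands in for scripts/bootstrap_docker.sh:

```python
import shlex
import subprocess
import sys


def run_bootstrap(cmd):
    """Run a bootstrap command string; return (ok, exit_status).

    Hypothetical helper for illustration: it splits the command with
    shlex.split and turns a non-zero exit into a return value instead of
    an uncaught CalledProcessError.
    """
    try:
        subprocess.check_call(shlex.split(cmd))
    except subprocess.CalledProcessError as err:
        # err.returncode is the script's exit status; err.cmd is the argv list
        return False, err.returncode
    return True, 0


if __name__ == "__main__":
    # Stand-in for scripts/bootstrap_docker.sh: a child process that exits with 4
    fake_script = '{} -c "import sys; sys.exit(4)"'.format(shlex.quote(sys.executable))
    ok, status = run_bootstrap(fake_script)
    print(ok, status)
```

Returning the status lets the caller decide whether to retry or report it, rather than failing the whole etcd-relation-joined hook.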

mbruzek commented 8 years ago

I ran debug-hooks on the failed units, and it looks like issue #2.

Here is the text from the debug session:

# hooks/etcd-relation-joined 
++ config-get iface
+ interface=eth0
++ config-get cidr
+ cidr=10.1.0.0/16
+ connection_string=http://172.31.30.20:4001,http://172.31.24.147:4001,http://172.31.4.220:4001
+ '[' -f /var/run/docker-bootstrap.pid ']'
+ echo 'Docker bootstrap instance pid found. Doing nothing.'
Docker bootstrap instance pid found. Doing nothing.
+ exit 0
Traceback (most recent call last):
  File "hooks/etcd-relation-joined", line 19, in <module>
    main()
  File "/usr/local/lib/python3.4/dist-packages/charms/reactive/__init__.py", line 73, in main
    bus.dispatch()
  File "/usr/local/lib/python3.4/dist-packages/charms/reactive/bus.py", line 421, in dispatch
    _invoke(other_handlers)
  File "/usr/local/lib/python3.4/dist-packages/charms/reactive/bus.py", line 404, in _invoke
    handler.invoke()
  File "/usr/local/lib/python3.4/dist-packages/charms/reactive/bus.py", line 280, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-kubernetes-0/charm/reactive/flannel.py", line 39, in run_bootstrap_daemons
    ingest_network_config()
  File "/var/lib/juju/agents/unit-kubernetes-0/charm/reactive/flannel.py", line 54, in ingest_network_config
    with open('subnet.env') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'subnet.env'
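In this session the shell script exits 0 as soon as it finds /var/run/docker-bootstrap.pid, and `ingest_network_config` then fails because `subnet.env` (opened by a relative path, so relative to the hook's working directory) does not exist. For reference, a sketch of parsing such a file, assuming it holds flannel-style `KEY=VALUE` lines such as `FLANNEL_SUBNET=...`. `parse_subnet_env` is a hypothetical name and the sample values are illustrative (the network matches the `cidr` shown in the debug output):

```python
def parse_subnet_env(text):
    """Parse KEY=VALUE lines (as in a flannel subnet.env) into a dict.

    Blank lines and '#' comments are ignored. Only the first '=' separates
    key from value, so values may themselves contain '='. Lines without
    '=' are skipped as malformed.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


# Illustrative contents; real values depend on the flannel deployment.
sample = """FLANNEL_NETWORK=10.1.0.0/16
FLANNEL_SUBNET=10.1.42.1/24
FLANNEL_MTU=1450
"""
print(parse_subnet_env(sample)["FLANNEL_SUBNET"])  # → 10.1.42.1/24
```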

lazypower commented 8 years ago

This looks like a dupe of #2; I'll leave it open while investigating.

lazypower commented 8 years ago

This was fixed by your latest code, which returns early and sets a status saying it's waiting for the file.
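A minimal sketch of that "return early and set a waiting status" pattern. In a charm the status call would typically be charmhelpers' `status_set`; here it is passed in as a plain callable so the sketch runs standalone, and `ingest_network_config_safe` is a hypothetical name rather than the charm's function:

```python
import os
import tempfile


def ingest_network_config_safe(path, set_status):
    """Return KEY=VALUE pairs from `path`, or None if the file is missing.

    `set_status(state, message)` stands in for the charm's status call.
    Returning early instead of raising FileNotFoundError lets the handler
    retry on a later hook once the file exists.
    """
    if not os.path.isfile(path):
        set_status("waiting", "Waiting for {} to be written".format(path))
        return None
    with open(path) as f:
        pairs = (line.strip().partition("=") for line in f)
        return {k.strip(): v.strip() for k, sep, v in pairs if sep}


# Usage: record status calls in a list instead of talking to Juju.
status_calls = []


def record(state, message):
    status_calls.append((state, message))


with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, "subnet.env")
    missing = ingest_network_config_safe(env_path, record)  # file not written yet
    with open(env_path, "w") as f:
        f.write("FLANNEL_SUBNET=10.1.42.1/24\nFLANNEL_MTU=1450\n")  # illustrative values
    present = ingest_network_config_safe(env_path, record)

print(missing, status_calls, present)
```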

However, additional code is landing that makes this entire block more robust and more idempotent than the current approach, which checks for evidence of the very first thing we did and, if it finds any, skips the entire configuration.
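The difference between a single up-front guard and a more idempotent approach can be sketched as a list of steps, each with its own "already done?" check. The step names and in-memory state below are hypothetical placeholders, not the charm's actual steps:

```python
def converge(steps):
    """Apply each (name, is_done, apply) step only if it is not already done.

    Unlike one up-front guard ("pid file exists, skip everything"), each step
    is checked independently, so a partially completed run gets finished on
    the next invocation instead of being skipped. Returns the applied names.
    """
    applied = []
    for name, is_done, apply in steps:
        if not is_done():
            apply()
            applied.append(name)
    return applied


# Hypothetical state: the bootstrap daemon is running but subnet.env was never
# written, similar to the situation in the debug session.
state = {"bootstrap_running": True, "subnet_written": False}

steps = [
    ("start bootstrap daemon",
     lambda: state["bootstrap_running"],
     lambda: state.update(bootstrap_running=True)),
    ("write subnet.env",
     lambda: state["subnet_written"],
     lambda: state.update(subnet_written=True)),
]

print(converge(steps))  # → ['write subnet.env']
print(converge(steps))  # → []
```

Running it twice shows the idempotence: the second pass finds every step done and changes nothing.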

I think moving the configuration components to talk to layer-docker fits the encapsulation story better. We could potentially start factoring out the communication through unit data here and instead favor setting the Docker bind options directly.
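Setting the Docker options directly could look like the sketch below, which turns flannel's subnet values into daemon flags. `--bip` and `--mtu` are real Docker daemon options commonly used with flannel; how layer-docker would accept them is not specified here, so `docker_opts_from_flannel` is a hypothetical helper that only builds the flag string:

```python
def docker_opts_from_flannel(values):
    """Build Docker daemon flags from flannel subnet values.

    `values` is a dict like {"FLANNEL_SUBNET": "10.1.42.1/24", "FLANNEL_MTU": "1450"}.
    --bip sets the docker0 bridge address/netmask so containers draw addresses
    from the flannel-assigned subnet; --mtu matches the overlay's MTU.
    """
    opts = ["--bip={}".format(values["FLANNEL_SUBNET"])]
    if "FLANNEL_MTU" in values:
        opts.append("--mtu={}".format(values["FLANNEL_MTU"]))
    return " ".join(opts)


# Illustrative values only.
print(docker_opts_from_flannel({"FLANNEL_SUBNET": "10.1.42.1/24", "FLANNEL_MTU": "1450"}))
# → --bip=10.1.42.1/24 --mtu=1450
```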

There is still more food for thought on this issue, however, since we're waiting on k8s tests that use it.

lazypower commented 8 years ago

I'm going to close this, since #6 addressed it specifically in the context of working with the k8s layer(s).