juju-solutions / bigtop

Mirror of Apache Bigtop
Apache License 2.0

Kafka error when relating to zookeeper #56

Closed xannz closed 6 years ago

xannz commented 6 years ago

Kafka goes into an error state after relating to zookeeper:

Executed commands:

juju deploy cs:kafka-40
juju deploy cs:zookeeper-42
juju add-relation kafka zookeeper

Log output:
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed Traceback (most recent call last):
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "/var/lib/juju/agents/unit-kafka-test-0/charm/hooks/zookeeper-relation-changed", line 19, in <module>
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     main()
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "/usr/local/lib/python3.5/dist-packages/charms/reactive/__init__.py", line 113, in main
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     bus.dispatch(restricted=restricted_mode)
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "/usr/local/lib/python3.5/dist-packages/charms/reactive/bus.py", line 364, in dispatch
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     _invoke(other_handlers)
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "/usr/local/lib/python3.5/dist-packages/charms/reactive/bus.py", line 340, in _invoke
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     handler.invoke()
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "/usr/local/lib/python3.5/dist-packages/charms/reactive/bus.py", line 162, in invoke
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     self._action(*args)
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "/var/lib/juju/agents/unit-kafka-test-0/charm/reactive/kafka.py", line 43, in configure_kafka
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     kafka.configure_kafka(zks)
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "lib/charms/layer/bigtop_kafka.py", line 64, in configure_kafka
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     bigtop.trigger_puppet()
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "lib/charms/layer/apache_bigtop_base.py", line 721, in trigger_puppet
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     java_home()),
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "/usr/local/lib/python3.5/dist-packages/jujubigdata/utils.py", line 195, in re_edit_in_place
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     with Path(filename).in_place(encoding=encoding) as (reader, writer):
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     return next(self.gen)
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed   File "/usr/local/lib/python3.5/dist-packages/path.py", line 1452, in in_place
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed     os.rename(self, backup_fn)
unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed FileNotFoundError: [Errno 2] No such file or directory: Path('/etc/default/bigtop-utils') -> Path('/etc/default/bigtop-utils.bak')
unit-kafka-test-0: 15:14:02 ERROR juju.worker.uniter.operation hook "zookeeper-relation-changed" failed: exit status 1

kwmonroe commented 6 years ago

This is strange. I don't think the problem is with the zk relation -- it just manifests itself there because that's the first hook that triggers a puppet apply. The actual problem is that /etc/default/bigtop-utils doesn't exist:

unit-kafka-test-0: 15:14:02 DEBUG unit.kafka-test/0.zookeeper-relation-changed FileNotFoundError: [Errno 2] No such file or directory: Path('/etc/default/bigtop-utils') -> Path('/etc/default/bigtop-utils.bak')

So there's something wrong with the installation of the bigtop-utils package. This may be related to the fact that we set the apt repo to the bigtop CI system for kafka due to a redistribution issue of the kafka packages.
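To make the failure mode concrete, here is a minimal Python sketch of an in-place edit that renames the file to a .bak backup before rewriting it, which is the step the traceback ends on. The names edit_in_place_sketch and safe_edit are hypothetical and are not the jujubigdata or charm API, and the JAVA_HOME pattern in the usage note is only illustrative.

```python
import os
import re


def edit_in_place_sketch(filename, subs):
    """Hypothetical stand-in for an in-place regex edit (not the real library code).

    The file is first renamed to a .bak backup and then rewritten line by
    line. The rename is the step that fails when the file is missing.
    """
    backup = filename + '.bak'
    os.rename(filename, backup)  # raises FileNotFoundError if filename is absent
    with open(backup) as reader, open(filename, 'w') as writer:
        for line in reader:
            for pattern, repl in subs.items():
                line = re.sub(pattern, repl, line)
            writer.write(line)
    os.remove(backup)


def safe_edit(filename, subs):
    """Return False instead of raising when the target file does not exist."""
    if not os.path.isfile(filename):
        return False
    edit_in_place_sketch(filename, subs)
    return True
```

With a guard like this, something such as safe_edit('/etc/default/bigtop-utils', {r'^# export JAVA_HOME.*': 'export JAVA_HOME=...'}) would return False on a unit where the package never installed, so the hook could report the missing package rather than die on the rename.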

The strange part here is that our own CI cleared kafka-zookeeper a couple days ago:

http://bigtop.charm.qa/cwr_bundle_hadoop_kafka/48/report.html

I'll start digging into what's going on here.

kwmonroe commented 6 years ago

Ah, I see what's up. Build 429 of the Bigtop-trunk-repos job failed this morning:

https://ci.bigtop.apache.org/job/Bigtop-trunk-repos/

Due to our workaround to use the CI system as our apt repo, we're trying to install kafka from a partial repository. I'll try to find a more stable repository (or host the binary artifact ourselves).

kwmonroe commented 6 years ago

The updated kafka charm (-41) in the edge channel should fix this:

https://jujucharms.com/kafka/41

This still uses Bigtop's CI apt repositories, but now it uses Bigtop-1.2.1 instead of Bigtop-trunk. The latter was never a good idea given the propensity for trunk to break.

Our own CI is still running. If successful, I'll move this through to stable and fix upstream with the following:

https://issues.apache.org/jira/browse/BIGTOP-3013

kwmonroe commented 6 years ago

@xannz, danger! kafka-41 is no good. A recent change to layer-basic put charm deps in a python venv. Bigtop charm actions were not activating this venv and therefore failed:

http://bigtop.charm.qa/cwr_bundle_hadoop_kafka/49/report.html
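For anyone unfamiliar with the venv problem, here is a rough Python sketch of what "activating the venv" can mean for a standalone action script: if the script is not already running under the venv's interpreter, re-exec it there. This is not necessarily how layer-basic or the actual fix does it. The function names and the bin/python layout are assumptions made for illustration.

```python
import os
import sys


def venv_python(venv_dir):
    """Path of the interpreter inside a venv (assumes a POSIX bin/python layout)."""
    return os.path.join(venv_dir, 'bin', 'python')


def needs_venv(venv_dir, current_executable=None):
    """True when the venv exists and we are not already running its interpreter."""
    exe = current_executable or sys.executable
    vpy = venv_python(venv_dir)
    return os.path.exists(vpy) and os.path.abspath(exe) != os.path.abspath(vpy)


def ensure_venv(venv_dir):
    """Re-exec the current script under the venv interpreter if needed."""
    if needs_venv(venv_dir):
        vpy = venv_python(venv_dir)
        # Put the venv's bin directory first so child processes see it too.
        os.environ['PATH'] = os.pathsep.join(
            [os.path.dirname(vpy), os.environ.get('PATH', '')])
        os.execv(vpy, [vpy] + sys.argv)

# An action script would call ensure_venv(<path to the charm's venv>) before
# importing any charm dependencies; the path itself is deployment-specific.
```

Without a step like this, an action started with the system interpreter cannot import dependencies that only exist inside the venv, which matches the failures described above.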

While kafka-41 would deploy and be somewhat functional, any charm actions would most certainly fail. A fix is now in place, and has been incorporated into the edge channel release as kafka-43:

https://jujucharms.com/kafka/43

Tests for this look good:

http://bigtop.charm.qa/cwr_bundle_hadoop_kafka/52/report.html

However, the fix required significant changes to bigtop charm actions. It'll take another day to complete CI for all the affected charms/bundles.

FWIW, this was (as it always seems to be) unfortunate timing, especially for a Friday -- what you initially reported was a real problem with kafka and the bigtop repos; it just happened to crop up at the same time as the layer-basic changes that affected bigtop charm actions.

xannz commented 6 years ago

@kwmonroe Indeed unfortunate timing, thanks for the quick updates though.

dixanpena commented 6 years ago

I faced a similar situation when deploying Kafka and adding a relation with Zookeeper (cs:zookeeper-42).

kwmonroe commented 6 years ago

Hey folks, I'm not sure of the exact revision where this was fixed, but it's certainly fixed in the latest stable kafka and zookeeper charms (keep in mind that zk was never really the issue -- it was just that the zk relation exposed the fact that kafka wasn't getting installed correctly):

https://jujucharms.com/kafka/ (rev >= 51)
https://jujucharms.com/zookeeper/ (rev >= 53)
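If it helps to check a deployment against those numbers, a small Python sketch along these lines could compare a charm URL such as cs:kafka-40 with the known-good revisions. The helper names are hypothetical. A False result only means the revision is below the ones listed here, not that it is known to be broken, since the exact fixing revision is uncertain.

```python
# Revisions stated in this thread as known to contain the fix.
KNOWN_GOOD_REVISION = {'kafka': 51, 'zookeeper': 53}


def parse_charm_url(url):
    """Split a charm URL like 'cs:kafka-40' into ('kafka', 40)."""
    name, _, rev = url.rpartition('-')
    name = name.split(':', 1)[-1]  # drop the 'cs:' prefix if present
    return name, int(rev)


def is_known_good(url):
    """True if the charm revision is at or above the known-good revision."""
    name, rev = parse_charm_url(url)
    minimum = KNOWN_GOOD_REVISION.get(name)
    return minimum is not None and rev >= minimum
```

For the revisions in the original report, is_known_good('cs:kafka-40') and is_known_good('cs:zookeeper-42') both return False.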

Closing this out.