mesosphere / chronos-pkg

Apache License 2.0

/etc/chronos/conf misconfiguration #2

Open elee opened 9 years ago

elee commented 9 years ago

While installing the chronos deb from the mesosphere repo, I saw the service flapping. Peeking into syslog, I found these errors:

Oct 21 23:00:08 ip-10-170-79-136 chronos[6162]: [2014-10-21 23:00:08,467] INFO --------------------- (com.airbnb.scheduler.Main$:36)
Oct 21 23:00:08 ip-10-170-79-136 chronos[6162]: [2014-10-21 23:00:08,472] INFO Initializing chronos. (com.airbnb.scheduler.Main$:37)
Oct 21 23:00:08 ip-10-170-79-136 chronos[6162]: [2014-10-21 23:00:08,477] INFO --------------------- (com.airbnb.scheduler.Main$:38)
Oct 21 23:00:08 ip-10-170-79-136 mesos-master[19762]: 2014-10-21 23:00:08,501:19762(0x7f6041fbf700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 18ms
Oct 21 23:00:08 ip-10-170-79-136 marathon[14268]: 2014-10-21 23:00:08,646:14268(0x7f02864ea700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 18ms
Oct 21 23:00:08 ip-10-170-79-136 mesos-master[19762]: 2014-10-21 23:00:08,646:19762(0x7f6042fc1700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 18ms
Oct 21 23:00:12 ip-10-170-79-136 chronos[6162]: [scallop] Error: Unknown option 'http_port.dpkg-dist'
Oct 21 23:00:12 ip-10-170-79-136 kernel: [36467832.732587] init: chronos main process (6162) terminated with status 1
Oct 21 23:00:12 ip-10-170-79-136 kernel: [36467832.732642] init: chronos main process ended, respawning
Oct 21 23:00:12 ip-10-170-79-136 chronos[6185]: + run_jar --zk_hosts mmaster-0:2181,mmaster-1:2181,mmaster-2:2181 --master zk://host1:2181,host2:2181,host3:2181/mesos --http_port 4400 --http_port.dpkg-dist 4400

http_port.dpkg-dist seems to be the offending argument. I thought this was some error in how I was installing the deb via automation, but the ctime on this file is Oct-2.

Curiously, the deb itself ships only http_port, with no .dpkg-dist file:

$:/tmp/chronos-deb$ dpkg -c chronos_2.2.0-0.1.201406132137_amd64.deb
drwxrwxr-x 0/0               0 2014-06-13 21:37 ./
drwxrwxr-x 0/0               0 2014-06-13 21:37 ./usr/
drwxrwxr-x 0/0               0 2014-06-13 21:37 ./usr/local/
drwxrwxr-x 0/0               0 2014-06-13 21:37 ./usr/local/bin/
-rwxr-xr-x 0/0        34761920 2014-06-13 21:37 ./usr/local/bin/chronos
drwxrwxr-x 0/0               0 2014-06-13 21:37 ./etc/
drwxrwxr-x 0/0               0 2014-06-13 21:37 ./etc/init/
-rw-r--r-- 0/0             150 2014-06-13 21:37 ./etc/init/chronos.conf
drwxrwxr-x 0/0               0 2014-06-13 21:37 ./etc/chronos/
drwxrwxr-x 0/0               0 2014-06-13 21:37 ./etc/chronos/conf/
-rw-rw-r-- 0/0               5 2014-06-13 21:37 ./etc/chronos/conf/http_port
$

I can work around this by deleting the /etc/chronos/conf/http_port.dpkg-dist file manually, but I am suspicious of this part of the Makefile: https://github.com/mesosphere/chronos-pkg/blob/f22c398076f3f27f67703b15fd2af9c3b27f8200/Makefile#L85

The contents of the offending file are, for the record, correct:

# cd /etc/chronos/conf ; cat http_port.dpkg-dist
4400
#

elee commented 9 years ago

host details:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04 LTS
Release:    12.04
Codename:   precise
$ uname -a
Linux ip-1-2-3-4 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$

lingmann commented 9 years ago

Thanks for the bug report, @elee. This is actually an issue with the wrapper script. System packaging tools often leave extra copies of configuration files next to the original file. Unfortunately, the wrapper script assumes that every file inside /etc/chronos/conf/ should be treated as a flag to the chronos service, so when such files are present, it passes them along as additional arguments. The interim workaround is to delete the .dpkg-dist files and restart the chronos service:

sudo find /etc/chronos/conf -name \*.dpkg-dist -delete
sudo service chronos restart
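
To make the failure mode concrete, here is a minimal sketch of how a wrapper that maps every file in the conf directory to a flag ends up passing the stray option. This is a reconstruction based only on the log line above, not the actual wrapper script: the function name build_flags and the temporary demo directory are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical reconstruction of the wrapper's behaviour (build_flags is an
# illustrative name, not taken from the real script): every regular file in
# the conf directory becomes "--<file name> <file contents>".
build_flags() {
  conf_dir=$1
  for f in "$conf_dir"/*; do
    [ -f "$f" ] || continue
    printf '%s %s ' "--$(basename "$f")" "$(cat "$f")"
  done
  echo
}

# Reproduce the reported situation in a throwaway directory instead of
# touching /etc/chronos/conf.
demo_dir=$(mktemp -d)
echo 4400 > "$demo_dir/http_port"
echo 4400 > "$demo_dir/http_port.dpkg-dist"

flags=$(build_flags "$demo_dir")
echo "$flags"

rm -rf "$demo_dir"
```

The printed flags include `--http_port.dpkg-dist 4400`, the same argument that scallop rejects in the syslog output.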

elee commented 9 years ago

@lingmann thanks for the fast reply. I ended up just instructing the automation to whack that file if it exists and all is well.

What step in packaging creates this file? I am unsure from reading the Makefile. Could you possibly just wipe it out in the postinst hook?

lingmann commented 9 years ago

It's actually not created by the package at all; dpkg, the system packaging tool, creates it during a Chronos package upgrade. In your case, it looks like /etc/chronos/conf/http_port already existed with contents that differed from the version shipped in the new Chronos package. We could remove these files in the postinst hook, but I'm leaning towards updating the wrapper script so that these backup files are explicitly ignored.
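
One way the wrapper-side fix could look is sketched below. It is only an illustration under assumptions: the names is_backup_file and build_flags are hypothetical, the real wrapper may be structured differently, and the list of ignored suffixes (dpkg's .dpkg-dist, .dpkg-old, .dpkg-new, .dpkg-tmp, plus editor backups ending in ~) is a suggestion rather than anything the project has committed to.

```shell
#!/bin/sh
# Hypothetical sketch, not the project's actual wrapper: skip files that
# dpkg or an editor left behind before turning file names into flags.
is_backup_file() {
  case "$1" in
    *.dpkg-dist|*.dpkg-old|*.dpkg-new|*.dpkg-tmp|*~) return 0 ;;
    *) return 1 ;;
  esac
}

# Emit "--<file name> <file contents>" for every regular, non-backup file.
build_flags() {
  conf_dir=$1
  for f in "$conf_dir"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    if is_backup_file "$name"; then
      continue
    fi
    printf '%s %s ' "--$name" "$(cat "$f")"
  done
  echo
}

# Demo in a throwaway directory that mirrors the reported state.
demo_dir=$(mktemp -d)
echo 4400 > "$demo_dir/http_port"
echo 4400 > "$demo_dir/http_port.dpkg-dist"

flags=$(build_flags "$demo_dir")
echo "$flags"

rm -rf "$demo_dir"
```

Filtering in the wrapper rather than deleting in postinst leaves the .dpkg-dist copy on disk, so an administrator can still compare it against the file they kept.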