mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0

Issue with Mesosphere Chronos packages: setting /etc/chronos/conf/master in the presence of /etc/mesos/zk #481

Open ghost opened 9 years ago

ghost commented 9 years ago

I originally pinged Mesosphere about this, since I believed it was an issue with their packages and I couldn't find where the bash script portion of the sharchive actually lives (it doesn't appear to be in this repo). Mesosphere support asked me to create a ticket against this repo though.

I found some rather peculiar behavior using the Chronos package provided in Mesosphere's repos. If you set /etc/chronos/conf/master and/or /etc/chronos/conf/zk_hosts, and the file /etc/mesos/zk is present, --master and --zk_hosts will be passed to the executable twice:

Jun 27 00:50:56 localhost chronos[5435]: + exec java -Djava.library.path=/usr/local/lib:/usr/lib64:/usr/lib '-Djava.util.logging.SimpleFormatter.format=%2$s %5$s%6$s%n' -Xmx512m -cp /usr/bin/chronos org.apache.mesos.chronos.scheduler.Main --zk_hosts 192.168.248.10:2181 --master zk://192.168.248.10:2181/mesos --http_port 4400 --zk_hosts 192.168.248.10:2181 --master zk://192.168.248.10:2181/mesos
Jun 27 00:50:56 localhost chronos[5435]: [2015-06-27 00:50:56,320] INFO --------------------- (org.apache.mesos.chronos.scheduler.Main$:26)
Jun 27 00:50:56 localhost chronos[5435]: [2015-06-27 00:50:56,322] INFO Initializing chronos. (org.apache.mesos.chronos.scheduler.Main$:27)
Jun 27 00:50:56 localhost chronos[5435]: [2015-06-27 00:50:56,323] INFO --------------------- (org.apache.mesos.chronos.scheduler.Main$:28)
Jun 27 00:50:57 localhost chronos[5435]: [scallop] Error: Bad arguments for option 'master': 'zk://192.168.248.10:2181/mesos zk://192.168.248.10:2181/mesos' - you should provide exactly one argument for this option
Jun 27 00:50:57 localhost systemd: chronos.service: main process exited, code=exited, status=1/FAILURE
Jun 27 00:50:57 localhost systemd: Unit chronos.service entered failed state.

I narrowed the problem down to this function in the sharchive at /usr/bin/chronos:

function load_options_and_log {
  set -x
  # Load Chronos options from Mesos and Chronos conf files that are present.
  # Launch main program with Syslog enabled.
  local cmd=( run_jar )
  if [[ -s /etc/mesos/zk ]]
  then
    cmd+=( --zk_hosts "$(cut -d / -f 3 /etc/mesos/zk)"
           --master "$(cat /etc/mesos/zk)" )
  fi
  if [[ -d $conf_dir ]]
  then
    while read -u 9 -r -d '' path
    do
      local name="${path#./}"
      if ! element_in "--${name#'?'}" "$@"
      then
        case "$name" in
          '?'*) cmd+=( "--${name#'?'}" ) ;;
          *)    cmd+=( "--$name" "$(< "$conf_dir/$name")" ) ;;
        esac
      fi
    done 9< <(cd "$conf_dir" && find . -type f -not -name '.*' -print0)
  fi
  logged chronos "${cmd[@]}" "$@"
}

I'd imagine users would expect some order of precedence here (e.g. /etc/chronos/conf/master is authoritative, and Chronos falls back to /etc/mesos/zk if it isn't present).
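A minimal sketch of that precedence, assuming the fallback block at the top of load_options_and_log is the only place /etc/mesos/zk is consulted. The helper name mesos_zk_defaults and its two parameters are hypothetical and are not part of the packaged script; it only emits an option derived from the Mesos zk file when no file of the same name exists in the Chronos conf directory.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the packaged script): print fallback options
# derived from the Mesos zk file, one word per line, skipping any option
# that already has a file in the Chronos conf directory.
#   $1 = Chronos conf dir (e.g. /etc/chronos/conf)
#   $2 = Mesos zk file    (e.g. /etc/mesos/zk)
mesos_zk_defaults() {
  local conf_dir="$1" mesos_zk="$2"
  local args=()

  # Nothing to fall back to if the Mesos zk file is missing or empty.
  [[ -s "$mesos_zk" ]] || return 0

  # /etc/chronos/conf/zk_hosts is authoritative when present.
  if [[ ! -e "$conf_dir/zk_hosts" ]]; then
    args+=( --zk_hosts "$(cut -d / -f 3 "$mesos_zk")" )
  fi

  # /etc/chronos/conf/master is authoritative when present.
  if [[ ! -e "$conf_dir/master" ]]; then
    args+=( --master "$(< "$mesos_zk")" )
  fi

  # Avoid printing a blank line when every option was overridden.
  if (( ${#args[@]} > 0 )); then
    printf '%s\n' "${args[@]}"
  fi
}
```

Inside load_options_and_log, the output could be appended to cmd in place of the unconditional block, for example with a `while read -r word; do cmd+=( "$word" ); done < <(mesos_zk_defaults "$conf_dir" /etc/mesos/zk)` loop. This sketch does not check options passed on the command line, which the existing element_in test handles only for files in the conf directory.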

This was observed using the package chronos-2.3.4-1.0.81.el7.x86_64 from Mesosphere's repos on CentOS 7.0.1406.

Mongey commented 9 years ago

:+1:

il-katta commented 8 years ago

+1, same trouble with the zk_hosts parameter on Debian with package version 2.3.4-0.1.20150813102925.debian81