mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0

Segfault when launching master #485

Closed: elouanKeryell-Even closed this issue 9 years ago

elouanKeryell-Even commented 9 years ago

Environment

OS: CentOS 7
chronos-2.3.4-1.0.81.el7.x86_64.rpm
mesos-0.22.1-1.0.centos701406.x86_64.rpm
mesosphere-zookeeper-3.4.6-0.1.20141204175332.centos7.x86_64.rpm

Bug

When starting chronos:

$ systemctl start chronos

it crashes. Here are the logs:

Jul  6 19:33:17 master-1 systemd: Stopping Chronos...
Jul  6 19:33:17 master-1 systemd: Starting Chronos...
Jul  6 19:33:17 master-1 systemd: Started Chronos.
Jul  6 19:33:17 master-1 chronos: + cmd=(run_jar)
Jul  6 19:33:17 master-1 chronos: + local cmd
Jul  6 19:33:17 master-1 chronos: + [[ -s /etc/mesos/zk ]]
Jul  6 19:33:17 master-1 chronos: + cmd+=(--zk_hosts "$(cut -d / -f 3 /etc/mesos/zk)" --master "$(cat /etc/mesos/zk)")
Jul  6 19:33:17 master-1 chronos: ++ cut -d / -f 3 /etc/mesos/zk
Jul  6 19:33:17 master-1 chronos: ++ cat /etc/mesos/zk
Jul  6 19:33:17 master-1 chronos: + [[ -d /etc/chronos/conf ]]
Jul  6 19:33:17 master-1 chronos: + read -u 9 -r -d '' path
Jul  6 19:33:17 master-1 chronos: ++ cd /etc/chronos/conf
Jul  6 19:33:17 master-1 chronos: ++ find . -type f -not -name '.*' -print0
Jul  6 19:33:17 master-1 chronos: + local name=zk_path
Jul  6 19:33:17 master-1 chronos: + element_in --zk_path
Jul  6 19:33:17 master-1 chronos: + local e
Jul  6 19:33:17 master-1 chronos: + return 1
Jul  6 19:33:17 master-1 chronos: + case "$name" in
Jul  6 19:33:17 master-1 chronos: + cmd+=("--$name" "$(< "$conf_dir/$name")")
Jul  6 19:33:17 master-1 chronos: + read -u 9 -r -d '' path
Jul  6 19:33:17 master-1 chronos: + local name=hostname
Jul  6 19:33:17 master-1 chronos: + element_in --hostname
Jul  6 19:33:17 master-1 chronos: + local e
Jul  6 19:33:17 master-1 chronos: + return 1
Jul  6 19:33:17 master-1 chronos: + case "$name" in
Jul  6 19:33:17 master-1 chronos: + cmd+=("--$name" "$(< "$conf_dir/$name")")
Jul  6 19:33:17 master-1 chronos: + read -u 9 -r -d '' path
Jul  6 19:33:17 master-1 chronos: + local name=http_port
Jul  6 19:33:17 master-1 chronos: + element_in --http_port
Jul  6 19:33:17 master-1 chronos: + local e
Jul  6 19:33:17 master-1 chronos: + return 1
Jul  6 19:33:17 master-1 chronos: + case "$name" in
Jul  6 19:33:17 master-1 chronos: + cmd+=("--$name" "$(< "$conf_dir/$name")")
Jul  6 19:33:17 master-1 chronos: + read -u 9 -r -d '' path
Jul  6 19:33:17 master-1 chronos: + logged chronos run_jar --zk_hosts 10.10.3.65:2181 --master zk://10.10.3.65:2181/mesos --zk_path /chronos --hostname master-1 --http_port 8081
Jul  6 19:33:17 master-1 chronos: + local 'token=chronos[6064]'
Jul  6 19:33:17 master-1 chronos: + shift
Jul  6 19:33:17 master-1 chronos: + exec
Jul  6 19:33:17 master-1 chronos: + exec
Jul  6 19:33:18 master-1 chronos: ++ exec logger -p user.info -t 'chronos[6064]'
Jul  6 19:33:18 master-1 chronos: ++ exec logger -p user.notice -t 'chronos[6064]'
Jul  6 19:33:18 master-1 chronos[6064]: + run_jar --zk_hosts 10.10.3.65:2181 --master zk://10.10.3.65:2181/mesos --zk_path /chronos --hostname master-1 --http_port 8081
Jul  6 19:33:18 master-1 chronos[6064]: + local 'log_format=%2$s %5$s%6$s%n'
Jul  6 19:33:18 master-1 chronos[6064]: ++ ulimit -n
Jul  6 19:33:18 master-1 chronos[6064]: + '[' 0 -eq 0 -a 1024 -lt 8192 ']'
Jul  6 19:33:18 master-1 chronos[6064]: + ulimit -n 8192
Jul  6 19:33:18 master-1 chronos[6064]: + export PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
Jul  6 19:33:18 master-1 chronos[6064]: + PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
Jul  6 19:33:18 master-1 chronos[6064]: + vm_opts=(-Djava.library.path=/usr/local/lib:/usr/lib64:/usr/lib -Djava.util.logging.SimpleFormatter.format="$log_format")
Jul  6 19:33:18 master-1 chronos[6064]: + local vm_opts
Jul  6 19:33:18 master-1 chronos[6064]: + for j_opt in '${JAVA_OPTS:-"-Xmx512m"}'
Jul  6 19:33:18 master-1 chronos[6064]: + vm_opts+=(${j_opt})
Jul  6 19:33:18 master-1 chronos[6064]: + exec java -Djava.library.path=/usr/local/lib:/usr/lib64:/usr/lib '-Djava.util.logging.SimpleFormatter.format=%2$s %5$s%6$s%n' -Xmx512m -cp /usr/bin/chronos org.apache.mesos.chronos.scheduler.Main --zk_hosts 10.10.3.65:2181 --master zk://10.10.3.65:2181/mesos --zk_path /chronos --hostname master-1 --http_port 8081
Jul  6 19:33:18 master-1 chronos[6064]: [2015-07-06 19:33:18,314] INFO --------------------- (org.apache.mesos.chronos.scheduler.Main$:26)
Jul  6 19:33:18 master-1 chronos[6064]: [2015-07-06 19:33:18,316] INFO Initializing chronos. (org.apache.mesos.chronos.scheduler.Main$:27)
Jul  6 19:33:18 master-1 chronos[6064]: [2015-07-06 19:33:18,318] INFO --------------------- (org.apache.mesos.chronos.scheduler.Main$:28)
Jul  6 19:33:20 master-1 chronos[6064]: [2015-07-06 19:33:20,512] INFO Wiring up the application (org.apache.mesos.chronos.scheduler.config.MainModule:38)
Jul  6 19:33:20 master-1 chronos[6064]: #
Jul  6 19:33:20 master-1 chronos[6064]: # A fatal error has been detected by the Java Runtime Environment:
Jul  6 19:33:20 master-1 chronos[6064]: #
Jul  6 19:33:20 master-1 chronos[6064]: #  SIGSEGV (0xb) at pc=0x00007f7c54ddf56c, pid=6064, tid=140171988526848
Jul  6 19:33:20 master-1 chronos[6064]: #
Jul  6 19:33:20 master-1 chronos[6064]: # JRE version: OpenJDK Runtime Environment (7.0_75-b13) (build 1.7.0_75-mockbuild_2015_01_21_05_53-b00)
Jul  6 19:33:20 master-1 chronos[6064]: # Java VM: OpenJDK 64-Bit Server VM (24.75-b04 mixed mode linux-amd64 compressed oops)
Jul  6 19:33:20 master-1 chronos[6064]: # Derivative: IcedTea 2.5.4
Jul  6 19:33:20 master-1 chronos[6064]: # Distribution: Built on CentOS Linux release 7.0.1406 (Core)  (Wed Jan 21 05:53:48 UTC 2015)
Jul  6 19:33:20 master-1 chronos[6064]: # Problematic frame:
Jul  6 19:33:20 master-1 chronos[6064]: # C  [libc.so.6+0x8056c]  cfree+0x1c
Jul  6 19:33:20 master-1 chronos[6064]: #
Jul  6 19:33:20 master-1 chronos[6064]: # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
Jul  6 19:33:20 master-1 chronos[6064]: #
Jul  6 19:33:20 master-1 chronos[6064]: # An error report file with more information is saved as:
Jul  6 19:33:20 master-1 chronos[6064]: # /tmp/jvm-6064/hs_error.log
Jul  6 19:33:20 master-1 chronos[6064]: #
Jul  6 19:33:20 master-1 chronos[6064]: # If you would like to submit a bug report, please include
Jul  6 19:33:20 master-1 chronos[6064]: # instructions on how to reproduce the bug and visit:
Jul  6 19:33:20 master-1 chronos[6064]: #   http://icedtea.classpath.org/bugzilla
Jul  6 19:33:20 master-1 chronos[6064]: #

Here is the generated error report file: https://gist.github.com/WinstonSureChill/a17a344b091ea5ee7ede

This is the top of the stack trace as found in that file:

Stack: [0x00007f7c55856000,0x00007f7c55957000], sp=0x00007f7c55952808, free space=1010k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libc.so.6+0x8056c] cfree+0x1c

[error occurred during error reporting (printing native stack), id 0xb]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j org.apache.mesos.state.ZooKeeperState.initialize(Ljava/lang/String;JLjava/util/concurrent/TimeUnit;Ljava/lang/String;)V+0
j org.apache.mesos.state.ZooKeeperState.<init>(Ljava/lang/String;JLjava/util/concurrent/TimeUnit;Ljava/lang/String;)V+11
j org.apache.mesos.chronos.scheduler.config.ZookeeperModule.provideState()Lorg/apache/mesos/state/State;+40
v ~StubRoutines::call_stub
[...]
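To pull the crashing native frame out of an error report like the one above without reading the whole file, something along these lines may help. It is only a sketch: `problematic_frame` is a hypothetical helper name, and it assumes the report keeps the `# Problematic frame:` header followed by the frame on the next line, as in the log shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line that follows "# Problematic frame:"
# in a JVM error report, with the leading "# " removed.
problematic_frame() {
    awk '/^# Problematic frame:/ { getline; sub(/^# */, ""); print; exit }' "$1"
}

# Example usage (path taken from the log above):
#   problematic_frame /tmp/jvm-6064/hs_error.log
```

Here the frame is in `libc.so.6` (`cfree`), which points at native code called from the JVM rather than at Java code itself.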

The last Java calls are ZooKeeper-related (see https://github.com/apache/mesos/blob/master/src/java/src/org/apache/mesos/state/ZooKeeperState.java), so I suspect a problem with my ZooKeeper configuration. Or does anyone see an obvious error in the parameters passed to Chronos:

Jul  6 19:33:18 master-1 chronos[6064]: + run_jar --zk_hosts 10.10.3.65:2181 --master zk://10.10.3.65:2181/mesos --zk_path /chronos --hostname master-1 --http_port 8081
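For reference, the trace shows where the first two parameters come from: the wrapper reads `/etc/mesos/zk`, passes it unchanged as `--master`, and takes the third `/`-separated field as `--zk_hosts`. A minimal sketch of that derivation follows; `zk_hosts_from_url` is a hypothetical name, not a function from the Chronos wrapper.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the `cut -d / -f 3 /etc/mesos/zk` step
# seen in the trace: extract host:port[,host:port...] from a zk:// URL.
zk_hosts_from_url() {
    printf '%s\n' "$1" | cut -d / -f 3
}

# Same value as the /etc/mesos/zk file in this report.
zk_url='zk://10.10.3.65:2181/mesos'
echo "--zk_hosts $(zk_hosts_from_url "$zk_url") --master $zk_url"
```

With the URL from this report the derived value is `10.10.3.65:2181`, which matches the `--zk_hosts` argument in the log, so the parameters themselves look consistent.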

Here is my ZooKeeper config: https://gist.github.com/WinstonSureChill/b402d07f0bbffe9b035e

And the Mesos config I set up through the config files:

mesos/

zk: zk://10.10.3.65:2181/mesos
master: 10.10.3.65

mesos-master/

hostname: f1.linuxrt
ip: 10.10.3.65
quorum: 1
work_dir: /var/lib/mesos

Also, Mesos works fine on its own (without Chronos).

Related issues

My problem looks similar to this one (Marathon + Mesos): https://github.com/mesosphere/marathon/issues/1352

elouanKeryell-Even commented 9 years ago

I was using OpenJDK 1.7. I upgraded to 1.8, reinstalled and reconfigured ZooKeeper and Chronos, and now everything works fine.
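For anyone hitting the same crash, a quick check of the Java version before starting the service may save some time. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the thread does not establish whether the JDK upgrade alone or the reinstall fixed the problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a Java version string to its major number.
# "1.7.0_75" -> 7, "1.8.0_45" -> 8, "11.0.2" -> 11
java_major() {
    local v=$1
    case "$v" in
        1.*) v=${v#1.}; echo "${v%%.*}" ;;   # legacy 1.x scheme
        *)   echo "${v%%[.+-]*}" ;;          # newer scheme
    esac
}

# Hypothetical helper: read the quoted version from `java -version`,
# which prints to stderr.
installed_java_version() {
    java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }'
}

# Example usage (not run here, since it needs a JVM on the PATH):
#   [ "$(java_major "$(installed_java_version)")" -ge 8 ] || echo "Java older than 1.8"
```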