mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
Apache License 2.0

When Chronos is run using Marathon, jobs submitted to Chronos do not run #338

Open mindscratch opened 9 years ago

mindscratch commented 9 years ago

I have mesos 0.21.1, marathon 0.7.6 and chronos 2.3.0. I've deployed chronos using Marathon.

I am able to create a chronos job, where the command is simple like "echo hello", however, the job never runs. The "success" and "error" counts for the job are always 0. If I tail the chronos logs, I don't see any errors. Also, the Mesos UI shows no tasks for the jobs I submit to Chronos.

I've also tried other commands such as "echo hello > /tmp/hello.txt" and even "curl http://myserver" (so I could watch the server log to see if the job runs).
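
For anyone trying to reproduce this, here is a minimal sketch of how such a job could be submitted from a script. It assumes the Chronos REST endpoint `/scheduler/iso8601`; the job name, owner address, epsilon, and schedule are placeholders, not the values from my setup.

```python
import json
import urllib.request


def build_job(name, command, schedule, owner="nobody@example.com"):
    """Return a Chronos job definition as a plain dict (illustrative values)."""
    return {
        "name": name,
        "command": command,
        "schedule": schedule,  # ISO 8601 repeating interval: R/<start>/<period>
        "epsilon": "PT30S",    # placeholder tolerance for a missed start
        "owner": owner,        # placeholder address
    }


def submit_job(chronos_url, job):
    """POST the job to Chronos and return the HTTP status code."""
    request = urllib.request.Request(
        chronos_url.rstrip("/") + "/scheduler/iso8601",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Run "echo hello" every minute, repeating forever.
job = build_job("hello", "echo hello", "R/2015-01-16T17:41:03Z/PT1M")
```

Calling `submit_job("http://<chronos-host>:8081", job)` would send it; the host is whatever address Chronos listens on in your cluster.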

If I run Chronos outside of Marathon it works just fine.


OS: CentOS 6
Chronos installed with the chronos-2.3.0 rpm.
Command used to start chronos: /usr/local/bin/chronos
Chronos configured via /etc/chronos/conf:
/etc/chronos/conf/http_port = 8081
/etc/chronos/conf/master = zk://myserver:2181/mesos
/etc/chronos/conf/zk_hosts = zk://myserver:2181/mesos
Mesos version: 0.21.1
java -version: 1.7.0_55

mindscratch commented 9 years ago

Here's what I see in the logs when I submit a simple job named "say hello" that is scheduled to run every minute using the command: "/usr/bin/echo hello":

WARN adding vertex: say hello
WARN Current number of vertices:1
Persisting job: say hello
Persisting job 'say hello' with data
State J_say hello does not exist yet. Adding to state
State update successful: true
Adding schedule for time:5:41:25 PM UTC
Checking schedules with time horizon:PT60S
Calling next for stream: R/2015-01-16T17:41:03Z/PT1M, jobname: say hello
Task ready for scheduling: 2015-01-16T17:41:03.000Z
Scheduling:say hello
Scheduling task 'ct:1421430063000:0:say hello does not exist yet. Adding to state
Checking schedules with time horizon:PT60S
Calling next for stream: R/2015-01-16T17:41:03Z/PT1M, jobname: say hello
Task ready for scheduling: 2015-01-16T17:41:03.000Z
Scheduling:say hello
Scheduling task 'ct:1421430063000:0:say hello does not exist yet. Adding to state
State update successful true
Saving updated job:ScheduleBasedJob(....)
Triggering: 'say hello'
removing task mapping 
State update successful true
Saving updated job:ScheduleBasedJob(....)
Triggering: 'say hello'
removing task mapping 
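
As a sanity check on the log above, the task id can be decoded to confirm it points at the same scheduled time. This is a small sketch that assumes the id has the layout `ct:<due time in epoch milliseconds>:<attempt>:<job name>`, which is inferred from the log lines rather than taken from the Chronos documentation.

```python
from datetime import datetime, timezone


def parse_task_id(task_id):
    """Split an id like 'ct:<epoch ms>:<attempt>:<job name>' (assumed layout)."""
    prefix, due_ms, attempt, job_name = task_id.split(":", 3)
    if prefix != "ct":
        raise ValueError("not a Chronos-style task id: %r" % task_id)
    due = datetime.fromtimestamp(int(due_ms) / 1000, tz=timezone.utc)
    return due, int(attempt), job_name


due, attempt, job_name = parse_task_id("ct:1421430063000:0:say hello")
print(due.isoformat())  # 2015-01-16T17:41:03+00:00
```

The decoded time matches the "Task ready for scheduling: 2015-01-16T17:41:03.000Z" line, so the scheduler side looks consistent; the task simply never reaches Mesos.
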
mindscratch commented 9 years ago

I disabled iptables on all hosts and now it works...looks like a network configuration issue. Chronos (2.3.0) on Marathon (0.7.6) is working just fine.
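
Rather than disabling iptables entirely, it may be enough to verify that the relevant port is reachable. A minimal sketch of a TCP reachability check follows; the host name and port in the usage note are hypothetical, and a successful connect only shows that the port is open, not that Mesos and Chronos can talk to each other at the protocol level.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Run from the Mesos master host, something like `port_reachable("chronos-host", 9000)` would show whether the firewall lets the master reach the Chronos scheduler port.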

elingg commented 9 years ago

Thanks @mindscratch for debugging this! The only thing we need to close this issue is some documentation in the readme: the LIBPROCESS_IP environment variable should be set to a PORT that the Mesos Master can communicate with.

mindscratch commented 9 years ago

@elingg slight correction, that should be

The LIBPROCESS_PORT environment variable should be set to a PORT...

instead of LIBPROCESS_IP

To start chronos I created the Marathon application using the following (only command shown for brevity):

   "cmd": "LIBPROCESS_PORT=9000 /usr/local/bin/chronos --master zk://localhost:2181/mesos --zk_hosts zk://localhost:2181/mesos --http_port $PORT"

elingg commented 9 years ago

correct, thx!

clehene commented 9 years ago

@elingg, @mindscratch note that this will only work if the IP is visible from the Mesos master. If, for example, you're running from within a Docker container, you'd have to use host networking (--net=host) and set LIBPROCESS_IP to the public IP.

See for details
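
A rough sketch of how a wrapper script could pick the address to put in LIBPROCESS_IP: it asks the OS which local address it would use to reach the Mesos master. The helper name is made up for illustration, 5050 is the default Mesos master port, and behind NAT the result may not be the public IP, so treat it as a starting point only.

```python
import socket


def outbound_ip(master_host, master_port=5050):
    """Return the local address the OS would route from to reach the master.

    Connecting a UDP socket sends no packets; it only selects a route.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((master_host, master_port))
        return sock.getsockname()[0]


def libprocess_env(master_host, libprocess_port=9000):
    """Build the environment variables discussed in this thread."""
    return {
        "LIBPROCESS_IP": outbound_ip(master_host),
        "LIBPROCESS_PORT": str(libprocess_port),
    }
```

The resulting dict could be merged into the environment of the Chronos process, or copied into the "env" section of a Marathon app definition.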

wangqunOne commented 8 years ago

@mindscratch Can you share the contents of your JSON file? I don't know how to run Chronos using Marathon. Thanks.

robsonpeixoto commented 8 years ago

Is there a Marathon reference config file?

mindscratch commented 8 years ago

@wangqunOne I'll post something on Monday.

robsonpeixoto commented 8 years ago

Where @mindscratch ?

mindscratch commented 8 years ago

I'll share in this comment.

mindscratch commented 8 years ago

Marathon configuration for running Chronos:

{
  "id": "chronos",
  "cmd": "LIBPROCESS_PORT=6500 ./chronos --http_port $PORT",
  "cpus": 1,
  "mem": 512,
  "instances": 1,
  "uris": ["http://myfileserver/chronos-2.4.0.tgz"],
  "ports": [4400],
  "requirePorts": true,
  "healthChecks": [
    {"protocol": "HTTP", "path": "/scheduler/jobs"}
  ]
}

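
If you'd rather deploy this from a script than from the Marathon UI, the sketch below builds the same app definition and posts it to Marathon's `/v2/apps` endpoint. The function names and the Marathon URL in the usage note are placeholders; the field values are copied from the configuration above.

```python
import json
import urllib.request


def chronos_app(libprocess_port=6500, http_port=4400):
    """Return the Marathon app definition for Chronos as a dict."""
    return {
        "id": "chronos",
        "cmd": "LIBPROCESS_PORT=%d ./chronos --http_port $PORT" % libprocess_port,
        "cpus": 1,
        "mem": 512,
        "instances": 1,
        "uris": ["http://myfileserver/chronos-2.4.0.tgz"],
        "ports": [http_port],
        "requirePorts": True,
        "healthChecks": [{"protocol": "HTTP", "path": "/scheduler/jobs"}],
    }


def deploy(marathon_url, app):
    """POST the app definition to Marathon and return the HTTP status code."""
    request = urllib.request.Request(
        marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, `deploy("http://<marathon-host>:8080", chronos_app())` would create the app, assuming Marathon is reachable at that address.
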
payneio commented 7 years ago

How about a marathon config for running chronos in a docker container?

ianjuma commented 7 years ago

Something like this should work, with constraints:

{
  "id": "chronos",
  "args": [
    "--mesos_framework_name=chronos"
  ],
  "cpus": 0.5,
  "ports": [8080, 8081],
  "constraints": [["hostname", "LIKE", "<hostname>"]],
  "mem": 500.0,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesosphere/chronos:v3.0.0",
      "forcePullImage": true,
      "network": "HOST"
    }
  },
  "healthChecks": [
    {
      "path": "/",
      "port": 8080,
      "protocol": "HTTP",
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 3,
      "ignoreHttp1xx": false
    }
  ],
  "env": {
    "PORT0": "8080",
    "PORT1": "8081"
  }
}

yogeshnath commented 6 years ago

I was able to run Chronos and schedule a job, but the job just stays there and never runs. I noticed that the Chronos framework becomes inactive in Mesos after a couple of minutes.
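
One way to watch for this from a script is to poll the Mesos master state and look at the "active" flag of each registered framework. The sketch below assumes the state JSON has a top-level "frameworks" list whose entries carry "id", "name", and "active"; the exact endpoint path differs between Mesos versions (for example `/master/state.json` on older releases), so the full URL is left to the caller, and the sample data is made up.

```python
import json
import urllib.request


def fetch_state(state_url):
    """Download and parse the Mesos master state JSON from the given URL."""
    with urllib.request.urlopen(state_url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def framework_status(state, name_prefix="chronos"):
    """Return (id, name, active) for frameworks whose name starts with name_prefix."""
    return [
        (fw.get("id"), fw.get("name"), bool(fw.get("active")))
        for fw in state.get("frameworks", [])
        if fw.get("name", "").startswith(name_prefix)
    ]


# Made-up sample resembling the assumed structure.
sample_state = {
    "frameworks": [
        {"id": "fw-1", "name": "marathon", "active": True},
        {"id": "fw-2", "name": "chronos-2.4.0", "active": False},
    ]
}
print(framework_status(sample_state))  # [('fw-2', 'chronos-2.4.0', False)]
```

An entry that flips to inactive a few minutes after startup would be consistent with the master losing its connection back to the scheduler, which is the symptom discussed earlier in this thread.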