mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0

When Chronos is run using Marathon, jobs submitted to Chronos do not run #338

Open mindscratch opened 9 years ago

mindscratch commented 9 years ago

I have mesos 0.21.1, marathon 0.7.6 and chronos 2.3.0. I've deployed chronos using Marathon.

I am able to create a chronos job, where the command is simple like "echo hello", however, the job never runs. The "success" and "error" counts for the job are always 0. If I tail the chronos logs, I don't see any errors. Also, the Mesos UI shows no tasks for the jobs I submit to Chronos.

I've also tried other commands such as "echo hello > /tmp/hello.txt" and even "curl http://myserver" (so I could watch the server log to see if the job runs).
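
For reference, a job like the ones described above can also be created through Chronos's REST API rather than the UI. The following is a minimal sketch, assuming the 2.x endpoint `/scheduler/iso8601` and a Chronos instance on `localhost:8081`; the owner address and the `epsilon` value are placeholders, not values from this setup.

```python
import json
import urllib.request


def build_job(name, command, schedule, owner="nobody@example.com"):
    """Return a Chronos job definition as a dict.

    The owner default and the epsilon value are placeholders.
    """
    return {
        "name": name,
        "command": command,
        "schedule": schedule,  # ISO 8601 repeating interval, e.g. R/<start>/PT1M
        "epsilon": "PT30M",
        "owner": owner,
    }


def build_request(chronos_url, job):
    """Build (but do not send) the POST request that would create the job."""
    return urllib.request.Request(
        url=chronos_url.rstrip("/") + "/scheduler/iso8601",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    job = build_job("say hello", "echo hello", "R/2015-01-16T17:41:03Z/PT1M")
    req = build_request("http://localhost:8081", job)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually submit the job
```

Building the request separately from sending it makes it easy to inspect the payload before anything reaches the scheduler.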

If I run Chronos outside of Marathon it works just fine.

Details:

OS: CentOS 6
Chronos installed with the chronos-2.3.0 rpm.
Command used to start chronos: /usr/local/bin/chronos
Chronos configured via /etc/chronos/conf:
/etc/chronos/conf/http_port = 8081
/etc/chronos/conf/master = zk://myserver:2181/mesos
/etc/chronos/conf/zk_hosts = zk://myserver:2181/mesos
Mesos version: 0.21.1
java -version: 1.7.0_55

mindscratch commented 9 years ago

Here's what I see in the logs when I submit a simple job named "say hello" that is scheduled to run every minute using the command: "/usr/bin/echo hello":

WARN adding vertex: say hello
WARN Current number of vertices:1
Persisting job: say hello
Persisting job 'say hello' with data
State J_say hello does not exist yet. Adding to state
State update successful: true
Adding schedule for time:5:41:25 PM UTC
Checking schedules with time horizon:PT60S
Calling next for stream: R/2015-01-16T17:41:03Z/PT1M, jobname: say hello
Task ready for scheduling: 2015-01-16T17:41:03.000Z
Scheduling:say hello
Scheduling task 'ct:1421430063000:0:say hello does not exist yet. Adding to state
Checking schedules with time horizon:PT60S
Calling next for stream: R/2015-01-16T17:41:03Z/PT1M, jobname: say hello
Task ready for scheduling: 2015-01-16T17:41:03.000Z
Scheduling:say hello
Scheduling task 'ct:1421430063000:0:say hello does not exist yet. Adding to state
State update successful true
Saving updated job:ScheduleBasedJob(....)
Triggering: 'say hello'
removing task mapping 
State update successful true
Saving updated job:ScheduleBasedJob(....)
Triggering: 'say hello'
removing task mapping 
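
The schedule string in these logs, `R/2015-01-16T17:41:03Z/PT1M`, is an ISO 8601 repeating interval: a repeat count (empty means repeat forever), a start time, and a period. The sketch below shows one way to read such a string and work out the next due time. It is an illustration only, not Chronos's own scheduling code; it handles only time-based periods such as `PT1M`, and it assumes `Rn` means n runs in total.

```python
import re
from datetime import datetime, timedelta, timezone

_PERIOD = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_schedule(schedule):
    """Split 'R[n]/<start>/<period>' into (repeats, start, period).

    repeats is None when the count is omitted (repeat forever).
    Only time-based periods such as PT1M or PT1H30M are supported.
    """
    repeat, start, period = schedule.split("/")
    if not repeat.startswith("R"):
        raise ValueError("schedule must start with 'R'")
    repeats = int(repeat[1:]) if len(repeat) > 1 else None
    start_dt = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    match = _PERIOD.match(period)
    if not match or not any(match.groups()):
        raise ValueError("unsupported period: " + period)
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    if delta <= timedelta(0):
        raise ValueError("period must be positive")
    return repeats, start_dt, delta


def next_run(schedule, now):
    """Return the first scheduled time at or after `now`, or None if exhausted."""
    repeats, start, period = parse_schedule(schedule)
    if now <= start:
        steps = 0
    else:
        # Ceiling division: whole periods needed to reach or pass `now`.
        steps = -(-(now - start) // period)
    # Assumption: Rn means n runs in total (run indices 0 .. n-1).
    if repeats is not None and steps >= repeats:
        return None
    return start + steps * period
```

For example, calling `next_run` with this schedule and a `now` of 17:41:25 UTC on the same day yields 17:42:03 UTC, one period after the start time.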
mindscratch commented 9 years ago

I disabled iptables on all hosts and now it works...looks like a network configuration issue. Chronos (2.3.0) on Marathon (0.7.6) is working just fine.
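
A less drastic way to confirm this kind of problem than disabling iptables everywhere is to test whether a given port is reachable from the other host. Here is a minimal sketch using only the Python standard library; the host name and port in the usage note are hypothetical and would be whatever address the Chronos scheduler is listening on for Mesos traffic.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from the Mesos master, something like `can_connect("chronos-host", 9000)` returning False would point at a firewall rule or a wrong address rather than at Chronos itself.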

elingg commented 9 years ago

Thanks @mindscratch for debugging this! The only thing we need to close this issue is some documentation in the readme: the LIBPROCESS_IP environment variable should be set to a PORT that the Mesos Master can communicate with.

mindscratch commented 9 years ago

@elingg slight correction, that should be

The LIBPROCESS_PORT environment variable should be set to a PORT...

instead of LIBPROCESS_IP

To start chronos I created the Marathon application using the following (only the command is shown, for brevity):

{
   "cmd": "LIBPROCESS_PORT=9000 /usr/local/bin/chronos --master zk://localhost:2181/mesos --zk_hosts zk://localhost:2181/mesos --http_port $PORT"
}
elingg commented 9 years ago

correct, thx!

clehene commented 9 years ago

@elingg, @mindscratch note that this will only work if the IP is visible from the Mesos master. If, for example, you're running from within a Docker container, you'd have to use host networking (--net=host) and set LIBPROCESS_IP to the public IP.

See https://issues.apache.org/jira/browse/MESOS-2587 for details

wangqunOne commented 8 years ago

@mindscratch Can you share the contents of your JSON file? I don't know how to run Chronos using Marathon. Thanks.

robsonpeixoto commented 8 years ago

Is there a Marathon reference config file?

mindscratch commented 8 years ago

@wangqunOne I'll post something on Monday.

robsonpeixoto commented 8 years ago

Where, @mindscratch?

mindscratch commented 8 years ago

I'll share in this comment.

mindscratch commented 8 years ago

Marathon configuration for running Chronos:

{
  "id": "chronos",
  "cmd": "LIBPROCESS_PORT=6500 ./chronos --http_port $PORT",
  "cpus": 1,
  "mem": 512,
  "instances": 1,
  "uris": ["http://myfileserver/chronos-2.4.0.tgz"],
  "ports": [4400],
  "requirePorts": true,
  "healthChecks": [
    {"protocol": "HTTP", "path": "/scheduler/jobs"}
  ]
}
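
An application definition like the one above is submitted to Marathon by POSTing it to the `/v2/apps` endpoint. Below is a minimal sketch that rebuilds the same definition from parameters and prepares the request; the Marathon address `http://localhost:8080` is an assumption, and the health check block is omitted to keep the example short.

```python
import json
import urllib.request


def chronos_app(libprocess_port, http_port, archive_uri):
    """Return a Marathon app definition mirroring the configuration above."""
    return {
        "id": "chronos",
        # $PORT is substituted by Marathon with the first assigned port.
        "cmd": "LIBPROCESS_PORT=%d ./chronos --http_port $PORT" % libprocess_port,
        "cpus": 1,
        "mem": 512,
        "instances": 1,
        "uris": [archive_uri],
        "ports": [http_port],
        "requirePorts": True,
    }


def build_marathon_request(marathon_url, app):
    """Build (but do not send) the POST request that would create the app."""
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    app = chronos_app(6500, 4400, "http://myfileserver/chronos-2.4.0.tgz")
    req = build_marathon_request("http://localhost:8080", app)
    print(json.dumps(app, indent=2))
    # urllib.request.urlopen(req)  # uncomment to actually deploy
```

Whatever value is chosen for LIBPROCESS_PORT, the Mesos master must be able to reach it on the host where Marathon places Chronos.
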
payneio commented 7 years ago

How about a marathon config for running chronos in a docker container?

ianjuma commented 7 years ago

Something like this should work, with constraints:

{
  "id": "chronos",
  "args": [
    "--mesos_role=private",
    "--mesos_framework_name=chronos",
    "--hostname=<hostname>",
    "--master=zk://<ip>:2181,<ip>:2181,<ip>:2181/mesos",
    "--zk_hosts=zk://<ip>:2181,<ip>:2181,<ip>:2181",
    "--http_credentials=username:pass"
  ],
  "cpus": 0.5,
  "ports": [8080, 8081],
  "constraints": [["hostname", "LIKE", "<hostname>"]],
  "mem": 500.0,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesosphere/chronos:v3.0.0",
      "forcePullImage": true,
      "network": "HOST"
    }
  },
  "healthChecks": [
    {
      "path": "/",
      "port": 8080,
      "protocol": "HTTP",
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 3,
      "ignoreHttp1xx": false
    }
  ],
  "env": {
    "PORT0": "8080",
    "PORT1": "8081"
  }
}
yogeshnath commented 6 years ago

I was able to run Chronos and schedule a job, but the job just stays there. I noticed that the Chronos framework becomes inactive in Mesos after a couple of minutes.
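
One way to watch for this is to poll the Mesos master's state endpoint and look at the entry for the Chronos framework. The following is a rough sketch, assuming the master serves its state as JSON at `/master/state.json` (the path varies between Mesos versions) and that each entry under `frameworks` carries `id`, `name`, and `active` fields; the master URL is hypothetical.

```python
import json
import urllib.request


def fetch_state(master_url):
    """Fetch and decode the master's state document (endpoint path is an assumption)."""
    with urllib.request.urlopen(master_url.rstrip("/") + "/master/state.json") as resp:
        return json.loads(resp.read().decode("utf-8"))


def framework_status(state, name_part):
    """Return (id, active) pairs for frameworks whose name contains `name_part`."""
    return [
        (fw.get("id"), bool(fw.get("active", False)))
        for fw in state.get("frameworks", [])
        if name_part in fw.get("name", "")
    ]


if __name__ == "__main__":
    state = fetch_state("http://mesos-master:5050")  # hypothetical address
    for fw_id, active in framework_status(state, "chronos"):
        print(fw_id, "active" if active else "INACTIVE")
```

If the framework shows up as inactive shortly after registering, it is worth checking whether the master can connect back to the scheduler's address and port, as discussed earlier in this thread.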