mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0

Can't manually start a job created without a scheduler #704

Open krestjaninoff opened 8 years ago

krestjaninoff commented 8 years ago

I'm trying to use Chronos to start maintenance jobs. The tricky part is that I don't need them to run on a schedule; I want to invoke a job manually (e.g. during CI deployments).

For that, I've created the following manifest:

{
  "name": "db_backup",

  "cpus": 1,
  "mem": 256,

  "owner": "me",
  "async": false,
  "epsilon": "PT1M",

  "uris": [],
  "container": {
    "type": "DOCKER",
    "network": "HOST",
    "image": "{{docker_registry}}/{{image_name}}:{{image_version}}",
    "forcePullImage": true
  },

  "environmentVariables": [
     {
       "name": "CONTAINER_NAME",
       "value": "ZZZ"
     }
  ],

  "command": "/backup_s3.sh"
}

and uploaded it to Chronos using the scheduler/iso8601 endpoint:

- name: Add chronos task
  uri:
    url:          http://{{chronos_host}}:{{chronos_port}}{{chronos_path}}scheduler/iso8601
    method:       POST
    body:         "{{ manifest_content }}"
    body_format:  json
    status_code:  204

After that, I tried to trigger my job using the scheduler/job endpoint:

- name: Trigger chronos app
  uri:
    url:          http://{{chronos_host}}:{{chronos_port}}{{chronos_path}}scheduler/job/db_backup?arguments=-debug
    method:       PUT
    status_code:  204

But my job wasn't started :( In the Chronos logs I found the following:

Jul 19 15:18:59 ip-10-1-4-231 chronos[23610]:  (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:131)
Jul 19 15:18:59 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:18:59,558] INFO Declining unused offers. (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:89)
Jul 19 15:18:59 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:18:59,558] INFO Declined unused offers with filter refuseSeconds=5.0 (use --decline_offer_duration to reconfigure) (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:97)
Jul 19 15:19:00 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:00,126] INFO 10.2.15.211 -  -  [19/Jul/2016:15:19:00 +0000] "GET /scheduler/graph/csv HTTP/1.0" 200 109 "http://myhost.com/chronos/" "Mozilla/5.0 (Ma
Jul 19 15:19:00 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:00,130] INFO 10.2.15.211 -  -  [19/Jul/2016:15:19:00 +0000] "GET /scheduler/jobs HTTP/1.0" 200 4407 "http://myhost.com/chronos/" "Mozilla/5.0 (Macint
Jul 19 15:19:01 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:01,728] INFO Manually triggering job:db_backup (org.apache.mesos.chronos.scheduler.api.JobManagementResource:151)
Jul 19 15:19:01 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:01,728] INFO JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R0/2016-07-20T11:26:33.131Z/PT24H,db_backup,./backup_s3.sh test,PT1M,1,0,,,2,datalore,,,20
Jul 19 15:19:01 ip-10-1-4-231 chronos[23610]: (LOG_LEVEL_APP,INFO), EnvironmentVariable(LOG_LEVEL_ROOT,INFO), EnvironmentVariable(AWS_SECRET_KEY,XXX), EnvironmentVariable(AWS_ACCESS_KEY,YYY
Jul 19 15:19:01 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:01,728] INFO Updating state for job (db_backup) to queued (org.apache.mesos.chronos.scheduler.jobs.stats.JobStats:62)
Jul 19 15:19:01 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:01,728] INFO 10.2.15.211 -  -  [19/Jul/2016:15:19:01 +0000] "PUT /scheduler/job/db_backup?arguments=-debug HTTP/1.0" 204 0 "-" "Python-httplib2/0.9.2 (gzip)" (mesosphere.
Jul 19 15:19:03 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:03,589] INFO Received resource offers (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:82)
Jul 19 15:19:03 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:03,590] INFO Normal priority queue contains task: ct:1468941541728:0:db_backup: (org.apache.mesos.chronos.scheduler.jobs.TaskManager:82)
Jul 19 15:19:03 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:03,590] INFO JobNotificationObserver does not handle JobExpired(ScheduleBasedJob(R0/2016-07-20T11:26:33.131Z/PT24H,db_backup,./backup_s3.sh test,PT1M,1,0,,,2,datalore,,,2
Jul 19 15:19:03 ip-10-1-4-231 chronos[23610]: e(LOG_LEVEL_APP,INFO), EnvironmentVariable(LOG_LEVEL_ROOT,INFO), EnvironmentVariable(AWS_SECRET_KEY,XXX), EnvironmentVariable(AWS_ACCESS_KEY,YYY
Jul 19 15:19:03 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:03,590] INFO Updating state for job (db_backup) to idle (org.apache.mesos.chronos.scheduler.jobs.stats.JobStats:62)
Jul 19 15:19:03 ip-10-1-4-231 chronos[23610]: [2016-07-19 15:19:03,590] INFO No tasks scheduled or next task has been disabled.
Jul 19 15:19:03 ip-10-1-4-231 chronos[23610]:  (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:131)

I'm using Chronos 2.4.0 with Mesos 0.28.1.
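
For anyone reproducing this outside Ansible, here is a minimal Python sketch of the same two calls: a POST of the manifest to scheduler/iso8601, then a PUT to scheduler/job/<name> with an arguments query parameter. The endpoint paths and HTTP methods are the ones from my tasks above; the base URL and the function names are hypothetical placeholders, and the sketch only builds the requests rather than sending them.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Hypothetical base URL, standing in for the templated
# http://{{chronos_host}}:{{chronos_port}}{{chronos_path}} used in the Ansible tasks.
BASE_URL = "http://chronos.example.com:4400/"


def build_upload_request(manifest: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST that uploads a job manifest to scheduler/iso8601."""
    return urllib.request.Request(
        url=base_url + "scheduler/iso8601",
        data=json.dumps(manifest).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_trigger_request(job_name: str, arguments: str = "",
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the PUT that manually triggers a job via scheduler/job/<name>."""
    url = base_url + "scheduler/job/" + quote(job_name, safe="")
    if arguments:
        # Passed as ?arguments=..., as in the Ansible task above.
        url += "?" + urlencode({"arguments": arguments})
    return urllib.request.Request(url=url, method="PUT")
```

Each request can be sent with urllib.request.urlopen(request). In my setup both calls return 204, so the status code alone does not show whether the job actually ran; the Chronos log has to be checked as well.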

shankarapailoor commented 8 years ago

I am seeing the very same issue with Chronos 2.4.0 and Mesos 0.28.1. The job only runs if I delete it and create it again.

prakhassh commented 6 years ago

When you are scheduling the Chronos job, give the "schedule" parameter in the form R0/<time when it has to run>/<repeating interval>.

Even if you are planning to run the Chronos job only once, it is important to give all three parts of the schedule:

{
  "schedule": "R/2014-09-25T17:22:00Z/PT2M",
  "name": "dockerjob",
  "container": {
    "type": "DOCKER",
    "image": "libmesos/ubuntu",
    "network": "BRIDGE",
    "volumes": [
      {
        "containerPath": "/var/log/",
        "hostPath": "/logs/",
        "mode": "RW"
      }
    ]
  },
  "cpus": "0.5",
  "mem": "512",
  "fetch": [],
  "command": "while sleep 10; do date -u +%T; done"
}
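
The three-part schedule string described above (repetition marker, start time, repeat interval) can be put together with a short Python sketch like the one below. The function names are hypothetical and not part of Chronos; the sketch only formats an ISO 8601 repeating-interval string and does not check which forms a particular Chronos version accepts.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def iso8601_duration(delta: timedelta) -> str:
    """Format a positive, whole-second timedelta as an ISO 8601 duration (e.g. PT2M)."""
    total = int(delta.total_seconds())
    if total <= 0:
        raise ValueError("the repeat interval must be positive")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    return "P" + date_part + ("T" + time_part if time_part else "")


def build_schedule(start: datetime, period: timedelta,
                   repetitions: Optional[int] = None) -> str:
    """Build 'R<n>/<start>/<period>'; repetitions=None gives a bare 'R'."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    count = "" if repetitions is None else str(repetitions)
    # Normalise to UTC and use the trailing 'Z' form seen in the example above.
    start_utc = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"R{count}/{start_utc}/{iso8601_duration(period)}"
```

For example, build_schedule(datetime(2014, 9, 25, 17, 22, tzinfo=timezone.utc), timedelta(minutes=2)) produces the schedule string used in the manifest above, and passing repetitions=0 produces the R0/... form.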