mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0

All Jobs Fail to Launch #228

Closed stevencox closed 9 years ago

stevencox commented 10 years ago

Folks,

I've had Chronos working before, but on a new cluster, with either the previous build or a Chronos build downloaded yesterday, all jobs fail in the same way. I'm using Mesos 0.18.0 on CentOS 6. Here's the output from a fresh run.

What am I missing?

Steve

[2014-06-25 10:58:20,337] WARN No sufficient offers found for task 'ct:1403708300124:0:escottTester', will append to queue (com.airbnb.scheduler.mesos.MesosJobFramework:69)

....

[2014-06-25 10:58:20,845] INFO Received resource offers (com.airbnb.scheduler.mesos.MesosJobFramework:58)

[2014-06-25 10:58:20,845] INFO Normal priority queue contains task: ct:1403708300124:0:escottTester (com.airbnb.scheduler.jobs.TaskManager:74)

[2014-06-25 10:58:20,846] INFO double (com.airbnb.scheduler.mesos.MesosJobFramework:173)

[2014-06-25 10:58:20,846] INFO double (com.airbnb.scheduler.mesos.MesosJobFramework:173)

[2014-06-25 10:58:20,846] INFO double (com.airbnb.scheduler.mesos.MesosJobFramework:173)

[2014-06-25 10:58:20,847] INFO double (com.airbnb.scheduler.mesos.MesosJobFramework:173)

[2014-06-25 10:58:20,847] WARN Ignoring offered resource: RANGES (com.airbnb.scheduler.mesos.MesosJobFramework:196)

[2014-06-25 10:58:20,849] INFO Launching task from offer: id { value: "20140624-163738-336533932-5050-11931-14660" } framework_id { value: "20140624-101436-336533932-5050-27838-0000" } slaveid { value: "20140624-104648-336533932-5050-29911-1" } hostname: "c0.skylr.renci.org" resources { name: "cpus" type: SCALAR scalar { value: 0.8999999999999995 } role: "" } resources { name: "mem" type: SCALAR scalar { value: 2776.0 } role: "" } resources { name: "disk" type: SCALAR scalar { value: 8063.0 } role: "" } resources { name: "ports" type: RANGES ranges { range { begin: 31000 end: 31272 } range { begin: 31274 end: 31322 } range { begin: 31324 end: 32000 } } role: "_" } with task: name: "ChronosTask:escottTester" task_id { value: "ct:1403708300124:0:escottTester" } slaveid { value: "20140624-104648-336533932-5050-29911-1" } resources { name: "cpus" type: SCALAR scalar { value: 0.1 } role: "" } resources { name: "mem" type: SCALAR scalar { value: 128.0 } role: "_" } resources { name: "disk" type: SCALAR scalar { value: 256.0 } role: "*" } command { environment { variables { name: "mesos_task_id" value: "ct:1403708300124:0:escottTester" } variables { name: "CHRONOS_JOB_OWNER" value: "" } } value: "echo hi >> /opt/skylr-analytics/y" } (com.airbnb.scheduler.mesos.MesosJobFramework:206)

[2014-06-25 10:58:20,849] INFO Purging entry 'T_ct:1403708300124:0:escottTester' via: org.apache.mesos.state.ZooKeeperState (com.airbnb.scheduler.state.MesosStatePersistenceStore:161)

[2014-06-25 10:58:20,853] INFO Task 'ct:1403708300124:0:escottTester' launched, status: 'DRIVER_RUNNING' (com.airbnb.scheduler.mesos.MesosJobFramework:222)

[2014-06-25 10:58:20,853] INFO No tasks scheduled! Declining offers (com.airbnb.scheduler.mesos.MesosJobFramework:77)

[2014-06-25 10:58:20,862] INFO Task with id 'ct:1403708300124:0:escottTester' FAILED (com.airbnb.scheduler.mesos.MesosJobFramework:116)

[2014-06-25 10:58:20,862] WARN Task of job: escottTester failed. (com.airbnb.scheduler.jobs.JobScheduler:363)
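
A log like the one above is easier to scan once it is cut down to the warnings and the failure itself. The sketch below is one way to do that, assuming every entry starts with a `[YYYY-MM-DD HH:MM:SS,mmm]` timestamp as it does here. The function names `split_entries` and `interesting` are made up for illustration and are not part of Chronos or Mesos.

```python
import re

# Timestamp prefix that starts each entry in the Chronos log above.
TIMESTAMP = re.compile(r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]")


def split_entries(text):
    """Split pasted log text into one string per entry.

    Works even when several entries share a line. Any text that
    appears before the first timestamp is dropped.
    """
    joined = " ".join(text.split())  # collapse newlines and repeated spaces
    starts = [m.start() for m in TIMESTAMP.finditer(joined)]
    bounds = starts + [len(joined)]
    return [joined[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def interesting(entries):
    """Keep WARN entries and any entry that reports a FAILED task."""
    return [e for e in entries if "] WARN " in e or " FAILED " in e]


if __name__ == "__main__":
    import sys

    # Usage: python filter_log.py < chronos.log
    for entry in interesting(split_entries(sys.stdin.read())):
        print(entry)
```

Applied to this run, the filter keeps the "No sufficient offers" and "Ignoring offered resource" warnings plus the two lines reporting the failure. It does not explain the failure; the Chronos log only records that the task FAILED, not why.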

nelsou commented 10 years ago

Hello @stevencox,

Did you find a solution to your problem?

ericghlee commented 9 years ago

@stevencox, try running your Mesos slaves with sudo.
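
If sudo helps, one plausible reason is file permissions: the job command in the log appends to `/opt/skylr-analytics/y`, and the user running the slave may not be allowed to write there. That is an assumption, since nothing in this thread confirms the cause. A minimal sketch for checking it on the slave host, run as the same user as the slave process (`can_append` is a hypothetical helper name):

```python
import os


def can_append(path):
    """Return True if the current user could append to `path` or create it."""
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(os.path.abspath(path))
    # Creating a new file needs write and search permission on the parent directory.
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    target = "/opt/skylr-analytics/y"  # path taken from the job command in the log
    print(target, "writable:", can_append(target))
```

A `False` result would be consistent with the permission theory; a `True` result points elsewhere, and the sandbox stdout/stderr would be the next place to look.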

chengweiv5 commented 9 years ago

@stevencox, please attach the stdout/stderr from the Mesos executor sandbox. You can find them in the mesos-master web UI by clicking 'sandbox' for your task.

stevencox commented 9 years ago

Hi folks - I've gotten past the problem above. I upgraded versions, though I'm not sure exactly what fixed it. It was quite a while back. Thanks.