mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0
4.39k stars 528 forks source link

Jobs not getting executed with repeat interval in seconds. #476

Closed pankaj36 closed 9 years ago

pankaj36 commented 9 years ago

I am using chronos-2.3.4.

I have created a simple job with 'schedule' key set as "R10//PT2S". Only first instance is getting executed by mesos and rest are getting stuck in chronos queue. I can see 9 Tasks(T_ct:[time]:0:JOB1) in zookeeper /chronos/state/state/ node. Only after restart(after some time) these tasks are getting removed from zookeeper(without executed).

You can reproduce this using this REST call: 1)curl -L -H 'Content-Type: application/json' -X POST -d '{ "schedule": "R10//PT2S", "name": "JOB1", "epsilon": "PT15M", "command": "sleep 2", "owner": "xyz@abc.com", "async": false, "retries":2 }' chronos-host:4040/scheduler/iso8601

This is the zookeeper data after some time: [zk: localhost:2181(CONNECTED) 27] ls /chronos/state/state/

T_ct:1435137092113:0:JOB1: T_ct:1435137080113:0:JOB1: J_JOB1 T_ct:1435137076113:0:JOB1: T_ct:1435137090113:0:JOB1: T_ct:1435137086113:0:JOB1: frameworkId T_ct:1435137084113:0:JOB1: T_ct:1435137082113:0:JOB1: T_ct:1435137088113:0:JOB1: T_ct:1435137078113:0:JOB1:

I did some debugging and found that its happening only when we have few tasks(with T_ct prefix) in chronos priority queue and Job 'schedule' key got updated to 'R0/[time]/PT2S'. I am assuming that after every minute chronos if reading jobs info data and creating tasks(updating zookeeper /chronos/state/state/ node) depending on the schedule info and again updating 'schedule' key.

Am I missing anything(any configuration). Everything would work fine if I set 'schedule' key as 'R//PT2S' or if we set job repeat interval more than a minute.

depay commented 9 years ago

@pankaj36 whether the job became disabled=true? u can try "sleep 30" or more and see whether it executes. Chronos will make task before its exec-time and the schedule will minus to 0. if an early task finished and found rec=0, it thought all tasks finished then disable the job. Bug for a long time.

elingg commented 9 years ago

hey @depay, @pankaj36 is this possibly related to the bug you fixed, which was fixed after the 2.3.4 release https://github.com/mesos/chronos/pull/465? With your fix, tasks should not be getting removed from zookeeper early and only after they actually execute.

depay commented 9 years ago

@elingg seems not that bug.. plz see my comment above

pankaj36 commented 9 years ago

@depay I tried "sleep 30" with schedule key set as "R2//PT2S". Still second job is not getting executed. Now I can't see any pending task in zookeeper. Probably chronos disabled second task while first one was still running.

What is PT2S? Is it time interval between two consecutive instance of a job or a time gap when next instance will be launched after completion of previous instance?

When I tried one job: curl -L -H 'Content-Type: application/json' -X POST -d '{ "schedule": "R10//PT10S", "name": "long_running_job", "epsilon": "PT5M", "command": "sleep 40", "owner": "xyz@abc.com", "async": false, "retries":2 }' hostname:4040/scheduler/iso8601 only 2 instances of the job got executed and there is no queued task in zookeeper. What happened to other 8 jobs. Is this because of time interval of 10 seconds(PT10S). Since one instance is taking 40 seconds, chronos will disable next 4 task.

elingg commented 9 years ago

@pankaj36, To answer this question:

"What is PT2S? Is it time interval between two consecutive instance of a job or a time gap when next instance will be launched after completion of previous instance?"

Your job will start at a specified time (by default, the current time). If you specify the repeat interval as 2S, then it will run every 2S after the start time for the given number of times it is supposed to repeat. Note, that while a job is still running, it will not run another instance of the same job so there may be some latency.

Regarding the bug overall, interesting. It seems to be a bug when the repeat interval is specified, since it seems to be happening only in that case. It seems to be happening when the duration of the job execution is longer than the repeat interval. I will leave this open as a bug for further investigation. (cc @aquamatthias who is working on Chronos as well).

depay commented 9 years ago

@pankaj36 did warning The head of the task queue appears to already be running appear in your log?

pankaj36 commented 9 years ago

@depay Yes.

15/06/30 12:06:58 INFO Main$: --------------------- 15/06/30 12:06:58 INFO Main$: Initializing chronos. 15/06/30 12:06:58 INFO Main$: --------------------- 15/06/30 12:07:00 INFO MainModule: Wiring up the application 15/06/30 12:07:00 INFO FrameworkIdUtil: Setting framework ID to 20150506-160505-748136640-5050-1733-0000 15/06/30 12:07:00 INFO CuratorFrameworkImpl: Starting 15/06/30 12:07:00 DEBUG CuratorZookeeperClient: Starting 15/06/30 12:07:00 DEBUG ConnectionState: Starting 15/06/30 12:07:00 DEBUG ConnectionState: reset 15/06/30 12:07:00 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 15/06/30 12:07:00 INFO ZooKeeper: Client environment:host.name=chronos-host 15/06/30 12:07:00 INFO ZooKeeper: Client environment:java.version=1.6.0_35 15/06/30 12:07:00 INFO ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc. 15/06/30 12:07:00 INFO ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-6-openjdk-amd64/jre 15/06/30 12:07:00 INFO ZooKeeper: Client environment:java.class.path=/home/hduser/chronos-2.3.4/target/chronos-2.3.4.jar 15/06/30 12:07:00 INFO ZooKeeper: Client environment:java.library.path=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server:/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64:/usr/lib/jvm/java-6-openjdk-amd64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x8664-linux-gnu:/usr/lib/jni:/lib:/usr/lib 15/06/30 12:07:00 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp 15/06/30 12:07:00 INFO ZooKeeper: Client environment:java.compiler= 15/06/30 12:07:00 INFO ZooKeeper: Client environment:os.name=Linux 15/06/30 12:07:00 INFO ZooKeeper: Client environment:os.arch=amd64 15/06/30 12:07:00 INFO ZooKeeper: Client environment:os.version=3.11.0-24-generic 15/06/30 12:07:00 INFO ZooKeeper: Client environment:user.name=root 15/06/30 12:07:00 INFO ZooKeeper: Client environment:user.home=/root 15/06/30 12:07:00 INFO ZooKeeper: Client environment:user.dir=/home/hduser 15/06/30 12:07:00 INFO ZooKeeper: Initiating client connection, connectString=air-itag4:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@200cef 15/06/30 12:07:00 DEBUG ClientCnxn: zookeeper.disableAutoWatchReset is false 15/06/30 12:07:00 INFO ZookeeperModule: Connecting to ZK... 15/06/30 12:07:00 INFO ClientCnxn: Opening socket connection to server chronos-host/192.168.151.44:2181. Will not attempt to authenticate using SASL (unknown error) 15/06/30 12:07:00 INFO ClientCnxn: Socket connection established to chronos-host/192.168.151.44:2181, initiating session 15/06/30 12:07:00 DEBUG ClientCnxn: Session establishment request sent on chronos-host/192.168.151.44:2181 15/06/30 12:07:00 INFO ClientCnxn: Session establishment complete on server chronos-host/192.168.151.44:2181, sessionid = 0x14e19dd24fb00e7, negotiated timeout = 40000 15/06/30 12:07:00 INFO ConnectionStateManager: State change: CONNECTED 15/06/30 12:07:00 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 1,3 replyHeader:: 1,540969,0 request:: '/chronos,F response:: s{51,51,1430908505648,1430908505648,0,1,0,0,0,1,52} 15/06/30 12:07:00 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 2,3 replyHeader:: 2,540969,0 request:: '/chronos/state,F response:: s{52,52,1430908505656,1430908505656,0,2,0,0,0,2,54} 15/06/30 12:07:00 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 3,3 replyHeader:: 3,540969,0 request:: '/chronos/state/state,F response:: s{53,53,1430908505664,1430908505664,0,71127,0,0,0,1,540964} 15/06/30 12:07:00 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 4,3 replyHeader:: 4,540969,0 request:: '/chronos,F response:: s{51,51,1430908505648,1430908505648,0,1,0,0,0,1,52} 15/06/30 12:07:00 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 5,3 replyHeader:: 5,540969,0 request:: '/chronos/state,F response:: s{52,52,1430908505656,1430908505656,0,2,0,0,0,2,54} 15/06/30 12:07:00 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 6,3 replyHeader:: 6,540969,0 request:: '/chronos/state/candidate,F response:: s{54,54,1430908505705,1430908505705,0,34814,0,0,0,0,540965} 15/06/30 12:07:00 DEBUG log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.eclipse.jetty.util.log) via org.eclipse.jetty.util.log.Slf4jLog 15/06/30 12:07:00 DEBUG ServletHandler: filterNameMap={} 15/06/30 12:07:00 DEBUG ServletHandler: pathFilters=null 15/06/30 12:07:00 DEBUG ServletHandler: servletFilterMap=null 15/06/30 12:07:00 DEBUG ServletHandler: servletPathMap={/=org.eclipse.jetty.servlet.DefaultServlet-65d4dad5} 15/06/30 12:07:00 DEBUG ServletHandler: servletNameMap={org.eclipse.jetty.servlet.DefaultServlet-65d4dad5=org.eclipse.jetty.servlet.DefaultServlet-65d4dad5} 15/06/30 12:07:00 DEBUG ServletHandler: filterNameMap={com.google.inject.servlet.GuiceFilter-80a92ed=com.google.inject.servlet.GuiceFilter-80a92ed} 15/06/30 12:07:00 DEBUG ServletHandler: pathFilters=[[/]/[]==31=>com.google.inject.servlet.GuiceFilter-80a92ed] 15/06/30 12:07:00 DEBUG ServletHandler: servletFilterMap={} 15/06/30 12:07:00 DEBUG ServletHandler: servletPathMap={/=org.eclipse.jetty.servlet.DefaultServlet-65d4dad5} 15/06/30 12:07:00 DEBUG ServletHandler: servletNameMap={org.eclipse.jetty.servlet.DefaultServlet-65d4dad5=org.eclipse.jetty.servlet.DefaultServlet-65d4dad5} 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.server.Server@6ed749de + SelectChannelConnector@0.0.0.0:4400 as connector 15/06/30 12:07:00 DEBUG Container: Container com.codahale.metrics.jetty8.InstrumentedHandler@7e0913b6 + o.e.j.s.ServletContextHandler{/,null} as handler 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.server.handler.HandlerCollection@5ed525f + org.eclipse.jetty.server.handler.ResourceHandler@1fbade03 as handler 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.server.handler.HandlerCollection@5ed525f + com.codahale.metrics.jetty8.InstrumentedHandler@7e0913b6 as handler 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.server.handler.HandlerCollection@5ed525f + org.eclipse.jetty.server.handler.RequestLogHandler@2545a73b as handler 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.server.Server@6ed749de + org.eclipse.jetty.server.handler.HandlerCollection@5ed525f as handler 15/06/30 12:07:00 WARN HttpModule: No SSL support configured. 15/06/30 12:07:00 DEBUG HttpService: Starting up HttpServer. 15/06/30 12:07:00 DEBUG AbstractLifeCycle: starting org.eclipse.jetty.server.Server@6ed749de 15/06/30 12:07:00 INFO JobScheduler: Trying to become leader. 15/06/30 12:07:00 INFO Server: jetty-8.y.z-SNAPSHOT 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.server.Server@6ed749de + qtp2092382034{8<=0<=0/254,-1} as threadpool 15/06/30 12:07:00 DEBUG AbstractLifeCycle: starting org.eclipse.jetty.server.handler.HandlerCollection@5ed525f 15/06/30 12:07:00 DEBUG AbstractLifeCycle: starting org.eclipse.jetty.server.handler.ResourceHandler@1fbade03 15/06/30 12:07:00 DEBUG AbstractHandler: starting org.eclipse.jetty.server.handler.ResourceHandler@1fbade03 15/06/30 12:07:00 DEBUG AbstractLifeCycle: STARTED org.eclipse.jetty.server.handler.ResourceHandler@1fbade03 15/06/30 12:07:00 DEBUG AbstractLifeCycle: starting com.codahale.metrics.jetty8.InstrumentedHandler@7e0913b6 15/06/30 12:07:00 DEBUG AbstractLifeCycle: starting o.e.j.s.ServletContextHandler{/,null} 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.servlet.ServletHandler@7881b008 + com.google.inject.servlet.GuiceFilter-80a92ed as filter 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.servlet.ServletHandler@7881b008 + [/]/[]==31=>com.google.inject.servlet.GuiceFilter-80a92ed as filterMapping 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.servlet.ServletHandler@7881b008 + org.eclipse.jetty.servlet.DefaultServlet-65d4dad5 as servlet 15/06/30 12:07:00 DEBUG Container: Container org.eclipse.jetty.servlet.ServletHandler@7881b008 + [/]=>org.eclipse.jetty.servlet.DefaultServlet-65d4dad5 as servletMapping 15/06/30 12:07:00 DEBUG Container: Container o.e.j.s.ServletContextHandler{/,null} + org.eclipse.jetty.servlet.ServletHandler@7881b008 as handler 15/06/30 12:07:00 DEBUG AbstractLifeCycle: starting org.eclipse.jetty.servlet.ServletHandler@7881b008 15/06/30 12:07:00 DEBUG ServletHandler: filterNameMap={com.google.inject.servlet.GuiceFilter-80a92ed=com.google.inject.servlet.GuiceFilter-80a92ed} 15/06/30 12:07:00 DEBUG ServletHandler: pathFilters=[[/_]/[]==31=>com.google.inject.servlet.GuiceFilter-80a92ed] 15/06/30 12:07:00 DEBUG ServletHandler: servletFilterMap={} 15/06/30 12:07:00 DEBUG ServletHandler: servletPathMap={/*=org.eclipse.jetty.servlet.DefaultServlet-65d4dad5} 15/06/30 12:07:00 DEBUG ServletHandler: servletNameMap={org.eclipse.jetty.servlet.DefaultServlet-65d4dad5=org.eclipse.jetty.servlet.DefaultServlet-65d4dad5} 15/06/30 12:07:00 DEBUG AbstractHandler: starting org.eclipse.jetty.servlet.ServletHandler@7881b008 15/06/30 12:07:00 DEBUG AbstractLifeCycle: STARTED org.eclipse.jetty.servlet.ServletHandler@7881b008 15/06/30 12:07:00 DEBUG AbstractHandler: starting o.e.j.s.ServletContextHandler{/,null} 15/06/30 12:07:00 DEBUG AbstractLifeCycle: starting com.google.inject.servlet.GuiceFilter-80a92ed 15/06/30 12:07:00 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:/chronos/state/candidate/_c_eb1ea21c-31ec-4971-bd73-52a33998306a-latch- serverPath:/chronos/state/candidate/_c_eb1ea21c-31ec-4971-bd73-52a33998306a-latch- finished:false header:: 7,1 replyHeader:: 7,540970,0 request:: '/chronos/state/candidate/_c_eb1ea21c-31ec-4971-bd73-52a33998306a-latch-,#6169722d69746167343a34343030,v{s{31,s{'world,'anyone}}},3 response:: '/chronos/state/candidate/_c_eb1ea21c-31ec-4971-bd73-52a33998306a-latch-0000017407 15/06/30 12:07:00 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:/chronos/state/candidate serverPath:/chronos/state/candidate finished:false header:: 8,12 replyHeader:: 8,540970,0 request:: '/chronos/state/candidate,F response:: v{'_c_eb1ea21c-31ec-4971-bd73-52a33998306a-latch-0000017407},s{54,54,1430908505705,1430908505705,0,34815,0,0,0,1,540970} 15/06/30 12:07:00 INFO JobScheduler: Elected as leader. 15/06/30 12:07:00 INFO JobScheduler: Loading tasks 15/06/30 12:07:00 INFO JobScheduler: Loading jobs 15/06/30 12:07:00 INFO JobUtils$: Registering jobs:0 15/06/30 12:07:00 INFO JobScheduler: Starting chronos driver 15/06/30 12:07:00 INFO JobScheduler: Running background thread 15/06/30 12:07:00 INFO MesosJobFramework: Registered 15/06/30 12:07:00 INFO JobScheduler: Starting run loop for JobScheduler. CurrentTime: 2015-06-30T06:37:00.860Z 15/06/30 12:07:00 INFO JobScheduler: Size of streams: 0 15/06/30 12:07:00 INFO JobScheduler: Checking schedules with time horizon:PT60S 15/06/30 12:07:00 INFO GuiceComponentProviderFactory: Registering com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider as a provider class 15/06/30 12:07:00 INFO GuiceComponentProviderFactory: Registering com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter as a provider class 15/06/30 12:07:00 INFO GuiceComponentProviderFactory: Registering mesosphere.chaos.validation.ConstraintViolationExceptionMapper as a provider class 15/06/30 12:07:00 INFO GuiceComponentProviderFactory: Registering org.apache.mesos.chronos.scheduler.api.Iso8601JobResource as a root resource class 15/06/30 12:07:00 INFO GuiceComponentProviderFactory: Registering org.apache.mesos.chronos.scheduler.api.DependentJobResource as a root resource class 15/06/30 12:07:00 INFO GuiceComponentProviderFactory: Registering org.apache.mesos.chronos.scheduler.api.JobManagementResource as a root resource class 15/06/30 12:07:00 INFO GuiceComponentProviderFactory: Registering org.apache.mesos.chronos.scheduler.api.TaskManagementResource as a root resource class 15/06/30 12:07:00 INFO GuiceComponentProviderFactory: Registering org.apache.mesos.chronos.scheduler.api.GraphManagementResource as a root resource class 15/06/30 12:07:00 INFO GuiceComponentProviderFactory: Registering org.apache.mesos.chronos.scheduler.api.StatsResource as a root resource class 15/06/30 12:07:00 INFO WebApplicationImpl: Initiating Jersey application, version 'Jersey: 1.18.1 02/19/2014 03:28 AM' 15/06/30 12:07:00 INFO MesosJobFramework: Master info:id: "20150626-115140-748136640-5050-46288" ip: 748136640 port: 5050 pid: "master@192.168.151.44:5050" hostname: "192.168.151.44"

15/06/30 12:07:00 INFO FrameworkIdUtil: Stored framework ID '20150506-160505-748136640-5050-1733-0000' 15/06/30 12:07:00 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:00 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:00 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:01 INFO GuiceComponentProviderFactory: Binding com.codahale.metrics.jersey.InstrumentedResourceMethodDispatchAdapter to GuiceManagedComponentProvider with the scope "Singleton" 15/06/30 12:07:01 INFO GuiceComponentProviderFactory: Binding mesosphere.chaos.validation.ConstraintViolationExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton" 15/06/30 12:07:01 INFO GuiceComponentProviderFactory: Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope "Singleton" 15/06/30 12:07:01 DEBUG logging: Logging Provider: org.jboss.logging.Log4jLoggerProvider 15/06/30 12:07:01 INFO Version: HV000001: Hibernate Validator 5.1.2.Final 15/06/30 12:07:01 DEBUG DefaultTraversableResolver: Cannot find javax.persistence.Persistence on classpath. Assuming non JPA 2 environment. All properties will per default be traversable. 15/06/30 12:07:01 DEBUG ValidationXmlParser: Trying to load META-INF/validation.xml for XML based Validator configuration. 15/06/30 12:07:01 DEBUG ValidationXmlParser: No META-INF/validation.xml found. Using annotation based configuration only. 15/06/30 12:07:01 INFO GuiceComponentProviderFactory: Binding org.apache.mesos.chronos.scheduler.api.Iso8601JobResource to GuiceManagedComponentProvider with the scope "Singleton" 15/06/30 12:07:01 INFO GuiceComponentProviderFactory: Binding org.apache.mesos.chronos.scheduler.api.DependentJobResource to GuiceManagedComponentProvider with the scope "Singleton" 15/06/30 12:07:01 INFO GuiceComponentProviderFactory: Binding org.apache.mesos.chronos.scheduler.api.JobManagementResource to GuiceManagedComponentProvider with the scope "Singleton" 15/06/30 12:07:01 INFO GuiceComponentProviderFactory: Binding org.apache.mesos.chronos.scheduler.api.TaskManagementResource to GuiceManagedComponentProvider with the scope "Singleton" 15/06/30 12:07:01 INFO GuiceComponentProviderFactory: Binding org.apache.mesos.chronos.scheduler.api.GraphManagementResource to GuiceManagedComponentProvider with the scope "Singleton" 15/06/30 12:07:01 INFO GuiceComponentProviderFactory: Binding org.apache.mesos.chronos.scheduler.api.StatsResource to GuiceManagedComponentProvider with the scope "Singleton" 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED com.google.inject.servlet.GuiceFilter-80a92ed 15/06/30 12:07:01 DEBUG AbstractLifeCycle: starting org.eclipse.jetty.servlet.DefaultServlet-65d4dad5 15/06/30 12:07:01 DEBUG Holder: Holding class org.eclipse.jetty.servlet.DefaultServlet 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED org.eclipse.jetty.servlet.DefaultServlet-65d4dad5 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED o.e.j.s.ServletContextHandler{/,null} 15/06/30 12:07:01 DEBUG AbstractHandler: starting com.codahale.metrics.jetty8.InstrumentedHandler@7e0913b6 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED com.codahale.metrics.jetty8.InstrumentedHandler@7e0913b6 15/06/30 12:07:01 DEBUG AbstractLifeCycle: starting org.eclipse.jetty.server.handler.RequestLogHandler@2545a73b 15/06/30 12:07:01 DEBUG AbstractHandler: starting org.eclipse.jetty.server.handler.RequestLogHandler@2545a73b 15/06/30 12:07:01 DEBUG AbstractLifeCycle: starting mesosphere.chaos.http.ChaosRequestLog@5ad82742 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED mesosphere.chaos.http.ChaosRequestLog@5ad82742 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED org.eclipse.jetty.server.handler.RequestLogHandler@2545a73b 15/06/30 12:07:01 DEBUG AbstractHandler: starting org.eclipse.jetty.server.handler.HandlerCollection@5ed525f 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED org.eclipse.jetty.server.handler.HandlerCollection@5ed525f 15/06/30 12:07:01 DEBUG AbstractHandler: starting org.eclipse.jetty.server.Server@6ed749de 15/06/30 12:07:01 DEBUG AbstractLifeCycle: starting qtp2092382034{8<=0<=0/254,-1} 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED qtp2092382034{8<=7<=8/254,0} 15/06/30 12:07:01 DEBUG AbstractLifeCycle: starting SelectChannelConnector@0.0.0.0:4400 15/06/30 12:07:01 DEBUG AbstractLifeCycle: starting null/null 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED PooledBuffers [0/1024@6144,0/1024@16384,0/1024@-]/PooledBuffers [0/1024@6144,0/1024@32768,0/1024@-] 15/06/30 12:07:01 DEBUG AbstractLifeCycle: starting org.eclipse.jetty.server.nio.SelectChannelConnector$ConnectorSelectorManager@7b87c4f2 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED org.eclipse.jetty.server.nio.SelectChannelConnector$ConnectorSelectorManager@7b87c4f2 15/06/30 12:07:01 DEBUG nio: Starting Thread[qtp2092382034-37 Selector0,5,main] on org.eclipse.jetty.io.nio.SelectorManager$1@3d625eb1 15/06/30 12:07:01 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4400 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED SelectChannelConnector@0.0.0.0:4400 15/06/30 12:07:01 DEBUG AbstractLifeCycle: STARTED org.eclipse.jetty.server.Server@6ed749de 15/06/30 12:07:06 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:06 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:06 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:08 DEBUG nio: created SCEP@265602e5{l(/192.168.151.44:50082)<->r(/192.168.151.44:4400),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@58d81c0c,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0} 15/06/30 12:07:08 DEBUG HttpParser: filled 311/311 15/06/30 12:07:08 DEBUG Server: REQUEST /scheduler/iso8601 on AsyncHttpConnection@58d81c0c,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=3,c=151},r=1 15/06/30 12:07:08 DEBUG ContextHandler: scope null||/scheduler/iso8601 @ o.e.j.s.ServletContextHandler{/,null} 15/06/30 12:07:08 DEBUG ContextHandler: context=||/scheduler/iso8601 @ o.e.j.s.ServletContextHandler{/,null} 15/06/30 12:07:08 DEBUG ServletHandler: servlet ||/scheduler/iso8601 -> org.eclipse.jetty.servlet.DefaultServlet-65d4dad5 15/06/30 12:07:08 DEBUG ServletHandler: chain=com.google.inject.servlet.GuiceFilter-80a92ed->org.eclipse.jetty.servlet.DefaultServlet-65d4dad5 15/06/30 12:07:08 DEBUG ServletHandler: call filter com.google.inject.servlet.GuiceFilter-80a92ed 15/06/30 12:07:08 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 9,12 replyHeader:: 9,540972,0 request:: '/chronos/state/candidate,F response:: v{'_c_eb1ea21c-31ec-4971-bd73-52a33998306a-latch-0000017407},s{54,54,1430908505705,1430908505705,0,34815,0,0,0,1,540970} 15/06/30 12:07:08 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 10,4 replyHeader:: 10,540972,0 request:: '/chronos/state/candidate/_c_eb1ea21c-31ec-4971-bd73-52a33998306a-latch-0000017407,F response:: #6169722d69746167343a34343030,s{540970,540970,1435646220826,1435646220826,0,0,0,94041079820255463,14,0,540970} 15/06/30 12:07:08 INFO Iso8601JobResource: Received request for job:ScheduleBasedJob(R10//PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:07:08 WARN JobGraph: Adding vertex:long_running_job 15/06/30 12:07:08 WARN JobGraph: Current number of vertices:1 15/06/30 12:07:08 INFO JobScheduler: Persisting job:long_running_job 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R10//PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State J_long_running_job does not exist yet. Adding to state 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Adding schedule for time:6:37:08 AM UTC 15/06/30 12:07:08 INFO JobScheduler: Checking schedules with time horizon:PT60S 15/06/30 12:07:08 INFO JobScheduler: Calling next for stream: R10//PT10S, jobname: long_running_job 15/06/30 12:07:08 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:37:08.657Z 15/06/30 12:07:08 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:07:08 INFO TaskManager: Scheduling task 'ct:1435646228657:0:long_running_job:' with delay: '0' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State T_ct:1435646228657:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:07:08 INFO TaskManager: Removing task mapping 15/06/30 12:07:08 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R9/2015-06-30T06:37:18.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R9/2015-06-30T06:37:18.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:07:08 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R10//PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646228657:0:long_running_job:,0) 15/06/30 12:07:08 INFO JobStats: Updating state for job (long_running_job) to queued 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Calling next for stream: R9/2015-06-30T06:37:18.658Z/PT10S, jobname: long_running_job 15/06/30 12:07:08 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:37:18.658Z 15/06/30 12:07:08 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:07:08 INFO TaskManager: Scheduling task 'ct:1435646238658:0:long_running_job:' with delay: '9942' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State T_ct:1435646238658:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R8/2015-06-30T06:37:28.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R8/2015-06-30T06:37:28.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Calling next for stream: R8/2015-06-30T06:37:28.658Z/PT10S, jobname: long_running_job 15/06/30 12:07:08 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:37:28.658Z 15/06/30 12:07:08 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:07:08 INFO TaskManager: Scheduling task 'ct:1435646248658:0:long_running_job:' with delay: '19894' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State T_ct:1435646248658:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R7/2015-06-30T06:37:38.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R7/2015-06-30T06:37:38.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Calling next for stream: R7/2015-06-30T06:37:38.658Z/PT10S, jobname: long_running_job 15/06/30 12:07:08 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:37:38.658Z 15/06/30 12:07:08 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:07:08 INFO TaskManager: Scheduling task 'ct:1435646258658:0:long_running_job:' with delay: '29827' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State T_ct:1435646258658:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R6/2015-06-30T06:37:48.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R6/2015-06-30T06:37:48.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Calling next for stream: R6/2015-06-30T06:37:48.658Z/PT10S, jobname: long_running_job 15/06/30 12:07:08 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:37:48.658Z 15/06/30 12:07:08 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:07:08 INFO TaskManager: Scheduling task 'ct:1435646268658:0:long_running_job:' with delay: '39778' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State T_ct:1435646268658:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R5/2015-06-30T06:37:58.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R5/2015-06-30T06:37:58.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Calling next for stream: R5/2015-06-30T06:37:58.658Z/PT10S, jobname: long_running_job 15/06/30 12:07:08 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:37:58.658Z 15/06/30 12:07:08 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:07:08 INFO TaskManager: Scheduling task 'ct:1435646278658:0:long_running_job:' with delay: '49721' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State T_ct:1435646278658:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:07:08 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:07:08 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:08 INFO JobScheduler: Calling next for stream: R4/2015-06-30T06:38:08.658Z/PT10S, jobname: long_running_job 15/06/30 12:07:08 INFO Iso8601JobResource: Added job to JobGraph 15/06/30 12:07:08 INFO ChaosRequestLog: 192.168.151.44 - - [30/Jun/2015:06:37:08 +0000] "POST /scheduler/iso8601 HTTP/1.1" 204 0 "-" "curl/7.38.0" 15/06/30 12:07:08 DEBUG Server: RESPONSE /scheduler/iso8601 204 handled=true 15/06/30 12:07:08 DEBUG AsyncHttpConnection: Enabled read interest SCEP@265602e5{l(/192.168.151.44:50082)<->r(/192.168.151.44:4400),s=1,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0r}-{AsyncHttpConnection@58d81c0c,g=HttpGenerator{s=4,h=0,b=-1,c=-1},p=HttpParser{s=0,l=3,c=151},r=1} 15/06/30 12:07:08 DEBUG ChannelEndPoint: ishut SCEP@265602e5{l(/192.168.151.44:50082)<->r(/192.168.151.44:4400),s=1,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0r}-{AsyncHttpConnection@58d81c0c,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=-3},r=1} 15/06/30 12:07:08 DEBUG HttpParser: filled -1/0 15/06/30 12:07:08 DEBUG AsyncHttpConnection: Disabled read interest while writing response SCEP@265602e5{l(/192.168.151.44:50082)<->r(/192.168.151.44:4400),s=1,open=true,ishut=true,oshut=false,rb=false,wb=false,w=true,i=0r}-{AsyncHttpConnection@58d81c0c,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=0,l=0,c=-3},r=1} 15/06/30 12:07:08 DEBUG ChannelEndPoint: close SCEP@265602e5{l(/192.168.151.44:50082)<->r(/192.168.151.44:4400),s=1,open=true,ishut=true,oshut=false,rb=false,wb=false,w=true,i=0r}-{AsyncHttpConnection@58d81c0c,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=0,l=0,c=-3},r=1} 15/06/30 12:07:08 DEBUG nio: destroyEndPoint SCEP@265602e5{l(null)<->r(/192.168.151.44:4400),s=0,open=false,ishut=true,oshut=true,rb=false,wb=false,w=true,i=0!}-{AsyncHttpConnection@58d81c0c,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=0,l=0,c=-3},r=1} 15/06/30 12:07:08 DEBUG AbstractHttpConnection: closed AsyncHttpConnection@58d81c0c,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=0,l=0,c=-3},r=1 15/06/30 12:07:12 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:12 INFO TaskManager: Normal priority queue contains task: ct:1435646228657:0:long_running_job: 15/06/30 12:07:12 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:12 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:12 INFO MesosJobFramework: Launching tasks from offer: id { value: "20150626-115140-748136640-5050-46288-O174256" } framework_id { value: "20150506-160505-748136640-5050-1733-0000" } slaveid { value: "20150618-122417-748136640-5050-6138-S0" } hostname: "192.168.151.42" resources { name: "cpus" type: SCALAR scalar { value: 4.0 } role: "" } resources { name: "mem" type: SCALAR scalar { value: 6911.0 } role: "" } resources { name: "disk" type: SCALAR scalar { value: 88619.0 } role: "" } resources { name: "ports" type: RANGES ranges { range { begin: 31000 end: 32000 } } role: "_" } with tasks: ArrayBuffer(name: "ChronosTask:long_running_job" task_id { value: "ct:1435646228657:0:long_running_job:" } slaveid { value: "20150618-122417-748136640-5050-6138-S0" } resources { name: "cpus" type: SCALAR scalar { value: 0.1 } role: "" } resources { name: "mem" type: SCALAR scalar { value: 128.0 } role: "_" } resources { name: "disk" type: SCALAR scalar { value: 256.0 } role: "*" } command { environment { variables { name: "mesos_task_id" value: "ct:1435646228657:0:long_running_job:" } variables { name: "CHRONOS_JOB_OWNER" value: "xyz@abc.com" } variables { name: "CHRONOS_JOB_NAME" value: "long_running_job" } variables { name: "HOST" value: "192.168.151.42" } variables { name: "CHRONOS_RESOURCE_MEM" value: "128.0" } variables { name: "CHRONOS_RESOURCE_CPU" value: "0.1" } variables { name: "CHRONOS_RESOURCE_DISK" value: "256.0" } } value: "sleep 40" user: "root" shell: true } ) 15/06/30 12:07:12 INFO MesosStatePersistenceStore: Purging entry 'T_ct:1435646228657:0:long_running_job:' via: org.apache.mesos.state.ZooKeeperState 15/06/30 12:07:12 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 11,3 replyHeader:: 11,541006,0 request:: '/chronos/state/state/T_ct:1435646228657:0:long_running_job:,F response:: s{540980,540980,1435646228687,1435646228687,0,0,0,0,598,0,540980} 15/06/30 12:07:12 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 12,2 replyHeader:: 12,541007,0 request:: '/chronos/state/state/T_ct:1435646228657:0:long_running_job:,-1 response:: null 15/06/30 12:07:12 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 13,3 replyHeader:: 13,541007,-101 request:: '/chronos/state/state/T_ct:1435646228657:0:long_running_job:,F response::
15/06/30 12:07:12 INFO MesosJobFramework: Task 'ct:1435646228657:0:long_running_job:' launched, status: 'DRIVER_RUNNING' 15/06/30 12:07:12 INFO JobsObserver$: JobNotificationObserver does not handle JobStarted(ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),task_id { value: "ct:1435646228657:0:long_running_job:" } state: TASK_RUNNING slave_id { value: "20150618-122417-748136640-5050-6138-S0" } timestamp: 1.435646239369082E9 source: SOURCE_EXECUTOR uuid: "m\r\206\200\242\225J\274\251\266\314w\347\23388" ,0) 15/06/30 12:07:12 INFO JobStats: Updating state for job (long_running_job) to running 15/06/30 12:07:12 INFO MesosJobFramework: Task with id 'ct:1435646228657:0:long_running_job:' RUNNING. 15/06/30 12:07:18 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:18 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:18 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:18 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:07:18 INFO TaskManager: Removing task mapping 15/06/30 12:07:18 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646238658:0:long_running_job:,0) 15/06/30 12:07:24 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:24 INFO TaskManager: Normal priority queue contains task: ct:1435646238658:0:long_running_job: 15/06/30 12:07:24 INFO MesosStatePersistenceStore: Purging entry 'T_ct:1435646238658:0:long_running_job:' via: org.apache.mesos.state.ZooKeeperState 15/06/30 12:07:24 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 14,3 replyHeader:: 14,541007,0 request:: '/chronos/state/state/T_ct:1435646238658:0:long_running_job:,F response:: s{540985,540985,1435646228745,1435646228745,0,0,0,0,621,0,540985} 15/06/30 12:07:24 DEBUG ClientCnxn: Got ping response for sessionid: 0x14e19dd24fb00e7 after 1ms 15/06/30 12:07:24 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 15,2 replyHeader:: 15,541008,0 request:: '/chronos/state/state/T_ct:1435646238658:0:long_running_job:,-1 response:: null 15/06/30 12:07:24 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 16,3 replyHeader:: 16,541008,-101 request:: '/chronos/state/state/T_ct:1435646238658:0:long_running_job:,F response::
15/06/30 12:07:24 WARN MesosJobFramework: The head of the task queue appears to already be running: long_running_job

15/06/30 12:07:24 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:24 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:28 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:07:28 INFO TaskManager: Removing task mapping 15/06/30 12:07:28 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646248658:0:long_running_job:,0) 15/06/30 12:07:30 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:30 INFO TaskManager: Normal priority queue contains task: ct:1435646248658:0:long_running_job: 15/06/30 12:07:30 INFO MesosStatePersistenceStore: Purging entry 'T_ct:1435646248658:0:long_running_job:' via: org.apache.mesos.state.ZooKeeperState 15/06/30 12:07:30 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 17,3 replyHeader:: 17,541008,0 request:: '/chronos/state/state/T_ct:1435646248658:0:long_running_job:,F response:: s{540990,540990,1435646228795,1435646228795,0,0,0,0,621,0,540990} 15/06/30 12:07:30 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 18,2 replyHeader:: 18,541009,0 request:: '/chronos/state/state/T_ct:1435646248658:0:long_running_job:,-1 response:: null 15/06/30 12:07:30 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 19,3 replyHeader:: 19,541009,-101 request:: '/chronos/state/state/T_ct:1435646248658:0:long_running_job:,F response::
15/06/30 12:07:30 WARN MesosJobFramework: The head of the task queue appears to already be running: long_running_job

15/06/30 12:07:30 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:30 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:36 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:36 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:36 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:38 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:07:38 INFO TaskManager: Removing task mapping 15/06/30 12:07:38 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646258658:0:long_running_job:,0) 15/06/30 12:07:42 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:42 INFO TaskManager: Normal priority queue contains task: ct:1435646258658:0:long_running_job: 15/06/30 12:07:42 INFO MesosStatePersistenceStore: Purging entry 'T_ct:1435646258658:0:long_running_job:' via: org.apache.mesos.state.ZooKeeperState 15/06/30 12:07:42 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 20,3 replyHeader:: 20,541009,0 request:: '/chronos/state/state/T_ct:1435646258658:0:long_running_job:,F response:: s{540995,540995,1435646228862,1435646228862,0,0,0,0,621,0,540995} 15/06/30 12:07:42 DEBUG ClientCnxn: Got ping response for sessionid: 0x14e19dd24fb00e7 after 1ms 15/06/30 12:07:42 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 21,2 replyHeader:: 21,541010,0 request:: '/chronos/state/state/T_ct:1435646258658:0:long_running_job:,-1 response:: null 15/06/30 12:07:42 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 22,3 replyHeader:: 22,541010,-101 request:: '/chronos/state/state/T_ct:1435646258658:0:long_running_job:,F response::
15/06/30 12:07:42 WARN MesosJobFramework: The head of the task queue appears to already be running: long_running_job

15/06/30 12:07:42 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:42 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:48 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:48 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:48 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:48 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:07:48 INFO TaskManager: Removing task mapping 15/06/30 12:07:48 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646268658:0:long_running_job:,0) 15/06/30 12:07:52 INFO MesosJobFramework: Task with id 'ct:1435646228657:0:long_running_job:' FINISHED 15/06/30 12:07:52 INFO JobsObserver$: JobNotificationObserver does not handle JobFinished(ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,0,0,,,2,xyz@abc.com,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),task_id { value: "ct:1435646228657:0:long_running_job:" } state: TASK_FINISHED message: "Command exited with status 0" slave_id { value: "20150618-122417-748136640-5050-6138-S0" } timestamp: 1.435646279394609E9 source: SOURCE_EXECUTOR uuid: "\354A\236ZB\021D\033\245\251\n\231\373\203\233J" ,0) 15/06/30 12:07:52 INFO JobStats: Updating state for job (long_running_job) to idle 15/06/30 12:07:52 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:07:52 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:07:52 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:07:52 INFO JobGraph: Dependents: [] 15/06/30 12:07:53 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:53 INFO TaskManager: Normal priority queue contains task: ct:1435646268658:0:long_running_job: 15/06/30 12:07:53 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:53 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:53 INFO MesosJobFramework: Launching tasks from offer: id { value: "20150626-115140-748136640-5050-46288-O174275" } framework_id { value: "20150506-160505-748136640-5050-1733-0000" } slaveid { value: "20150618-122417-748136640-5050-6138-S0" } hostname: "192.168.151.42" resources { name: "cpus" type: SCALAR scalar { value: 4.0 } role: "" } resources { name: "mem" type: SCALAR scalar { value: 6911.0 } role: "" } resources { name: "disk" type: SCALAR scalar { value: 88619.0 } role: "" } resources { name: "ports" type: RANGES ranges { range { begin: 31000 end: 32000 } } role: "_" } with tasks: ArrayBuffer(name: "ChronosTask:long_running_job" task_id { value: "ct:1435646268658:0:long_running_job:" } slaveid { value: "20150618-122417-748136640-5050-6138-S0" } resources { name: "cpus" type: SCALAR scalar { value: 0.1 } role: "" } resources { name: "mem" type: SCALAR scalar { value: 128.0 } role: "_" } resources { name: "disk" type: SCALAR scalar { value: 256.0 } role: "*" } command { environment { variables { name: "mesos_task_id" value: "ct:1435646268658:0:long_running_job:" } variables { name: "CHRONOS_JOB_OWNER" value: "xyz@abc.com" } variables { name: "CHRONOS_JOB_NAME" value: "long_running_job" } variables { name: "HOST" value: "192.168.151.42" } variables { name: "CHRONOS_RESOURCE_MEM" value: "128.0" } variables { name: "CHRONOS_RESOURCE_CPU" value: "0.1" } variables { name: "CHRONOS_RESOURCE_DISK" value: "256.0" } } value: "sleep 40" user: "root" shell: true } ) 15/06/30 12:07:53 INFO MesosStatePersistenceStore: Purging entry 'T_ct:1435646268658:0:long_running_job:' via: org.apache.mesos.state.ZooKeeperState 15/06/30 12:07:53 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 23,3 replyHeader:: 23,541011,0 request:: '/chronos/state/state/T_ct:1435646268658:0:long_running_job:,F response:: s{541000,541000,1435646228903,1435646228903,0,0,0,0,621,0,541000} 15/06/30 12:07:53 DEBUG ClientCnxn: Got ping response for sessionid: 0x14e19dd24fb00e7 after 1ms 15/06/30 12:07:53 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 24,2 replyHeader:: 24,541012,0 request:: '/chronos/state/state/T_ct:1435646268658:0:long_running_job:,-1 response:: null 15/06/30 12:07:53 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 25,3 replyHeader:: 25,541012,-101 request:: '/chronos/state/state/T_ct:1435646268658:0:long_running_job:,F response::
15/06/30 12:07:53 INFO MesosJobFramework: Task 'ct:1435646268658:0:long_running_job:' launched, status: 'DRIVER_RUNNING' 15/06/30 12:07:53 INFO JobsObserver$: JobNotificationObserver does not handle JobStarted(ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),task_id { value: "ct:1435646268658:0:long_running_job:" } state: TASK_RUNNING slave_id { value: "20150618-122417-748136640-5050-6138-S0" } timestamp: 1.435646280388869E9 source: SOURCE_EXECUTOR uuid: "=9\357\266\377NL\357\243\275d\000\326a\256F" ,0) 15/06/30 12:07:53 INFO JobStats: Updating state for job (long_running_job) to running 15/06/30 12:07:53 INFO MesosJobFramework: Task with id 'ct:1435646268658:0:long_running_job:' RUNNING. 15/06/30 12:07:54 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:54 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:54 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:07:58 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:07:58 INFO TaskManager: Removing task mapping 15/06/30 12:07:58 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R4/2015-06-30T06:38:08.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646278658:0:long_running_job:,0) 15/06/30 12:07:59 INFO MesosJobFramework: Received resource offers

15/06/30 12:07:59 INFO TaskManager: Normal priority queue contains task: ct:1435646278658:0:long_running_job: 15/06/30 12:07:59 INFO MesosStatePersistenceStore: Purging entry 'T_ct:1435646278658:0:long_running_job:' via: org.apache.mesos.state.ZooKeeperState 15/06/30 12:07:59 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 26,3 replyHeader:: 26,541012,0 request:: '/chronos/state/state/T_ct:1435646278658:0:long_running_job:,F response:: s{541005,541005,1435646228962,1435646228962,0,0,0,0,621,0,541005} 15/06/30 12:07:59 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 27,2 replyHeader:: 27,541013,0 request:: '/chronos/state/state/T_ct:1435646278658:0:long_running_job:,-1 response:: null 15/06/30 12:07:59 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 28,3 replyHeader:: 28,541013,-101 request:: '/chronos/state/state/T_ct:1435646278658:0:long_running_job:,F response::
15/06/30 12:07:59 WARN MesosJobFramework: The head of the task queue appears to already be running: long_running_job

15/06/30 12:07:59 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:07:59 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:00 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:00 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:00 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:00 INFO JobScheduler: Size of streams: 1 15/06/30 12:08:00 INFO JobScheduler: Checking schedules with time horizon:PT60S 15/06/30 12:08:00 INFO JobScheduler: Calling next for stream: R4/2015-06-30T06:38:08.658Z/PT10S, jobname: long_running_job 15/06/30 12:08:00 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:38:08.658Z 15/06/30 12:08:00 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:08:00 INFO TaskManager: Scheduling task 'ct:1435646288658:0:long_running_job:' with delay: '7772' 15/06/30 12:08:00 INFO MesosStatePersistenceStore: State T_ct:1435646288658:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:08:00 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:00 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R3/2015-06-30T06:38:18.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:08:00 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R3/2015-06-30T06:38:18.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:08:00 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:08:00 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:00 INFO JobScheduler: Calling next for stream: R3/2015-06-30T06:38:18.658Z/PT10S, jobname: long_running_job 15/06/30 12:08:00 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:38:18.658Z 15/06/30 12:08:00 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:08:00 INFO TaskManager: Scheduling task 'ct:1435646298658:0:long_running_job:' with delay: '17704' 15/06/30 12:08:00 INFO MesosStatePersistenceStore: State T_ct:1435646298658:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:08:00 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:00 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R2/2015-06-30T06:38:28.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:08:00 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R2/2015-06-30T06:38:28.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:08:00 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:08:01 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:01 INFO JobScheduler: Calling next for stream: R2/2015-06-30T06:38:28.658Z/PT10S, jobname: long_running_job 15/06/30 12:08:01 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:38:28.658Z 15/06/30 12:08:01 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:08:01 INFO TaskManager: Scheduling task 'ct:1435646308658:0:long_running_job:' with delay: '27654' 15/06/30 12:08:01 INFO MesosStatePersistenceStore: State T_ct:1435646308658:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:08:01 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:01 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R1/2015-06-30T06:38:38.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:08:01 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R1/2015-06-30T06:38:38.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:08:01 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:08:01 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:01 INFO JobScheduler: Calling next for stream: R1/2015-06-30T06:38:38.658Z/PT10S, jobname: long_running_job 15/06/30 12:08:01 INFO JobScheduler: Task ready for scheduling: 2015-06-30T06:38:38.658Z 15/06/30 12:08:01 INFO JobScheduler: Scheduling:long_running_job 15/06/30 12:08:01 INFO TaskManager: Scheduling task 'ct:1435646318658:0:long_running_job:' with delay: '37487' 15/06/30 12:08:01 INFO MesosStatePersistenceStore: State T_ct:1435646318658:0:long_running_job: does not exist yet. Adding to state 15/06/30 12:08:01 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:01 INFO JobScheduler: Saving updated job:ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()) 15/06/30 12:08:01 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:08:01 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:08:01 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:01 INFO JobScheduler: Calling next for stream: R0/2015-06-30T06:38:48.658Z/PT10S, jobname: long_running_job 15/06/30 12:08:01 INFO JobScheduler: Finished all recurrences of job 'long_running_job' 15/06/30 12:08:05 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:05 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:05 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:06 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:06 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:06 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:08 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:08:08 INFO TaskManager: Removing task mapping 15/06/30 12:08:08 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646288658:0:long_running_job:,0) 15/06/30 12:08:11 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:11 INFO TaskManager: Normal priority queue contains task: ct:1435646288658:0:long_running_job: 15/06/30 12:08:11 INFO MesosStatePersistenceStore: Purging entry 'T_ct:1435646288658:0:long_running_job:' via: org.apache.mesos.state.ZooKeeperState 15/06/30 12:08:11 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 29,3 replyHeader:: 29,541033,0 request:: '/chronos/state/state/T_ct:1435646288658:0:long_running_job:,F response:: s{541017,541017,1435646280919,1435646280919,0,0,0,0,645,0,541017} 15/06/30 12:08:11 DEBUG ClientCnxn: Got ping response for sessionid: 0x14e19dd24fb00e7 after 1ms 15/06/30 12:08:11 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 30,2 replyHeader:: 30,541034,0 request:: '/chronos/state/state/T_ct:1435646288658:0:long_running_job:,-1 response:: null 15/06/30 12:08:11 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 31,3 replyHeader:: 31,541034,-101 request:: '/chronos/state/state/T_ct:1435646288658:0:long_running_job:,F response::
15/06/30 12:08:11 WARN MesosJobFramework: The head of the task queue appears to already be running: long_running_job

15/06/30 12:08:11 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:11 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:12 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:12 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:12 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:17 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:17 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:17 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:18 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:18 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:18 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:18 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:08:18 INFO TaskManager: Removing task mapping 15/06/30 12:08:18 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646298658:0:long_running_job:,0) 15/06/30 12:08:23 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:23 INFO TaskManager: Normal priority queue contains task: ct:1435646298658:0:long_running_job: 15/06/30 12:08:23 INFO MesosStatePersistenceStore: Purging entry 'T_ct:1435646298658:0:long_running_job:' via: org.apache.mesos.state.ZooKeeperState 15/06/30 12:08:23 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 32,3 replyHeader:: 32,541034,0 request:: '/chronos/state/state/T_ct:1435646298658:0:long_running_job:,F response:: s{541022,541022,1435646280986,1435646280986,0,0,0,0,645,0,541022} 15/06/30 12:08:23 DEBUG ClientCnxn: Got ping response for sessionid: 0x14e19dd24fb00e7 after 1ms 15/06/30 12:08:23 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 33,2 replyHeader:: 33,541035,0 request:: '/chronos/state/state/T_ct:1435646298658:0:long_running_job:,-1 response:: null 15/06/30 12:08:23 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 34,3 replyHeader:: 34,541035,-101 request:: '/chronos/state/state/T_ct:1435646298658:0:long_running_job:,F response::
15/06/30 12:08:23 WARN MesosJobFramework: The head of the task queue appears to already be running: long_running_job

15/06/30 12:08:23 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:23 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:24 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:24 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:24 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:28 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:08:28 INFO TaskManager: Removing task mapping 15/06/30 12:08:28 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646308658:0:long_running_job:,0) 15/06/30 12:08:29 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:29 INFO TaskManager: Normal priority queue contains task: ct:1435646308658:0:long_running_job: 15/06/30 12:08:29 INFO MesosStatePersistenceStore: Purging entry 'T_ct:1435646308658:0:long_running_job:' via: org.apache.mesos.state.ZooKeeperState 15/06/30 12:08:29 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 35,3 replyHeader:: 35,541035,0 request:: '/chronos/state/state/T_ct:1435646308658:0:long_running_job:,F response:: s{541027,541027,1435646281027,1435646281027,0,0,0,0,645,0,541027} 15/06/30 12:08:29 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 36,2 replyHeader:: 36,541036,0 request:: '/chronos/state/state/T_ct:1435646308658:0:long_running_job:,-1 response:: null 15/06/30 12:08:29 DEBUG ClientCnxn: Reading reply sessionid:0x14e19dd24fb00e7, packet:: clientPath:null serverPath:null finished:false header:: 37,3 replyHeader:: 37,541036,-101 request:: '/chronos/state/state/T_ct:1435646308658:0:long_running_job:,F response::
15/06/30 12:08:29 WARN MesosJobFramework: The head of the task queue appears to already be running: long_running_job

15/06/30 12:08:29 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:29 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:30 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:30 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:30 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:33 INFO MesosJobFramework: Task with id 'ct:1435646268658:0:long_running_job:' FINISHED 15/06/30 12:08:33 INFO JobsObserver$: JobNotificationObserver does not handle JobFinished(ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),task_id { value: "ct:1435646268658:0:long_running_job:" } state: TASK_FINISHED message: "Command exited with status 0" slave_id { value: "20150618-122417-748136640-5050-6138-S0" } timestamp: 1.435646320414338E9 source: SOURCE_EXECUTOR uuid: "2\221]\347\375\276K\224\255r\f\316\341\2134\264" ,0) 15/06/30 12:08:33 INFO JobStats: Updating state for job (long_running_job) to idle 15/06/30 12:08:33 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,2,0,,,2,xyz@abc.com,,,2015-06-30T06:38:33.610Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:08:33 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:08:33 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:33 INFO JobGraph: Dependents: [] 15/06/30 12:08:33 INFO JobScheduler: Disabling job that reached a zero-recurrence count! 15/06/30 12:08:33 INFO JobNotificationObserver: [Chronos] job 'long_running_job' disabled 15/06/30 12:08:33 INFO JobsObserver$: JobStats does not handle JobDisabled(ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,1,0,,,2,xyz@abc.com,,,2015-06-30T06:37:52.605Z,,false,0.1,256.0,128.0,false,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),Job 'long_running_job' has exhausted all of its recurrences and has been disabled. Please consider either removing your job, or updating its schedule and re-enabling it. ) 15/06/30 12:08:33 INFO MesosStatePersistenceStore: Persisting job 'long_running_job' with data 'ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,2,0,,,2,xyz@abc.com,,,2015-06-30T06:38:33.610Z,,false,0.1,256.0,128.0,true,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer())' 15/06/30 12:08:33 INFO MesosStatePersistenceStore: Key for state exists already: J_long_running_job 15/06/30 12:08:33 INFO MesosStatePersistenceStore: State update successful: true 15/06/30 12:08:34 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:34 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:34 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:36 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:36 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:36 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:38 INFO TaskManager: Triggering: 'long_running_job' 15/06/30 12:08:38 INFO TaskManager: Removing task mapping 15/06/30 12:08:38 INFO JobsObserver$: JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,2,0,,,2,xyz@abc.com,,,2015-06-30T06:38:33.610Z,,false,0.1,256.0,128.0,true,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646318658:0:long_running_job:,0) 15/06/30 12:08:38 INFO JobStats: Updating state for job (long_running_job) to queued 15/06/30 12:08:40 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:40 INFO TaskManager: Normal priority queue contains task: ct:1435646318658:0:long_running_job: 15/06/30 12:08:40 INFO JobsObserver$: JobNotificationObserver does not handle JobExpired(ScheduleBasedJob(R0/2015-06-30T06:38:48.658Z/PT10S,long_running_job,sleep 40,PT5M,2,0,,,2,xyz@abc.com,,,2015-06-30T06:38:33.610Z,,false,0.1,256.0,128.0,true,0,ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),ct:1435646318658:0:long_running_job:) 15/06/30 12:08:40 INFO JobStats: Updating state for job (long_running_job) to idle 15/06/30 12:08:40 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:40 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:42 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:42 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:42 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:42 DEBUG ClientCnxn: Got ping response for sessionid: 0x14e19dd24fb00e7 after 0ms 15/06/30 12:08:46 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:46 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:46 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:48 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:48 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:48 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:52 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:52 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:52 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:54 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:54 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:54 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:08:56 DEBUG ClientCnxn: Got ping response for sessionid: 0x14e19dd24fb00e7 after 0ms 15/06/30 12:08:58 INFO MesosJobFramework: Received resource offers

15/06/30 12:08:58 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:08:58 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:09:00 INFO MesosJobFramework: Received resource offers

15/06/30 12:09:00 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:09:00 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:09:01 INFO JobScheduler: Size of streams: 0 15/06/30 12:09:01 INFO JobScheduler: Checking schedules with time horizon:PT60S 15/06/30 12:09:04 INFO MesosJobFramework: Received resource offers

15/06/30 12:09:04 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:09:04 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:09:06 INFO MesosJobFramework: Received resource offers

15/06/30 12:09:06 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:09:06 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:09:09 DEBUG ClientCnxn: Got ping response for sessionid: 0x14e19dd24fb00e7 after 0ms 15/06/30 12:09:10 INFO MesosJobFramework: Received resource offers

15/06/30 12:09:10 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:09:10 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:09:12 INFO MesosJobFramework: Received resource offers

15/06/30 12:09:12 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:09:12 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:09:15 INFO MesosJobFramework: Received resource offers

15/06/30 12:09:15 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:09:15 INFO MesosJobFramework: Declining unused offers.

15/06/30 12:09:18 INFO MesosJobFramework: Received resource offers

15/06/30 12:09:18 INFO MesosJobFramework: No tasks scheduled or next task has been disabled.

15/06/30 12:09:18 INFO MesosJobFramework: Declining unused offers.

depay commented 9 years ago
    def generate() {
      taskManager.getTask match {
        case None => log.info("No tasks scheduled or next task has been disabled.\n")
        case Some((taskId, job)) =>
          if (runningTasks.contains(job.name)) {
            val deleted = taskManager.removeTask(taskId)
            log.warning("The head of the task queue appears to already be running: " + job.name + "\n")
            generate()
          } else {

we have change these codes in our local version, using taskId instead of job name..LoL

@elingg @aquamatthias do you know why should chronos prevents a job from running if the previous one doesn't finish?

elingg commented 9 years ago

@depay, Chronos by default only allows one instance of a job to run at a time.

depay commented 9 years ago

@elingg thanks but things will go bad if any task hunging.. @pankaj36 it's clear now that your job is too frequent...

kolloch commented 9 years ago

Should we close that issue, then? If you disagree, please elaborate and reopen.