Closed bentrigat closed 11 years ago
submitting similar job manually:
curl -L -X GET localhost:4400/scheduler/jobs [{"schedule":"R/2013-05-02T22:52:30Z/PT24H","name":"testjob14","command":"ls -aRtl /tmp >> /tmp/ben/f","epsilon":"PT15M","successCount":0,"errorCount":0,"executor":"","executorFlags":"","retries":2,"owner":"ben@toto.com","lastSuccess":"","lastError":"","async":false}]
curl -L -X PUT localhost:4400/scheduler/job/testjob14
generates this in the log :
INFO [2013-05-02 22:54:43,543] com.airbnb.scheduler.api.JobManagementResource: Manually triggering job:testjob14 DEBUG [2013-05-02 22:54:43,543] com.airbnb.scheduler.jobs.TaskManager: Adding task 'ct:testjob14:1367535283543:0' to queue 127.0.0.1 - - [02/May/2013:22:54:43 +0000] "PUT /scheduler/job/testjob14 HTTP/1.1" 204 0 3 3 DEBUG [2013-05-02 22:54:43,544] org.eclipse.jetty.server.Server: RESPONSE /scheduler/job/testjob14 204 handled=true DEBUG [2013-05-02 22:54:43,547] org.eclipse.jetty.io.nio.ChannelEndPoint: ishut BCEP@138f4128{l(/127.0.0.1:50128)<->r(/127.0.0.1:4400),open=true,ishut=false,oshut=false}-{BlockingHttpConnection@76949cfe,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=-3},r=1} DEBUG [2013-05-02 22:54:43,547] org.eclipse.jetty.http.HttpParser: filled -1/0 DEBUG [2013-05-02 22:54:43,547] org.eclipse.jetty.io.nio.ChannelEndPoint: close BCEP@138f4128{l(/127.0.0.1:50128)<->r(/127.0.0.1:4400),open=true,ishut=true,oshut=false}-{BlockingHttpConnection@76949cfe,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=0,l=0,c=-3},r=1} DEBUG [2013-05-02 22:54:43,555] org.eclipse.jetty.server.AbstractHttpConnection: closed BlockingHttpConnection@76949cfe,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=0,l=0,c=-3},r=1 DEBUG [2013-05-02 22:54:44,523] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:54:46,192] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:54:47,860] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:54:49,527] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:54:51,196] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:54:52,862] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:54:53,337] org.eclipse.jetty.http.HttpParser: filled 1262/1262 DEBUG [2013-05-02 22:54:53,338] org.eclipse.jetty.server.Server: REQUEST /scheduler/jobs on BlockingHttpConnection@55eb75d3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-5,l=617,c=0},r=46 DEBUG [2013-05-02 22:54:53,338] org.eclipse.jetty.server.handler.ContextHandler: scope null||/scheduler/jobs @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:54:53,338] org.eclipse.jetty.server.handler.ContextHandler: scope null||/scheduler/jobs @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:54:53,338] org.eclipse.jetty.server.handler.ContextHandler: context=||/scheduler/jobs @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:54:53,338] org.eclipse.jetty.servlet.ServletHandler: servlet |/scheduler|/jobs -> com.sun.jersey.spi.container.servlet.ServletContainer-723468318 DEBUG [2013-05-02 22:54:53,338] org.eclipse.jetty.servlet.ServletHandler: chain=com.yammer.dropwizard.servlets.ThreadNameFilter-664071981->com.airbnb.scheduler.api.RedirectFilter-1583883651->com.sun.jersey.spi.container.servlet.ServletContainer-723468318 DEBUG [2013-05-02 22:54:53,338] org.eclipse.jetty.servlet.ServletHandler: call filter com.yammer.dropwizard.servlets.ThreadNameFilter-664071981 DEBUG [2013-05-02 22:54:53,338] org.eclipse.jetty.servlet.ServletHandler: call filter com.airbnb.scheduler.api.RedirectFilter-1583883651 DEBUG [2013-05-02 22:54:53,338] org.eclipse.jetty.servlet.ServletHandler: call servlet com.sun.jersey.spi.container.servlet.ServletContainer-723468318 127.0.0.1 - - [02/May/2013:22:54:53 +0000] "GET /scheduler/jobs HTTP/1.1" 200 218 5 5 DEBUG [2013-05-02 22:54:53,342] org.eclipse.jetty.server.Server: RESPONSE /scheduler/jobs 200 handled=true DEBUG [2013-05-02 22:54:53,413] org.eclipse.jetty.http.HttpParser: filled 1244/1244 DEBUG [2013-05-02 22:54:53,413] org.eclipse.jetty.server.Server: REQUEST /scheduler/graph/csv on BlockingHttpConnection@4db991a8,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-5,l=617,c=0},r=144 DEBUG [2013-05-02 22:54:53,413] org.eclipse.jetty.server.handler.ContextHandler: scope null||/scheduler/graph/csv @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:54:53,413] org.eclipse.jetty.server.handler.ContextHandler: scope null||/scheduler/graph/csv @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:54:53,413] org.eclipse.jetty.server.handler.ContextHandler: context=||/scheduler/graph/csv @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:54:53,413] org.eclipse.jetty.servlet.ServletHandler: servlet |/scheduler|/graph/csv -> com.sun.jersey.spi.container.servlet.ServletContainer-723468318 DEBUG [2013-05-02 22:54:53,413] org.eclipse.jetty.servlet.ServletHandler: chain=com.yammer.dropwizard.servlets.ThreadNameFilter-664071981->com.airbnb.scheduler.api.RedirectFilter-1583883651->com.sun.jersey.spi.container.servlet.ServletContainer-723468318 DEBUG [2013-05-02 22:54:53,413] org.eclipse.jetty.servlet.ServletHandler: call filter com.yammer.dropwizard.servlets.ThreadNameFilter-664071981 DEBUG [2013-05-02 22:54:53,414] org.eclipse.jetty.servlet.ServletHandler: call filter com.airbnb.scheduler.api.RedirectFilter-1583883651 DEBUG [2013-05-02 22:54:53,414] org.eclipse.jetty.servlet.ServletHandler: call servlet com.sun.jersey.spi.container.servlet.ServletContainer-723468318 127.0.0.1 - - [02/May/2013:22:54:53 +0000] "GET /scheduler/graph/csv HTTP/1.1" 200 41 3 3 DEBUG [2013-05-02 22:54:53,417] org.eclipse.jetty.server.Server: RESPONSE /scheduler/graph/csv 200 handled=true DEBUG [2013-05-02 22:54:54,530] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:54:56,197] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:54:57,866] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:54:59,533] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:55:01,202] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:55:02,869] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:55:03,339] org.eclipse.jetty.http.HttpParser: filled 1262/1262 DEBUG [2013-05-02 22:55:03,340] org.eclipse.jetty.server.Server: REQUEST /scheduler/jobs on BlockingHttpConnection@4db991a8,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-5,l=617,c=0},r=145 DEBUG [2013-05-02 22:55:03,340] org.eclipse.jetty.server.handler.ContextHandler: scope null||/scheduler/jobs @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:55:03,340] org.eclipse.jetty.server.handler.ContextHandler: scope null||/scheduler/jobs @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:55:03,340] org.eclipse.jetty.server.handler.ContextHandler: context=||/scheduler/jobs @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:55:03,341] org.eclipse.jetty.servlet.ServletHandler: servlet |/scheduler|/jobs -> com.sun.jersey.spi.container.servlet.ServletContainer-723468318 DEBUG [2013-05-02 22:55:03,341] org.eclipse.jetty.servlet.ServletHandler: chain=com.yammer.dropwizard.servlets.ThreadNameFilter-664071981->com.airbnb.scheduler.api.RedirectFilter-1583883651->com.sun.jersey.spi.container.servlet.ServletContainer-723468318 DEBUG [2013-05-02 22:55:03,341] org.eclipse.jetty.servlet.ServletHandler: call filter com.yammer.dropwizard.servlets.ThreadNameFilter-664071981 DEBUG [2013-05-02 22:55:03,341] org.eclipse.jetty.servlet.ServletHandler: call filter com.airbnb.scheduler.api.RedirectFilter-1583883651 DEBUG [2013-05-02 22:55:03,341] org.eclipse.jetty.servlet.ServletHandler: call servlet com.sun.jersey.spi.container.servlet.ServletContainer-723468318 127.0.0.1 - - [02/May/2013:22:55:03 +0000] "GET /scheduler/jobs HTTP/1.1" 200 218 6 6 DEBUG [2013-05-02 22:55:03,345] org.eclipse.jetty.server.Server: RESPONSE /scheduler/jobs 200 handled=true DEBUG [2013-05-02 22:55:03,417] org.eclipse.jetty.http.HttpParser: filled 1244/1244 DEBUG [2013-05-02 22:55:03,417] org.eclipse.jetty.server.Server: REQUEST /scheduler/graph/csv on BlockingHttpConnection@55eb75d3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-5,l=617,c=0},r=47 DEBUG [2013-05-02 22:55:03,418] org.eclipse.jetty.server.handler.ContextHandler: scope null||/scheduler/graph/csv @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:55:03,418] org.eclipse.jetty.server.handler.ContextHandler: scope null||/scheduler/graph/csv @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:55:03,418] org.eclipse.jetty.server.handler.ContextHandler: context=||/scheduler/graph/csv @ o.e.j.s.ServletContextHandler{/,null} DEBUG [2013-05-02 22:55:03,418] org.eclipse.jetty.servlet.ServletHandler: servlet |/scheduler|/graph/csv -> com.sun.jersey.spi.container.servlet.ServletContainer-723468318 DEBUG [2013-05-02 22:55:03,418] org.eclipse.jetty.servlet.ServletHandler: chain=com.yammer.dropwizard.servlets.ThreadNameFilter-664071981->com.airbnb.scheduler.api.RedirectFilter-1583883651->com.sun.jersey.spi.container.servlet.ServletContainer-723468318 DEBUG [2013-05-02 22:55:03,418] org.eclipse.jetty.servlet.ServletHandler: call filter com.yammer.dropwizard.servlets.ThreadNameFilter-664071981 DEBUG [2013-05-02 22:55:03,418] org.eclipse.jetty.servlet.ServletHandler: call filter com.airbnb.scheduler.api.RedirectFilter-1583883651 DEBUG [2013-05-02 22:55:03,418] org.eclipse.jetty.servlet.ServletHandler: call servlet com.sun.jersey.spi.container.servlet.ServletContainer-723468318 127.0.0.1 - - [02/May/2013:22:55:03 +0000] "GET /scheduler/graph/csv HTTP/1.1" 200 41 3 3 DEBUG [2013-05-02 22:55:03,420] org.eclipse.jetty.server.Server: RESPONSE /scheduler/graph/csv 200 handled=true DEBUG [2013-05-02 22:55:04,538] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:55:06,206] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:55:07,874] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:55:09,541] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms DEBUG [2013-05-02 22:55:11,209] org.apache.zookeeper.ClientCnxn: Got ping response for sessionid: 0x13e674791160002 after 0ms
anybody had similar issue ? how else would i troubleshoot this ?
@bentrigat: How do you start chronos on your cluster? What parameters do you pass it? Do you include the dw.hostname
parameter, similar to the example run
script?
Additionally, is your provided chronos conf the one you attempt to boot your cluster with? If so, the zookeeperServers
parameter is incorrect for a cluster configuration, as they'd each be referencing a different, embedded zookeeper.
thx @andykram .
i start the chronos with :+1: MESOS_LAUNCHER_DIR=/toto/home/bensix/FromFtp/repo/mesos/build/src/ MESOS_KILLTREE=/toto/home/bensix/FromFtp/repo/mesos/src/scripts/killtree.sh MESOS_NATIVE_LIBRARY=/toto/mesos6/lib/libmesos.so MESOS_NATIVE_LIBRARY=/toto/mesos6/lib/libmesos.so JAVA_HOME=/usr/lib/jvm/java-7-oracle java -cp /toto/home/bensix/FromFtp/repo/chronos/target/chronos-1.0-SNAPSHOT.jar com.airbnb.scheduler.Main server /toto/home/bensix/FromFtp/repo/chronos/config/toto_cluster_scheduler.yml > /tmp/ben/l 2>&1 &
what's dw.hostname for ?
yes the chronos conf i mentioned is the one that fails . If we should not use zookeeperServers when when we use another zk server than the embedded one , then what's the proper parameter and syntax for such , please ?
@bentrigat On second glance, it appears that chronos is not receiving any receive offers from mesos. Are you running a mesos-master and a mesos-slave connected to the same zookeeper?
yes i am , that was my goal.
so here is my current config : just 2 servers now:
on the zookeeper id 1 + mesos master + chonos / 64.106.186.37 zookeeper conf snippet:
server.1=64.106.186.37:2888:3888 server.2=64.106.186.58:2888:3888
chronos config snippet :
scheduleHorizonSeconds: 60 master: zk://64.106.186.37:2181/mesos zookeeperServers: 64.106.186.37:2181 zookeeperStateZnode: "/airbnb/service/chronos/state" zookeeperLeaderZnode: "/airbnb/service/chronos/leader" zookeeperCandidateZnode: "/airbnb/service/chronos/candidate"
mesos conf is : zk=zk://64.106.186.37:2181/toto master=64.106.186.37:5050
on the zookeeper id 2 + mesos slave / 64.106.186.58 mesos conf is :
zk=zk://64.106.186.37:2181/toto master=64.106.186.37:5050
both have :
cat mesos/deploy/masters 64.106.186.37
cat mesos/deploy/slaves 64.106.186.58
Ah, I see what's going on here.
In your chronos configuration file you specify:
master: zk://64.106.186.37:2181/mesos
Which tells chronos that mesos will live in /mesos
. However, in your mesos configuration files you specify:
zk=zk://64.106.186.37:2181/toto
which will tell mesos live in /toto
. As such, mesos will never see chronos and vice versa.
To fix this, either:
/mesos
or
/toto
Additionally, it looks like you're only specifying one zookeeper server to both chronos and mesos. This is sufficient to work, but not a recommended configuration. To specify multiple zookeepers, use the following syntax for both chronos's master
parameter and mesos's zk
parameter:
zk://64.106.186.37:2181,64.106.186.58:2181/PATH
and for chronos's zookeeperServers
parameter use:
64.106.186.37:2181,64.106.186.58:2181
ok that work snow. thanks ! can i point out that it would be nice to have some kind of notification when the job is not scheduled ? i think whatever the reason is the user would like to be aware the job wont run because it cant be scheduled. thanks again,
@bentrigat glad to hear your issue has been resolved! We're looking into adding reporting/metrics for cases such as not receiving resource offers, the task queue growing too large/quickly, etc.
hello , am able to run 1 simple job (simple ls) when i use the provided local_scheduler_nozk.yml but when i switch to a 2 nodes cluster config (1 node with mesos master+chronos+zookeeper and 1 node as a simple mesos slave) the job is registered but never run. no trace of running in the logs (although using TRACE level) and no output for my job on any of the 2 nodes cluster , the log dont show launching of the job nor TASK_RUNNING or TASK_FAILURE i expected either. see gist for details about the job 'test7'. note i see the job in the chronos ui and it does not show either success nor error. also i made sure to use UTC timestamp ;) i also see the mesos master and slave in the mesos ui. /tmp/mesos/slave on the mesos-master/chronos server is empty the mesos tests at https://github.com/apache/mesos/blob/trunk/docs/Home.md all pass (java/python/c++). tried to submit from both chronos web ui and the chronos/curl cli to no avail.
please see https://gist.github.com/bentrigat/5503859