grisu / gricli

Grisu commandline client

Job submission failures - server down #113

Closed vladimir-mencl-eresearch closed 13 years ago

vladimir-mencl-eresearch commented 13 years ago

Hi,

I got another set of about 100 consecutive job submission failures around 20:41 yesterday (July 28, 2011).

command failed. Either connection to server failed, or this is gricli bug.
Please send /home/vme28/sw-comp/gricli/.grisu.beta/gricli.debug to eresearch-admin@auckland.ac.nz together with description of what triggered the problem
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
com.sun.xml.ws.client.ClientTransportException: The server sent HTTP status code 503: Service Temporarily Unavailable
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.checkStatusCode(HttpTransportPipe.java:203)
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.process(HttpTransportPipe.java:179)
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.processRequest(HttpTransportPipe.java:94)
    at com.sun.xml.ws.transport.DeferredTransportPipe.processRequest(DeferredTransportPipe.java:89)
    at com.sun.xml.ws.api.pipe.Fiber.__doRun(Fiber.java:598)
    at com.sun.xml.ws.api.pipe.Fiber._doRun(Fiber.java:557)
    at com.sun.xml.ws.api.pipe.Fiber.doRun(Fiber.java:542)
    at com.sun.xml.ws.api.pipe.Fiber.runSync(Fiber.java:439)
    at com.sun.xml.ws.client.Stub.process(Stub.java:222)
    at com.sun.xml.ws.client.sei.SEIStub.doProcess(SEIStub.java:135)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:109)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:89)
    at com.sun.xml.ws.client.sei.SEIStub.invoke(SEIStub.java:118)
    at $Proxy41.createJob(Unknown Source)
    at grisu.frontend.model.job.JobObject.createJob(JobObject.java:460)
    at grisu.gricli.command.SubmitCommand.createJob(SubmitCommand.java:39)
    at grisu.gricli.command.SubmitCommand.execute(SubmitCommand.java:50)
    at grisu.gricli.Gricli.runCommand(Gricli.java:204)
    at grisu.gricli.Gricli.run(Gricli.java:195)
    at grisu.gricli.Gricli.executionLoop(Gricli.java:65)
    at grisu.gricli.Gricli.main(Gricli.java:179)
[same message and stack trace repeated for each further failed submission]

gricli.debug says:

19059790 [main] ERROR grisu.gricli.Gricli - com.sun.xml.ws.client.ClientTransportException: The server sent HTTP status code 503: Service Temporarily Unavailable
19059793 [main] DEBUG grisu.model.job.JobSubmissionObjectImpl - Commandline for job: R-bubb-pi075-1902 changed: R --no-readline --no-restore --no-save -f Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
19059865 [main] ERROR grisu.gricli.Gricli - com.sun.xml.ws.client.ClientTransportException: The server sent HTTP status code 503: Service Temporarily Unavailable
19059867 [main] DEBUG grisu.model.job.JobSubmissionObjectImpl - Commandline for job: R-bubb-pi075-1903 changed: R --no-readline --no-restore --no-save -f Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
19059938 [main] ERROR grisu.gricli.Gricli - com.sun.xml.ws.client.ClientTransportException: The server sent HTTP status code 503: Service Temporarily Unavailable
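
To gauge how long the outage lasted, the 503 entries in gricli.debug can be tallied with a short script. This is only a rough sketch: it assumes the file holds one entry per line in the form `<millis> [<thread>] <LEVEL> <logger> - <message>`, as the excerpt above suggests, and that the leading number is elapsed milliseconds. The function name `summarize_503` and the `SAMPLE` list are made up for illustration and are not part of gricli.

```python
import re

# Assumed entry layout, inferred from the excerpt above (not from gricli docs):
#   <millis> [<thread>] <LEVEL> <logger> - <message>
ENTRY = re.compile(r"^(\d+) \[([^\]]+)\] (\w+)\s+(\S+) - (.*)$")


def summarize_503(lines):
    """Count ERROR entries that mention HTTP 503 and report the span they cover."""
    stamps = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a log entry
        millis, _thread, level, _logger, message = match.groups()
        if level == "ERROR" and "HTTP status code 503" in message:
            stamps.append(int(millis))
    if not stamps:
        return {"count": 0, "first_ms": None, "last_ms": None, "span_ms": 0}
    return {
        "count": len(stamps),
        "first_ms": min(stamps),
        "last_ms": max(stamps),
        "span_ms": max(stamps) - min(stamps),
    }


# The five entries quoted above, shortened only by reusing the common message text.
_MSG_503 = ("com.sun.xml.ws.client.ClientTransportException: The server sent "
            "HTTP status code 503: Service Temporarily Unavailable")
_CMD = ("R --no-readline --no-restore --no-save -f "
        "Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r")
SAMPLE = [
    "19059790 [main] ERROR grisu.gricli.Gricli - " + _MSG_503,
    "19059793 [main] DEBUG grisu.model.job.JobSubmissionObjectImpl - "
    "Commandline for job: R-bubb-pi075-1902 changed: " + _CMD,
    "19059865 [main] ERROR grisu.gricli.Gricli - " + _MSG_503,
    "19059867 [main] DEBUG grisu.model.job.JobSubmissionObjectImpl - "
    "Commandline for job: R-bubb-pi075-1903 changed: " + _CMD,
    "19059938 [main] ERROR grisu.gricli.Gricli - " + _MSG_503,
]

if __name__ == "__main__":
    print(summarize_503(SAMPLE))
```

On the real file, `summarize_503(open(path))` with the path to gricli.debug would give the total number of failed submissions and the first and last timestamps of the outage.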

yuriyh commented 13 years ago

which server?

yuriyh commented 13 years ago

checked both prod and dev - both are fine...

makkus commented 13 years ago

Hm. I think we need to talk about where we put issues like this. I don't think runtime issues such as a backend going down necessarily belong in the GitHub issue tracker.

That's probably a bigger discussion. We might need to set up a front end where we can log every issue and then route it to the proper place (e.g. gricli issues, grisu issues, JIRA for runtime issues, ...). Nick, didn't you guys start looking into something like that?

vladimir-mencl-eresearch commented 13 years ago

Hi guys, sorry about the late reply. That looks like a backend issue, which could be either in the backend code itself or in the stability of the backend deployment.

The submission was to the BeSTGRID-DEV backend, and it happened in the middle of a run of otherwise successful job submissions: for a while, the backend server was failing all requests with "503: Service Temporarily Unavailable".

makkus commented 13 years ago

Hm, what should we do? Any ideas? I've got none :-(

makkus commented 13 years ago

Will close for now. Please re-open if it happens again. It might be that we just rebooted the development backend while you were submitting your jobs...