grisu / gricli

Grisu commandline client

Submission issues #92

Closed vladimir-mencl-eresearch closed 12 years ago

vladimir-mencl-eresearch commented 13 years ago

Hi,

I was submitting a batch of 2000 jobs to ng2sge.canterbury.ac.nz via BeSTGRID-DEV with the current gricli-dev.jar (as of July 20th, 2011).

I got a number of errors. Here is an excerpt; I'll attach the full log to the issue:

adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0085
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
com.sun.xml.ws.client.ClientTransportException: The server sent HTTP status code 413: Request Entity Too Large
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.checkStatusCode(HttpTransportPipe.java:203)
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.process(HttpTransportPipe.java:179)
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.processRequest(HttpTransportPipe.java:94)
    at com.sun.xml.ws.transport.DeferredTransportPipe.processRequest(DeferredTransportPipe.java:89)
    at com.sun.xml.ws.api.pipe.Fiber.__doRun(Fiber.java:598)
    at com.sun.xml.ws.api.pipe.Fiber._doRun(Fiber.java:557)
    at com.sun.xml.ws.api.pipe.Fiber.doRun(Fiber.java:542)
    at com.sun.xml.ws.api.pipe.Fiber.runSync(Fiber.java:439)
    at com.sun.xml.ws.client.Stub.process(Stub.java:222)
    at com.sun.xml.ws.client.sei.SEIStub.doProcess(SEIStub.java:135)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:109)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:89)
    at com.sun.xml.ws.client.sei.SEIStub.invoke(SEIStub.java:118)
    at $Proxy41.createJob(Unknown Source)
    at grisu.frontend.model.job.JobObject.createJob(JobObject.java:460)
    at grisu.gricli.command.SubmitCommand.createJob(SubmitCommand.java:39)
    at grisu.gricli.command.SubmitCommand.execute(SubmitCommand.java:50)
    at grisu.gricli.Gricli.runCommand(Gricli.java:204)
    at grisu.gricli.Gricli.run(Gricli.java:195)
    at grisu.gricli.Gricli.executionLoop(Gricli.java:65)
    at grisu.gricli.Gricli.main(Gricli.java:179)
command failed. Either connection to server failed, or this is gricli bug. Please send /home/vme28/sw-comp/gricli/.grisu.beta/gricli.debug to eresearch-admin@auckland.ac.nz together with description of what triggered the problem
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0087

adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0154
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
javax.xml.ws.soap.SOAPFaultException: Unknown error while trying to create job: null
    at com.sun.xml.ws.fault.SOAP11Fault.getProtocolException(SOAP11Fault.java:189)
    at com.sun.xml.ws.fault.SOAPFaultBuilder.createException(SOAPFaultBuilder.java:122)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:119)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:89)
    at com.sun.xml.ws.client.sei.SEIStub.invoke(SEIStub.java:118)
    at $Proxy41.createJob(Unknown Source)
    at grisu.frontend.model.job.JobObject.createJob(JobObject.java:460)
    at grisu.gricli.command.SubmitCommand.createJob(SubmitCommand.java:39)
    at grisu.gricli.command.SubmitCommand.execute(SubmitCommand.java:50)
    at grisu.gricli.Gricli.runCommand(Gricli.java:204)
    at grisu.gricli.Gricli.run(Gricli.java:195)
    at grisu.gricli.Gricli.executionLoop(Gricli.java:65)
    at grisu.gricli.Gricli.main(Gricli.java:179)
command failed. Either connection to server failed, or this is gricli bug. Please send /home/vme28/sw-comp/gricli/.grisu.beta/gricli.debug to eresearch-admin@auckland.ac.nz together with description of what triggered the problem
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0156

adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0254
grisu.frontend.control.clientexceptions.FileTransactionException: Could not upload input file.
    at grisu.model.FileManager.uploadInputFile(FileManager.java:1787)
    at grisu.model.FileManager.uploadJobInput(FileManager.java:2002)
    at grisu.frontend.control.fileTransfers.FileTransaction$1.call(FileTransaction.java:211)
    at grisu.frontend.control.fileTransfers.FileTransaction$1.call(FileTransaction.java:180)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0255

adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0445
javax.xml.ws.soap.SOAPFaultException: Fault occurred while processing.
    at com.sun.xml.ws.fault.SOAP11Fault.getProtocolException(SOAP11Fault.java:189)
    at com.sun.xml.ws.fault.SOAPFaultBuilder.createException(SOAPFaultBuilder.java:122)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:119)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:89)
    at com.sun.xml.ws.client.sei.SEIStub.invoke(SEIStub.java:118)
    at $Proxy41.uploadInputFile(Unknown Source)
    at grisu.model.FileManager.uploadInputFile(FileManager.java:1780)
    at grisu.model.FileManager.uploadJobInput(FileManager.java:2002)
    at grisu.frontend.control.fileTransfers.FileTransaction$1.call(FileTransaction.java:211)
    at grisu.frontend.control.fileTransfers.FileTransaction$1.call(FileTransaction.java:180)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0446

adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0588
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-0589
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job property is not validsubmissionLocation: Submissionlocation medium64:ng2sge.canterbury.ac.nz#SGE not available for this kind of job (using VO: /ARCS/BeSTGRID)
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job property is not validsubmissionLocation: Submissionlocation medium64:ng2sge.canterbury.ac.nz#SGE not available for this kind of job (using VO: /ARCS/BeSTGRID)
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job property is not validsubmissionLocation: Submissionlocation medium64:ng2sge.canterbury.ac.nz#SGE not available for this kind of job (using VO: /ARCS/BeSTGRID)

.... and then 700 jobs failed to submit with the above message repeating...

adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-1622
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-1623
fail to submit job: Submission to endpoint failed: Credential associated with job: DC=nz,DC=org,DC=bestgrid,DC=slcs,O=University of Canterbury,CN=Vladimir Mencl -2vdKb_4CoiSg1P_uGfB9YTRJLo / R-bubb-pi075-1623 is not valid.
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-1624
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r

and then got several hundred of these:

adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-1632
com.sun.xml.ws.server.UnsupportedMediaException: Unsupported Content-Type: text/html; charset=iso-8859-1 Supported ones are: [text/xml]
    at com.sun.xml.ws.encoding.StreamSOAPCodec.decode(StreamSOAPCodec.java:295)
    at com.sun.xml.ws.encoding.StreamSOAPCodec.decode(StreamSOAPCodec.java:129)
    at com.sun.xml.ws.encoding.SOAPBindingCodec.decode(SOAPBindingCodec.java:360)
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.process(HttpTransportPipe.java:187)
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.processRequest(HttpTransportPipe.java:94)
    at com.sun.xml.ws.transport.DeferredTransportPipe.processRequest(DeferredTransportPipe.java:89)
    at com.sun.xml.ws.api.pipe.Fiber.__doRun(Fiber.java:598)
    at com.sun.xml.ws.api.pipe.Fiber._doRun(Fiber.java:557)
    at com.sun.xml.ws.api.pipe.Fiber.doRun(Fiber.java:542)
    at com.sun.xml.ws.api.pipe.Fiber.runSync(Fiber.java:439)
    at com.sun.xml.ws.client.Stub.process(Stub.java:222)
    at com.sun.xml.ws.client.sei.SEIStub.doProcess(SEIStub.java:135)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:109)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:89)
    at com.sun.xml.ws.client.sei.SEIStub.invoke(SEIStub.java:118)
    at $Proxy41.getActionStatus(Unknown Source)
    at grisu.model.status.StatusObject.waitForActionToFinish(StatusObject.java:120)
    at grisu.model.status.StatusObject.waitForActionToFinish(StatusObject.java:111)
    at grisu.model.FileManager.uploadInputFile(FileManager.java:1784)
    at grisu.model.FileManager.uploadJobInput(FileManager.java:2002)
    at grisu.frontend.control.fileTransfers.FileTransaction$1.call(FileTransaction.java:211)
    at grisu.frontend.control.fileTransfers.FileTransaction$1.call(FileTransaction.java:180)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Exception in thread "Thread-3365" com.sun.xml.ws.server.UnsupportedMediaException: Unsupported Content-Type: text/html; charset=iso-8859-1 Supported ones are: [text/xml]
    at com.sun.xml.ws.encoding.StreamSOAPCodec.decode(StreamSOAPCodec.java:295)
    at com.sun.xml.ws.encoding.StreamSOAPCodec.decode(StreamSOAPCodec.java:129)
    at com.sun.xml.ws.encoding.SOAPBindingCodec.decode(SOAPBindingCodec.java:360)
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.process(HttpTransportPipe.java:187)
    at com.sun.xml.ws.transport.http.client.HttpTransportPipe.processRequest(HttpTransportPipe.java:94)
    at com.sun.xml.ws.transport.DeferredTransportPipe.processRequest(DeferredTransportPipe.java:89)
    at com.sun.xml.ws.api.pipe.Fiber.__doRun(Fiber.java:598)
    at com.sun.xml.ws.api.pipe.Fiber._doRun(Fiber.java:557)
    at com.sun.xml.ws.api.pipe.Fiber.doRun(Fiber.java:542)
    at com.sun.xml.ws.api.pipe.Fiber.runSync(Fiber.java:439)
    at com.sun.xml.ws.client.Stub.process(Stub.java:222)
    at com.sun.xml.ws.client.sei.SEIStub.doProcess(SEIStub.java:135)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:109)
    at com.sun.xml.ws.client.sei.SyncMethodHandler.invoke(SyncMethodHandler.java:89)
    at com.sun.xml.ws.client.sei.SEIStub.invoke(SEIStub.java:118)
    at $Proxy41.getJobStatus(Unknown Source)
    at grisu.frontend.model.job.JobObject.getStatus(JobObject.java:829)
    at grisu.frontend.control.jobMonitoring.RunningJobManager$7.run(RunningJobManager.java:591)

and then again a few

adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job name is R-bubb-pi075-1805
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job property is not validsubmissionLocation: Submissionlocation medium64:ng2sge.canterbury.ac.nz#SGE not available for this kind of job (using VO: /ARCS/BeSTGRID)
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job property is not validsubmissionLocation: Submissionlocation medium64:ng2sge.canterbury.ac.nz#SGE not available for this kind of job (using VO: /ARCS/BeSTGRID)
adding input file /home/vme28/sw-comp/r/jedrzej-jun2011-multi-pi/Evaluation_Markov-ADF-Test-2011-05-09-mc50-pi075.r
job property is not validsubmissionLocation: Submissionlocation medium64:ng2sge.canterbury.ac.nz#SGE not available for this kind of job (using VO: /ARCS/BeSTGRID)

Surprisingly, Gricli kept marching on until the end of the file, though it submitted only about a third of the jobs.
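
For anyone triaging a log like this one, a rough tally of the failure types can be had with a small script. The sketch below is not part of gricli; the function name `tally_errors` and the short labels are made up for illustration, and the only assumption is that the log contains the message strings quoted in the excerpts above.

```python
import sys
from collections import Counter

# Substrings copied from the excerpts above. The labels on the right are
# hypothetical shorthand, not names used by gricli or Grisu.
SIGNATURES = [
    ("HTTP status code 413", "http-413"),
    ("Unknown error while trying to create job", "create-job-fault"),
    ("Could not upload input file", "upload-failed"),
    ("Fault occurred while processing", "soap-fault"),
    ("job property is not valid", "invalid-submission-location"),
    ("Credential associated with job", "invalid-credential"),
    ("Unsupported Content-Type", "unsupported-media"),
]


def tally_errors(lines):
    """Count how often each known message appears in the given log lines."""
    counts = Counter()
    for line in lines:
        for needle, label in SIGNATURES:
            # str.count handles lines where several messages were run together.
            hits = line.count(needle)
            if hits:
                counts[label] += hits
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally.py <path to the submission log>
    with open(sys.argv[1], errors="replace") as fh:
        for label, n in tally_errors(fh).most_common():
            print(f"{n:6d}  {label}")
```

The numbers are occurrences of each message, not failed jobs: a single failure can print the same exception more than once, as with the two UnsupportedMediaException traces above.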

vladimir-mencl-eresearch commented 13 years ago

Could not find an attach file button here. Uploaded to the DF:

https://df.bestgrid.org/quickshare/e5fb450096e79374/pi075-subm.log

makkus commented 12 years ago

Grisu/gricli should scale better now, especially with regard to job submissions. I ran tests with up to 5000 jobs, and they submitted (mostly) without any errors.

Will close this now.