openmole / gridscale

Scala library for accessing various file systems, batch systems, job schedulers, and grid middleware.
GNU Affero General Public License v3.0

Missing state C in PBS #13

Closed: jopasserat closed this issue 7 years ago

jopasserat commented 7 years ago

As reported in OpenMOLE. Quoting @gdower below:

The jobs get submitted to the PBS queue and show PBS state C for complete, but in the environment pane I get the debug message "Unrecognized state C," and no results return:

java.lang.RuntimeException: Unrecognized state C
    at fr.iscpif.gridscale.pbs.PBSJobService$class.translateStatus(PBSJobService.scala:101)
    at org.openmole.plugin.environment.pbs.PBSJobService$$anon$1.translateStatus(PBSJobService.scala:39)
    at fr.iscpif.gridscale.pbs.PBSJobService$$anonfun$state$1.apply(PBSJobService.scala:78)
    at fr.iscpif.gridscale.pbs.PBSJobService$$anonfun$state$1.apply(PBSJobService.scala:64)
    at fr.iscpif.gridscale.ssh.SSHConnectionCache$class.withConnection(SSHConnectionCache.scala:27)
    at org.openmole.plugin.environment.pbs.PBSJobService$$anon$1.withConnection(PBSJobService.scala:39)
    at fr.iscpif.gridscale.pbs.PBSJobService$class.state(PBSJobService.scala:64)
    at org.openmole.plugin.environment.pbs.PBSJobService$$anon$1.state(PBSJobService.scala:39)
    at org.openmole.plugin.environment.pbs.PBSJobService$$anon$1.state(PBSJobService.scala:39)
    at org.openmole.plugin.environment.gridscale.GridScaleJobService$class._state(GridScaleJobService.scala:30)
    at org.openmole.plugin.environment.pbs.PBSEnvironment$$anon$1._state(PBSEnvironment.scala:86)
    at org.openmole.plugin.environment.batch.jobservice.JobService$$anonfun$state$1.apply(JobService.scala:41)
    at org.openmole.plugin.environment.batch.jobservice.JobService$$anonfun$state$1.apply(JobService.scala:41)
    at org.openmole.plugin.environment.batch.control.LimitedAccess$LimitedAccessToken.access(LimitedAccess.scala:37)
    at org.openmole.plugin.environment.batch.jobservice.JobService$class.state(JobService.scala:41)
    at org.openmole.plugin.environment.pbs.PBSEnvironment$$anon$1.state(PBSEnvironment.scala:86)
    at org.openmole.plugin.environment.batch.jobservice.BatchJob$class.updateState(BatchJob.scala:57)
    at org.openmole.plugin.environment.pbs.PBSJobService$$anon$2.org$openmole$plugin$environment$batch$jobservice$BatchJobId$$super$updateState(PBSJobService.scala:63)
    at org.openmole.plugin.environment.batch.jobservice.BatchJobId$class.updateState(BatchJobId.scala:27)
    at org.openmole.plugin.environment.pbs.PBSJobService$$anon$2.updateState(PBSJobService.scala:63)
    at org.openmole.plugin.environment.batch.refresh.RefreshActor$$anonfun$receive$1.apply(RefreshActor.scala:35)
    at org.openmole.plugin.environment.batch.refresh.RefreshActor$$anonfun$receive$1.apply(RefreshActor.scala:32)
    at org.openmole.plugin.environment.batch.control.UsageControl$class.tryWithToken(UsageControl.scala:28)
    at org.openmole.plugin.environment.batch.control.LimitedAccess.tryWithToken(LimitedAccess.scala:26)
    at org.openmole.plugin.environment.batch.environment.BatchService$class.tryWithToken(BatchService.scala:28)
    at org.openmole.plugin.environment.pbs.PBSEnvironment$$anon$1.tryWithToken(PBSEnvironment.scala:86)
    at org.openmole.plugin.environment.batch.refresh.RefreshActor$.receive(RefreshActor.scala:32)
    at org.openmole.plugin.environment.batch.refresh.JobManager$DispatcherActor$.receive(JobManager.scala:49)
    at org.openmole.plugin.environment.batch.refresh.JobManager$$anonfun$dispatch$1.apply$mcV$sp(JobManager.scala:57)
    at org.openmole.core.threadprovider.ThreadProvider$RunClosure.run(ThreadProvider.scala:21)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The submitted jobs dwindle down towards 0 and then get resubmitted by OpenMOLE.

Here's a qstat from PBS:

Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
940.cam1                   ...ff1480160.pbs pi              00:00:00 C batch          
941.cam1                   ...c839be65d.pbs pi              00:00:00 C batch          
942.cam1                   ...79cd988b0.pbs pi              00:00:00 C batch          
943.cam1                   ...df3ce1fb5.pbs pi              00:00:00 C batch          
944.cam1                   ...d964870d9.pbs pi              00:00:00 C batch          
945.cam1                   ...4024c354f.pbs pi              00:00:00 C batch          
946.cam1                   ...f73b7faae.pbs pi              00:00:00 C batch          
947.cam1                   ...ee767a09f.pbs pi              00:00:00 C batch          
948.cam1                   ...04159a6ea.pbs pi              00:00:00 C batch          
949.cam1                   ...f929e6c7d.pbs pi              00:00:00 C batch          
950.cam1                   ...6ad74d7ea.pbs pi              00:00:00 C batch          
951.cam1                   ...6ccd91227.pbs pi              00:00:00 C batch          
952.cam1                   ...14fdf6121.pbs pi              00:00:00 C batch          
953.cam1                   ...2a4c4126a.pbs pi              00:00:00 C batch          
954.cam1                   ...0f3f8d8f6.pbs pi              00:00:00 C batch          
955.cam1                   ...78362160f.pbs pi              00:00:00 C batch          
956.cam1                   ...14fd22463.pbs pi              00:00:00 C batch          
957.cam1                   ...961d37d22.pbs pi              00:00:00 C batch          
958.cam1                   ...d70da99ba.pbs pi              00:00:00 C batch

jopasserat commented 7 years ago

State C is missing in master and in the new version of GridScale in the dsl branch.

@gdower: do you know what it represents in your PBS environment? (I assume it's a transitional state while the job is completing.) Would you care to propose a PR for this?

jopasserat commented 7 years ago

Fixed in 905b83d7347ff61ccd44eff0d1656db6b4015d64 and 97068ce85ce75465ec8f9b936ec345beb530a357.
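
For readers landing on this issue, here is a minimal sketch of what handling state C in a status translation could look like. It is not the code from the commits above: the `JobState` hierarchy and the `PBSStatus` object are hypothetical names chosen for illustration, and the mapping assumes a Torque-style PBS where qstat prints C for a job that has completed after running (Q = queued, R = running, E = exiting).

```scala
// Hypothetical sketch: these names are illustrative, not GridScale's actual API.
sealed trait JobState
case object Submitted extends JobState
case object Running extends JobState
case object Done extends JobState

object PBSStatus {
  // Translate the single-letter state column printed by qstat into a coarse job state.
  // Assumption: Torque-style letters, where C means "completed after having run".
  def translateStatus(status: String): JobState =
    status.trim match {
      case "Q"       => Submitted
      case "R" | "E" => Running
      case "C"       => Done
      // Any other letter is still rejected, as in the stack trace above.
      case other     => throw new RuntimeException("Unrecognized state " + other)
    }
}
```

With a mapping like this, the qstat rows shown in the report (all in state C) would translate to `Done` instead of raising "Unrecognized state C". Other letters a given PBS installation may emit (for example H for held) are left out on purpose and would need to be added based on that scheduler's documentation.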