Closed smilesun closed 7 years ago
please post getStatus output
> getStatus()
Status for 10 jobs:
Submitted : 10 (100.0%)
Queued : 0 ( 0.0%)
Started : 10 (100.0%)
Running : 0 ( 0.0%)
Done : 6 ( 60.0%)
Error : 4 ( 40.0%)
Expired : 0 ( 0.0%)
and the error messages please, of the jobs
Applying algorithm 'default_fda' on problem 'march.2017.task4' ...
Error in requirePackages(package, why = stri_paste("learner", id, sep = " "), :
For learner fdaclassif.np please install the following packages: fda.usc
### [bt 2017-05-03 10:40:28]: Job terminated with an exception [batchtools job.id=7]
### [bt 2017-05-03 10:40:28]: Calculation finished!
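For reference, a missing learner dependency like the one in this log can be caught before anything is submitted. The following is only a minimal sketch in base R; the `required` vector is an assumption and lists just the one package named in the log above.

```r
# Packages the learners are assumed to need; only fda.usc is named in the log above.
required <- c("fda.usc")

# requireNamespace() returns FALSE instead of raising an error when a package is missing.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before submitting: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All learner dependencies are available.")
}
```

Running this once on the machine that executes the jobs should surface the problem without waiting for a job to fail.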
actually I must terminate the interactive R session to see those errors
getJobTable():
7: 1 secs 2 secs march.2017.task4 default_fda fdaclassif.np
8: 0 secs 2 secs march.2017.task4 default_fda fdaclassif.glm
9: 0 secs 2 secs march.2017.task4 default_fda fdaclassif.knn
10: 0 secs 2 secs march.2017.task4 default_fda fdaclassif.kernel
those jobs did not run, they just paused there
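For reference, error messages and logs can normally be read from the same R session with batchtools' own accessors. Below is a minimal sketch on a throwaway registry; the `toy` function and its simulated failure are hypothetical stand-ins for the real learners, and the interactive cluster functions are assumed rather than the configuration used in this thread.

```r
library(batchtools)

# Temporary registry (file.dir = NA), so nothing is left behind on disk.
reg <- makeRegistry(file.dir = NA, seed = 1)
reg$cluster.functions <- makeClusterFunctionsInteractive()

# Hypothetical job: fails for even inputs to mimic jobs that end in an R exception.
toy <- function(x) {
  if (x %% 2 == 0) stop("simulated failure for x = ", x)
  x^2
}

ids <- batchMap(toy, x = 1:4, reg = reg)
submitJobs(ids, reg = reg)
waitForJobs(reg = reg)

getStatus(reg = reg)                  # same kind of summary as posted above
failed <- findErrors(reg = reg)       # table of job ids that ended with an error
getErrorMessages(failed, reg = reg)   # one row per failed job, including the message
getLog(failed$job.id[1], reg = reg)   # full log of the first failed job
```

If these calls hang or return nothing on a particular server, that points at the local setup rather than at the jobs themselves.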
well. why are you then claiming
some simple jobs did not stop at all
all of your jobs have stopped. some with state "done", some with state "error". and you can clearly see the error. you are missing a dependency package for some jobs.
where exactly is the problem? and hint: you should have run more calls of "testJob" locally, to see this package error sooner. it is a very common mistake.
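A minimal sketch of what such a local `testJob` run can look like, again on a throwaway registry. The `toy` function is hypothetical; in the real project the jobs would come from the existing registry instead.

```r
library(batchtools)

# Temporary registry; testJob() does not require the jobs to be submitted first.
reg <- makeRegistry(file.dir = NA, seed = 1)

# Hypothetical job: fails for even inputs, succeeds otherwise.
toy <- function(x) {
  if (x %% 2 == 0) stop("simulated failure for x = ", x)
  x^2
}
ids <- batchMap(toy, x = 1:4, reg = reg)

# external = FALSE (the default) runs the job in the current session,
# so a failure surfaces as an ordinary R error that traceback() can inspect.
ok <- testJob(id = 1, reg = reg)
res <- tryCatch(
  testJob(id = 2, reg = reg),
  error = function(e) conditionMessage(e)
)
print(ok)
print(res)

# testJob(id = 2, external = TRUE, reg = reg) would run the same job in a fresh R process.
```

Testing one job per learner this way would have exposed the missing package before the full submission.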
The point is that I have to terminate the interactive R session to see this error log. I am using the snicker for batchtools. Even when I use testJob(), it did not stop, so the user never knows what happened there. If you kill the jobs too early, maybe you have just aborted the process. I will try to reproduce the problem when I have time.
further info:
the default.sleep function should have nothing to do with what you are asking about. bt resubmits jobs in case of temporary cluster errors. but this is something very rare and also different to what you are experiencing here: in your case it is a normal R exception. in this case bt does not resubmit anything.
also, temporary cluster errors need to be defined in your cluster functions. this is something like "cluster is currently busy, please resubmit job again later". on a well operating cluster this should nearly never happen.
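To illustrate what a sleep function of this kind does: it only decides how long to wait before the next resubmission attempt after a temporary cluster error. The sketch below is a hypothetical exponential backoff in base R, not the function shipped with batchtools; `backoff_sleep`, `min_wait`, and `max_wait` are made-up names.

```r
# Hypothetical backoff: wait longer after each consecutive temporary error,
# capped at max_wait seconds. i is the number of the current retry.
backoff_sleep <- function(i, min_wait = 5, max_wait = 120) {
  min(max_wait, min_wait * 2^(i - 1))
}

# Waiting times in seconds for the first six retries.
waits <- vapply(1:6, backoff_sleep, numeric(1))
print(waits)
```

A job that ends in a normal R exception never reaches this code path, so the sleep setting cannot make such a job appear to hang.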
but all of this is just info on the side. what happens in your case is something else and much simpler.
The point is that I have to terminate the interactive R session to see this error log. I am using the snicker for batchtools. Even when I use testJob(), it did not stop, so the user never knows what happened there. If you kill the jobs too early, maybe you have just aborted the process. I will try to reproduce the problem when I have time.
ok, but this is something else and you did not write about this in your first post. please show this example to janek and me tomorrow. it will be something related just to the snickers server and your config (i guess)
and:
Even if I use testJob(), it did not stop
so what does happen? does testjob "block" the process? is there a difference between "external = TRUE / FALSE" for testJob?
(neither should block)
we resolved this
Hi, I ran a project using batchtools on a single CPU, and for one day some simple jobs did not stop at all. Looking at the output of getJobTable(), I found that those jobs ran for just 2 seconds and then generated an error. After looking at the documentation, I found the following
My question is: is this default.sleep function creating this behavior? Could someone explain the advantage of doing so?