mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

Update slurm template #127

Closed. dagola closed this 7 years ago

mllg commented 7 years ago

Does this PR supersede #126?

dagola commented 7 years ago

Yes. I'll close #126.

mllg commented 7 years ago

I've extended cfSlurm with the nodename argument and updated the gitignore with the RStudio files. If I can avoid it, I would rather not touch waitForJobs() for now. Can you please test whether setting a custom sleep function would suffice to make batchtools work with SSHFS? I.e., set in your config file

sleep = 30

?
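For readers following along, the suggestion above could look like the config fragment below. This is only a sketch: the file name batchtools.conf.R and the concrete numbers are assumptions, and the function form relies on the sleep option also accepting a function of the polling iteration, which is how the batchtools documentation describes it.

```r
# batchtools.conf.R (sketch; the values are illustrative assumptions)

# Fixed interval: wait 30 seconds between two polls of the scheduler
sleep = 30

# Alternative: let the interval grow with the iteration counter i,
# capped at 120 seconds
# sleep = function(i) min(120, 10 * i)
```

A longer sleep only spaces out the polls; it does not change how a job that has left the scheduler but not yet reported back is classified.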

dagola commented 7 years ago

Unfortunately, that does not help. waitForJobs exits as soon as a job finishes during the repeat loop while its update file is not yet available. This produces a mismatch between the jobs on the system as reported by .findOnSystem and those reported by .findNotTerminated, because some jobs have terminated but their status has not been updated yet.

I see three options: waitForJobs could allow for more repeat loops until a job is considered expired, as you suggested; the job table could be updated more often to increase robustness; or waitForJobs could stop exiting because of possibly expired jobs, issue just a warning instead, and do a final check once all jobs have terminated to return TRUE or FALSE.
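The first option can be illustrated with a small, self-contained simulation. It is a minimal sketch, not the batchtools implementation: wait_with_grace, on_system, terminated and expire_after are hypothetical names, and the per-round job lists stand in for what the scheduler and the shared file system would report.

```r
# Hypothetical helper (not part of batchtools): simulate the polling loop.
#   ids          job ids to wait for
#   on_system    list; element i holds the ids the scheduler reports in round i
#   terminated   list; element i holds the ids whose results are visible in round i
#   expire_after consecutive rounds a job may be missing before it counts as expired
wait_with_grace <- function(ids, on_system, terminated, expire_after = 3L) {
  missing_rounds <- integer(length(ids))
  for (i in seq_along(on_system)) {
    done <- ids %in% terminated[[i]]
    if (all(done)) return(TRUE)
    # "Lost" jobs are neither on the system nor visibly finished
    lost <- !done & !(ids %in% on_system[[i]])
    # Count consecutive lost rounds; reset once the job shows up again
    missing_rounds <- ifelse(lost, missing_rounds + 1L, 0L)
    if (any(missing_rounds >= expire_after)) return(FALSE)
  }
  FALSE  # ran out of simulated rounds
}

# Job 1 leaves the scheduler in round 2, but its result only becomes
# visible in round 3 (slow network file system); job 2 behaves the same later.
ids <- c(1L, 2L)
on_system  <- list(c(1L, 2L), 2L, 2L, integer(0), integer(0))
terminated <- list(integer(0), integer(0), 1L, 1L, c(1L, 2L))

strict  <- wait_with_grace(ids, on_system, terminated, expire_after = 1L)
lenient <- wait_with_grace(ids, on_system, terminated, expire_after = 3L)
```

With a grace period of a single round the loop gives up in round 2, which mirrors the premature exit described above; tolerating a few rounds lets the delayed result files appear.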

mllg commented 7 years ago

I managed to start jobs on our Slurm cluster from my laptop. Will finalize tomorrow. Can you test again on your cluster before I push to CRAN?

dagola commented 7 years ago

Perfect, works like a charm! I used the defaults, i.e. 3 rounds until a job is considered expired.
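For reference, the number of rounds mentioned here can presumably be set explicitly as well. The fragment below is a sketch under the assumption that the released package exposes it as the expire.after option (readable from the config file and also an argument of waitForJobs()); the thread itself does not name the option.

```r
# batchtools.conf.R (sketch; the option name is an assumption, see above)

# Number of consecutive polling rounds a job may be missing from the
# scheduler without having reported back before it counts as expired.
expire.after = 3L
```

Raising the value trades slower detection of genuinely killed jobs for fewer false positives on slow file systems such as SSHFS.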

mllg commented 7 years ago

Ok great. I'll run some more tests, merge into master and upload to CRAN. Thanks for your work, and sorry for "ignoring" your PR and just re-writing stuff.