dagola closed this 7 years ago
Yes. I'll close #126
I've extended cfSlurm with the nodename argument and updated the gitignore with the RStudio files.
If I can avoid it, I would rather not touch waitForJobs() for now. Can you please test whether setting a custom sleep function would suffice to make batchtools work with SSHFS? I.e., set sleep = 30 in your config file?
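For reference, a minimal sketch of what that setting could look like. The file name batchtools.conf.R and the function form are assumptions based on my reading of the batchtools documentation, not something stated in this thread:

```r
# batchtools.conf.R (sketch; file name and location are assumptions)

# Fixed delay: wait 30 seconds between status queries in waitForJobs().
sleep = 30

# Assumed alternative: a function of the iteration counter i that returns
# the number of seconds to sleep in the i-th iteration, capped at 120.
# sleep = function(i) min(120, 5 * i)
```

A longer sleep only makes each polling round slower; it does not change how many rounds a job may be missing from the scheduler before it is treated as expired.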
Unfortunately, that does not help. waitForJobs exits as soon as a job finishes during the repeat loop while its update file is not yet available. Thus, there is a mismatch between the jobs on the system as reported by .findOnSystem and the jobs returned by .findNotTerminated, because there are terminated jobs whose status hasn't been updated yet.
Either waitForJobs should allow more repeat loops before a job is considered expired, as you suggested; or the job table should be updated more often to increase robustness; or it shouldn't exit because of possibly expired jobs at all, but just issue a warning and do a final check once all jobs have terminated to return TRUE or FALSE.
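To make the first option concrete, here is a rough, self-contained sketch of a polling loop with grace rounds. It is not the batchtools implementation: wait_with_grace, make_sim, and the two callbacks are hypothetical stand-ins for .findNotTerminated and .findOnSystem, and the simulated file-system lag is invented for illustration.

```r
# Sketch only: a job counts as expired after it has been missing from the
# scheduler for `expire_after` consecutive rounds without reporting a result.
wait_with_grace <- function(find_on_system, find_not_terminated,
                            expire_after = 3L, sleep = 0) {
  missed <- integer(0)  # named counter: job id -> consecutive rounds missing
  repeat {
    pending <- find_not_terminated()
    if (length(pending) == 0L) return(TRUE)  # every job has reported back

    # Jobs that have not reported back and are no longer on the system
    gone <- as.character(setdiff(pending, find_on_system()))
    prev <- missed[gone]        # NA for ids seen missing for the first time
    prev[is.na(prev)] <- 0L
    missed <- prev + 1L         # counters of jobs that reappeared are dropped
    names(missed) <- gone

    if (any(missed >= expire_after)) return(FALSE)  # treat as expired
    Sys.sleep(sleep)
  }
}

# Hypothetical simulation: job 1 leaves the scheduler in round 2, but its
# status update only becomes visible from round `visible_after` on
# (e.g. because of SSHFS lag).
make_sim <- function(visible_after) {
  round <- 0L
  list(
    on_system = function() {
      round <<- round + 1L
      if (round < 2L) 1L else integer(0)
    },
    not_terminated = function() {
      if (round < visible_after) 1L else integer(0)
    }
  )
}

s <- make_sim(visible_after = 3L)
ok_with_grace <- wait_with_grace(s$on_system, s$not_terminated, expire_after = 3L)

s <- make_sim(visible_after = 3L)
ok_without_grace <- wait_with_grace(s$on_system, s$not_terminated, expire_after = 1L)
```

In this simulation, the run with three grace rounds waits out the lag, while the run with a single round gives up as soon as the job disappears from the scheduler, which is the behaviour described above.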
I managed to start jobs on our Slurm cluster from my laptop. Will finalize tomorrow. Can you test again on your cluster before I push to CRAN?
Perfect, works like a charm! I used the defaults, i.e. 3 rounds until a job is considered as expired.
Ok great. I'll run some more tests, merge into master and upload to CRAN. Thanks for your work, and sorry for "ignoring" your PR and just re-writing stuff.
Does this PR supersede #126?