mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

Jobs expire on LiDO3 directly after submission #216

Closed s-abbas closed 5 years ago

s-abbas commented 5 years ago

Hello,

I'm trying to set up batchtools for use with LiDO3. I use the template file from the repository: https://github.com/mllg/batchtools/blob/master/inst/templates/slurm-lido3.tmpl

My .batchtools.conf.R contains:

cluster.functions = makeClusterFunctionsSlurm("slurm-lido3.tmpl")
default.resources = list(R = "R/3.5.1-gcc73-base", modules = c("openblas/0.2.20-with-openmp", "gcc/7.3.0"))

In default.resources, I use the same modules that I add in my .bashrc and .bash_profile files. The module openblas/0.2.20-with-openmp is automatically added on startup. I don't know if this is a problem.

When I try the examples from the vignette, all of my jobs expire as soon as I submit them with submitJobs(). However, testJobs() works fine. I tried different walltime and memory resources in submitJobs(), but none of them worked. I'm sure I'm missing something very basic, but I'm a very inexperienced HPC user.

I would be happy if I could get a hint on what's wrong.

Thanks in advance!

Sermad

mllg commented 5 years ago

@surmann Have you computed something on lido3 recently? Is the config right?

mllg commented 5 years ago

This is in my config:

temp.dir = "/work/[login]/tmp"
cluster.functions = makeClusterFunctionsSlurm("slurm-lido3")
default.resources = list(walltime = 30 * 60, memory = 512)

My best guess: You have created your registry in your home directory, which is not accessible from the nodes. You have to work in "/work/[login]/".
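A minimal sketch of what that could look like, assuming the config above is in place. The directory name "registry" and the toy jobs are hypothetical and only serve to check that submission works; "[login]" is the placeholder from this thread and has to be replaced with your own login.

```r
library(batchtools)

# "[login]" is a placeholder; replace it with your own login.
work_root <- "/work/[login]"

# Create the registry on /work so that the compute nodes can read it.
# "registry" is a hypothetical directory name; file.dir must not exist yet.
reg <- makeRegistry(
  file.dir = file.path(work_root, "registry"),
  work.dir = work_root
)

# Toy jobs, only to verify that submission works.
batchMap(function(x) x^2, x = 1:3, reg = reg)
submitJobs(reg = reg)
waitForJobs(reg = reg)

# The status summary lists how many jobs are done, errored, or expired.
getStatus(reg = reg)
```

If the jobs still expire with the registry under /work, the cause is probably something other than file system visibility.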

s-abbas commented 5 years ago

That was indeed the case. It works now, thanks! :)