kleinhenz / SlurmClusterManager.jl

Julia package for running code on Slurm clusters

Bad resolution of relative path in include statement #2

Closed: matthiasbe closed this issue 3 years ago

matthiasbe commented 3 years ago

Hi,

Very interesting code, thank you for sharing. I share the view that the login node should not be used while the job is running.

I have an issue with inclusion of other julia files.

My main file, main.jl, starts like this:

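```julia
#!/path/to/julia --project

using SlurmClusterManager

using Plots
using Distributed        # provides addprocs
using DelimitedFiles

addprocs(SlurmManager()) # start workers inside the current Slurm allocation

include("utility_functions.jl") # relative path: resolved against this script's directory
# Other code ...
```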

I run this on the Slurm cluster with `sbatch -n 4 parallel.jl`.

I get the following error in the output file `slurm-7451.out`:

```
==========================================
SLURM_JOB_ID = 7451
SLURM_JOB_NODELIST = node023
==========================================
ERROR: LoadError: could not open file /var/spool/slurm/d/job07451/utility_functions.jl
Stacktrace:
 [1] include(::String) at ./client.jl:457
 [2] top-level scope at /var/spool/slurm/d/job07451/slurm_script:15
 [3] include(::Function, ::Module, ::String) at ./Base.jl:380
 [4] include(::Module, ::String) at ./Base.jl:368
 [5] exec_options(::Base.JLOptions) at ./client.jl:296
 [6] _start() at ./client.jl:506
in expression starting at /var/spool/slurm/d/job07451/slurm_script:15
```

Is there perhaps a Slurm parameter I didn't set correctly?

matthiasbe commented 3 years ago

Using absolute paths solves this.
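Rather than hard-coding an absolute path in the script, one option (an assumption on my part, not something SlurmClusterManager provides) is to build it from `SLURM_SUBMIT_DIR`, the environment variable Slurm sets in batch jobs to the directory `sbatch` was invoked from. `submit_path` is a hypothetical helper name used only for illustration.

```julia
# Hypothetical helper: resolve `rel` against the directory sbatch was invoked from.
# SLURM_SUBMIT_DIR is set by Slurm inside batch jobs; outside Slurm, fall back to pwd().
submit_path(rel::AbstractString) =
    normpath(joinpath(get(ENV, "SLURM_SUBMIT_DIR", pwd()), rel))

# In the driver script, instead of include("utility_functions.jl"):
# include(submit_path("utility_functions.jl"))
println(submit_path("utility_functions.jl"))
```

Under `sbatch` the working directory also defaults to the submission directory, so this gives the same result as a path built from `pwd()`, but it keeps working if the script later changes directory with `cd`.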

kleinhenz commented 3 years ago

This happens because `include` resolves relative paths against the directory of the script file, and Slurm copies the submitted script to a temporary directory before running it. You can see this by putting `println(@__FILE__)` in your script. I think this is intended behavior on the part of both Julia and Slurm, so it is probably not something we can fix in this package, although I agree it is a bit confusing.
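The mechanism can be reproduced locally without Slurm. This is a sketch under assumptions: the helper `try_driver` and the `submit`/`spool` directories are hypothetical stand-ins for the submission directory and for Slurm's copy of the script (`/var/spool/slurm/d/job07451/slurm_script` in the log above). The helper file sits where the job runs, while the driver script is stored elsewhere and included from the submission directory.

```julia
# Hypothetical helper: run `body` as a driver script stored in a different
# directory from the one the "job" runs in, mimicking Slurm's script copy.
function try_driver(body::AbstractString)
    mktempdir() do root
        submit_dir = joinpath(root, "submit")  # stands in for the sbatch submission directory
        spool_dir  = joinpath(root, "spool")   # stands in for /var/spool/slurm/d/jobNNNN
        mkpath(submit_dir)
        mkpath(spool_dir)
        # The helper file lives next to where the job was submitted...
        write(joinpath(submit_dir, "utility_functions.jl"), "helper_value() = 42\n")
        # ...but the driver script is stored somewhere else before it runs.
        script = joinpath(spool_dir, "slurm_script")
        write(script, body)
        cd(submit_dir) do                      # the job's working directory
            try
                include(script)                # relative includes inside resolve against spool_dir
                return :ok
            catch err
                err isa LoadError || rethrow() # a failed nested include surfaces as a LoadError
                return :load_error
            end
        end
    end
end

# include() resolves against the script's own directory, so the file is not found:
try_driver("include(\"utility_functions.jl\")\n")
# abspath() resolves against pwd(), i.e. the submission directory, so this works:
try_driver("include(abspath(\"utility_functions.jl\"))\n")
```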

You can use `include(abspath("utility_functions.jl"))`, which works because `abspath` resolves relative to the current working directory, not the directory of the source script. The most idiomatic solution is probably to put all of the shared functionality into a package and then import that package in your script. `include` works fine inside the package, since only the driver script is copied by Slurm.
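A minimal sketch of the package route, with hypothetical names (`MyProjectUtils`, `helper_value`, `make_utils_package`). To stay self-contained it builds the package in a temporary directory and makes it loadable by pushing that directory onto `LOAD_PATH`; in a real project you would more likely create it with `Pkg.generate` and add it to the driver's environment with `Pkg.develop(path=...)`. The relevant part is that the `include` now lives in `src/MyProjectUtils.jl` and resolves against `src/`, which Slurm never copies.

```julia
using UUIDs  # standard library; gives the throwaway package a valid uuid

# Build a tiny package on disk (hypothetical name: MyProjectUtils).
function make_utils_package(root::AbstractString)
    pkg = joinpath(root, "MyProjectUtils")
    mkpath(joinpath(pkg, "src"))
    write(joinpath(pkg, "Project.toml"),
          "name = \"MyProjectUtils\"\nuuid = \"$(uuid4())\"\n")
    # Entry file: this include is resolved against src/, wherever the driver script runs from.
    write(joinpath(pkg, "src", "MyProjectUtils.jl"), """
        module MyProjectUtils
        export helper_value
        include("utility_functions.jl")
        end
        """)
    write(joinpath(pkg, "src", "utility_functions.jl"), "helper_value() = 42\n")
    return pkg
end

pkg = make_utils_package(mktempdir())
push!(LOAD_PATH, pkg)  # stand-in for Pkg.develop(path=pkg) in the driver's environment

# The driver script then only needs:
using MyProjectUtils
println(helper_value())
```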

matthiasbe commented 3 years ago

OK, thank you for the clarification.