riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Use an absolute executable path in slurm workflows #118

Closed by lmoureaux 2 years ago

lmoureaux commented 2 years ago

Absolute paths will always work if the filesystem layout is shared between the submission machine and the worker node. The previous implementation forced everything to be a relative path and, in my testing, only worked in one specific setting: when submitting from the data folder.
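
To illustrate the point, here is a minimal sketch of why a relative executable path depends on the directory a job is launched from, while an absolute one does not. The directory layout (`/shared/data`, `/shared/scratch`), the file name `job.sh` and the helper `resolve` are hypothetical and not taken from law; `posixpath` is used so that the result does not depend on the platform running the example.

```python
import posixpath


def resolve(path, cwd):
    """Return the location `path` refers to for a process whose working directory is `cwd`."""
    # posixpath.join discards `cwd` when `path` is already absolute
    return posixpath.normpath(posixpath.join(cwd, path))


# hypothetical shared filesystem layout, visible from submission machine and worker node
data_dir = "/shared/data"
scratch_dir = "/shared/scratch"

relative_exe = "job.sh"
absolute_exe = posixpath.join(data_dir, "job.sh")

# a relative path points to different locations depending on the working directory
rel_from_data = resolve(relative_exe, data_dir)
rel_from_scratch = resolve(relative_exe, scratch_dir)

# an absolute path points to the same location from anywhere
abs_from_data = resolve(absolute_exe, data_dir)
abs_from_scratch = resolve(absolute_exe, scratch_dir)

print(rel_from_data, rel_from_scratch)
print(abs_from_data, abs_from_scratch)
```

In this sketch the relative path only resolves to the real file when the working directory happens to be the data folder, which matches the behaviour described above.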

riga commented 2 years ago

Hi @lmoureaux, thanks for opening this!

I have to run a few tests to understand this again, but in the meantime, could you check whether setting absolute_paths to True in the job submission config helps? This is interpreted, e.g., at

https://github.com/riga/law/blob/ad8507f2921311878d6e7e929c9e3ae8a00ff684/law/contrib/slurm/job.py#L357-L367

and already converts all paths to absolute representations. This could be done e.g. here via c.absolute_paths = True.
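
As a rough sketch of what such a flag can do, the example below converts the paths of a job description to absolute ones when `absolute_paths` is set. This is not law's implementation: the `JobConfig` class, its fields other than `absolute_paths`, and the `render_paths` helper are hypothetical stand-ins, and the assumption that paths are left unchanged when the flag is off is made only for illustration. The variable name `c` follows the comment above.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class JobConfig:
    # hypothetical stand-in for a job submission config, NOT law's actual class
    executable: str
    input_files: list = field(default_factory=list)
    absolute_paths: bool = False


def render_paths(c, base_dir):
    """Return the paths as they would be written into a job file (illustrative only)."""

    def fix(path):
        if c.absolute_paths:
            # anchor relative paths at base_dir; absolute paths pass through unchanged
            return posixpath.normpath(posixpath.join(base_dir, path))
        # assumption for this sketch: without the flag, paths are used as given
        return path

    return {
        "executable": fix(c.executable),
        "input_files": [fix(p) for p in c.input_files],
    }


c = JobConfig(executable="job.sh", input_files=["inputs/a.txt", "/shared/data/b.txt"])
before = render_paths(c, "/shared/data")

c.absolute_paths = True
after = render_paths(c, "/shared/data")

print(before)
print(after)
```

With the flag enabled, every path in the rendered job description is absolute, so it no longer matters which directory the job is submitted from, provided the filesystem layout is shared with the worker node.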

riga commented 2 years ago

@lmoureaux I found some time to refactor the job file creation, also for slurm. Could you test the feature branch https://github.com/riga/law/tree/feature/improve_job_input_control (changes here) and see whether your fix is needed there?

lmoureaux commented 2 years ago

Hi @riga, thanks for taking care of this! I'll give it a try over the weekend when the control room leaves me alone for a second.

riga commented 2 years ago

@lmoureaux There has been a PR (#120) in the meantime, which might have solved the issue. In case you find time to test, the master branch (pushed already to the "public/law_sw" checkouts at CERN and NAF) should hopefully already solve it :)

lmoureaux commented 2 years ago

It's working, thanks a lot!