riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Non-rendered input files for HTCondor workflows #119

Closed: tvoigtlaender closed 2 years ago

tvoigtlaender commented 2 years ago

As far as I can see, there is currently no way to add files that should not be rendered to the transfer_input_files parameter of an HTCondor job file.

This is especially problematic for .tar.gz files, because in Python 3 a text-mode open().read() is unable to read them: https://github.com/riga/law/blob/master/law/job/base.py#L735 (In Python 2 this did not result in a fatal error, but in Python 3 it does.)

There are some workarounds to this (like the errors="ignore" argument of open()), but an option to append un-rendered files to the list of rendered ones in the job file would be helpful.
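The failure described above can be reproduced without law. The following is a minimal sketch, assuming the file is opened in text mode with UTF-8 encoding (the encoding actually used depends on the locale); the file name is a placeholder and the payload is a plain gzip stream rather than a real tarball.

```python
import gzip
import os
import tempfile

# Build a small gzip payload; every gzip stream starts with the bytes 0x1f 0x8b.
payload = gzip.compress(b"example content")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.tar.gz")  # placeholder name
    with open(path, "wb") as f:
        f.write(payload)

    # A text-mode read has to decode the bytes; 0x8b is not a valid UTF-8 start byte.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

    # A binary-mode read returns the bytes untouched.
    with open(path, "rb") as f:
        raw = f.read()
```

Reading in binary mode avoids the decoding step entirely, which is why a file that is not meant to be rendered should never pass through a text-mode read.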

tvoigtlaender commented 2 years ago

Unfortunately it seems like the workaround I mentioned breaks the .tar.gz file.
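A plausible explanation, sketched under the assumption that the content is decoded with errors="ignore" and later written back as text: undecodable bytes are silently dropped, so the archive that comes out is no longer the archive that went in.

```python
import gzip
import zlib

payload = gzip.compress(b"example content")

# Decode while silently dropping undecodable bytes, then encode again.
text = payload.decode("utf-8", errors="ignore")
round_tripped = text.encode("utf-8")

# The gzip magic byte 0x8b is among the dropped bytes, so the data has changed.
lossy = round_tripped != payload

# The damaged data no longer decompresses.
try:
    gzip.decompress(round_tripped)
    still_valid = True
except (OSError, EOFError, zlib.error):
    still_valid = False
```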

riga commented 2 years ago

Hi @tvoigtlaender ,

Good point! I just added a few changes in a feature branch: https://github.com/riga/law/tree/feature/improve_job_input_control

Could you give it a try with

config.input_files["your_tgz_file"] = law.JobInputFile("your.tar.gz", render=False)

?
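To illustrate what a render flag of this kind implies, here is a self-contained sketch that does not use law at all. The `InputFile` class, the `stage` function and the `{{key}}` placeholder syntax are hypothetical stand-ins chosen for this example, not law's implementation.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass


@dataclass
class InputFile:
    """Hypothetical stand-in for a job input file with a render flag."""
    path: str
    render: bool = True


def stage(input_file, dst_dir, variables):
    """Copy a file into dst_dir, substituting {{key}} placeholders only if render is set."""
    dst = os.path.join(dst_dir, os.path.basename(input_file.path))
    if not input_file.render:
        # Binary-safe copy: the content is never decoded.
        shutil.copyfile(input_file.path, dst)
        return dst
    with open(input_file.path, "r", encoding="utf-8") as f:
        content = f.read()
    for key, value in variables.items():
        content = content.replace("{{" + key + "}}", value)
    with open(dst, "w", encoding="utf-8") as f:
        f.write(content)
    return dst


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    script = os.path.join(src, "job.sh")
    with open(script, "w", encoding="utf-8") as f:
        f.write("echo {{greeting}}\n")
    archive = os.path.join(src, "data.bin")
    with open(archive, "wb") as f:
        f.write(b"\x1f\x8b\x08\x00")  # bytes that are not valid UTF-8

    rendered = stage(InputFile(script), dst, {"greeting": "hello"})
    copied = stage(InputFile(archive, render=False), dst, {"greeting": "hello"})
    with open(rendered, encoding="utf-8") as f:
        rendered_content = f.read()
    with open(copied, "rb") as f:
        copied_content = f.read()
```

The text file has its placeholder substituted, while the binary file is copied byte for byte and never touched by the text-mode code path.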

tvoigtlaender commented 2 years ago

While the syntax you provided allows the job to be submitted, it seems to cause some issues with the rendering. The jobs fail with:

job: 1, branch(es): 0, id: 1453127.0, status: retry, code: 0, error: Error from slot1@f03-001-175-e.gridka.de: Error running docker job: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: '<$PWD>/data/jobs/tmp1q0m3xan/bash_wrapper_a2ea50c413_0To1.sh': stat <$PWD>/data/jobs/tmp1q0m3xan/bash_wrapper_a2ea50c413_0To1.sh: no such file or directory: unknown"

The file at <$PWD>/data/jobs/tmp1q0m3xan/bash_wrapper_a2ea50c413_0To1.sh exists on the submit side, but is not reachable from inside the job. I noticed that the rendered file names are given with their full paths instead of their basenames (<$PWD>/data/jobs/tmp1q0m3xan/law_job_f2ff5d1e31_0To1.sh instead of just law_job_f2ff5d1e31_0To1.sh). This is probably the issue, as files transferred via HTCondor are placed in the job's initial working directory.
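The path mismatch can be sketched as follows. HTCondor places each transferred input file directly in the job's scratch directory, so inside the job only the basename is meaningful. The `/home/user` prefix is a hypothetical stand-in for `<$PWD>`, and `name_in_job` is a helper invented for this example.

```python
import posixpath


def name_in_job(submit_path):
    """Return the name under which a transferred file appears in the job's scratch directory."""
    return posixpath.basename(submit_path)


# Hypothetical submit-side path; /home/user stands in for <$PWD>.
submit_path = "/home/user/data/jobs/tmp1q0m3xan/law_job_f2ff5d1e31_0To1.sh"

job_side_name = name_in_job(submit_path)

# Referring to the submit-side path inside the job fails unless that
# directory happens to be mounted on the worker node.
paths_match = job_side_name == submit_path
```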

riga commented 2 years ago

Could you maybe send me a bundle with all files belonging to a single job (via CERN Mattermost, for instance)? I think this would help me debug this :) Thank you!

riga commented 2 years ago

Fixed in #120 🎉