riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

HTCondor Jobs are broken in 0.1.19 #192

Open cverstege opened 4 hours ago

cverstege commented 4 hours ago

Submitting jobs via HTCondor fails because some Jinja template placeholders are no longer replaced correctly. See the following file listing from one of the failing jobs:

-- files after cleanup
> ls -a /srv/condor/local/execute/scratch/dir_735898 ($LAW_JOB_INIT_DIR)
total 96
drwx------ 4 nobody nobody  4096 Oct 30 18:06 .
drwxr-xr-x 3 root   root      24 Oct 30 18:06 ..
srwxr-xr-x 1 nobody nobody     0 Oct 30 18:06 .docker_sock
-rw-r--r-- 1    982    981  5282 Oct 30 18:06 .job.ad
-rw-r--r-- 1    982    981  7127 Oct 30 18:06 .machine.ad
-rw-r--r-- 1 nobody nobody  7078 Oct 30 18:06 .update.ad
-rw-r--r-- 1 nobody nobody  1721 Oct 30 18:06 bootstrap_80c48ed723_25To26.sh
-rw-r--r-- 1 nobody nobody     0 Oct 30 18:06 docker_stderror
-rwxr-xr-x 1 nobody nobody 20064 Oct 30 18:06 law_job_4ec8600b8a.sh
-rw-r--r-- 1 nobody nobody 10667 Oct 30 18:06 law_wlcg_tools_5b0ed65717.sh
drwxrwxrwt 2 nobody nobody  4096 Oct 30 18:06 tmp
drwxrwxrwt 3 nobody nobody  4096 Oct 30 18:06 var
-rw------- 1 nobody nobody 11938 Oct 30 18:06 x509_proxy
lrwxrwxrwx 1 nobody nobody    60 Oct 30 18:06 {{input_files}} -> /srv/condor/local/execute/scratch/dir_735898/{{input_files}}
-rw-r--r-- 1 nobody nobody  5534 Oct 30 18:06 {{log_file}}

Placeholders such as {{input_files}} and {{log_file}} should have been replaced with the actual filenames. This still worked with commit 4389a86e892c9cb6e2e01a2266dd715b8f0d3392. I did not test any other commits between that one and the 0.1.19 release.
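To confirm the symptom on other jobs, one option is to scan the job directory for entries whose names still contain a double-brace placeholder. The following is a minimal sketch and not law's actual rendering code: `render` and `find_unrendered` are hypothetical helpers, and the example filename `stdall.txt` is an assumption made for illustration only.

```python
import os
import re
import tempfile

# Matches double-brace placeholders such as {{input_files}} or {{log_file}}.
PLACEHOLDER_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(text, variables):
    """Replace each {{key}} with variables[key]; unknown keys are left untouched.

    Hypothetical helper that only illustrates the expected substitution.
    """
    return PLACEHOLDER_RE.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), text
    )


def find_unrendered(directory):
    """Return the sorted names of directory entries that still contain a placeholder."""
    return sorted(
        name for name in os.listdir(directory) if PLACEHOLDER_RE.search(name)
    )


if __name__ == "__main__":
    # Rebuild a tiny version of the failing job directory in a temp location.
    with tempfile.TemporaryDirectory() as job_dir:
        for name in ("law_job_4ec8600b8a.sh", "x509_proxy", "{{log_file}}"):
            open(os.path.join(job_dir, name), "w").close()

        # Entries that were never rendered show up here.
        print(find_unrendered(job_dir))

        # What a correct substitution would look like ("stdall.txt" is assumed).
        print(render("{{log_file}}", {"log_file": "stdall.txt"}))
```

Running `find_unrendered` against `$LAW_JOB_INIT_DIR` of a failing job should list exactly the entries that were left unrendered, which makes it easy to compare releases or commits.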

I can try and run a git bisect in the coming days.

riga commented 3 hours ago

Hi @cverstege, thanks for reporting this.

Could you also try with the latest master?

And it looks like you are using grouped submission (the default), is that correct?

cverstege commented 3 hours ago

Current master also failed with the same issue, though I would have to double-check whether it was a clean new submission. I didn't change many settings from the defaults, so I think it's using grouped submission. I will take a closer look tomorrow and follow up with more details.

riga commented 2 hours ago

Thank you!