Closed arnikz closed 5 years ago
For Slurm '%j' in the filename is replaced with the job identifier. See https://slurm.schedmd.com/sbatch.html#lbAF
For example --stdout test-%j.log
For Grid Engine you can use pseudo environment variables, as described in qsub man snippet:
If the pathname contains certain pseudo
environment variables, their value will be expanded at
runtime of the job and will be used to constitute the
standard error stream path name. The following pseudo
environment variables are supported currently:
$HOME home directory on execution machine
$USER user ID of job owner
$JOB_ID current job ID
$JOB_NAME current job name (see -N option)
$HOSTNAME name of the execution host
$TASK_ID array job task index number
For example --stdout test-\$JOB_ID.log
Xenon could be extended to have a job identifier replacement like %j
in filename for the JobDescription.setStdout/setStderr methods. Then an adaptor has to inject the job identifier in the actual filename or reformat and pass it along to the underlying scheduler.
@jmaassen what do you think?
I've tested xenon-cli (v2.2.1) on SGE with the options --stdout out-\$JOB_ID.log --stderr err-\$JOB_ID.log
but got out-.log
and err-.log
files. However, when submitting a dummy job
echo "echo \$JOB_ID"|qsub
one gets both files including jobid in their names (STDIN.e[jobid]
, STDIN.o[jobid]
). So, xenon does not substitute the remote JOB_ID
but uses the local one (which is of course not set on the submission node).
It's sge job script that need $JOB_ID, which then gets replaced by the actual id. Xenon should just pass $JOB_ID string as is to the job script it submits to sge.
You might need some extra escaping.
I expect the job script submitted to sge has #$ -e err-.log
string in it, because snake make does some interpolation.
When I run
xenon -vvv scheduler gridengine --location ssh://localhost:10023 \
--user xenon --password javagat submit \
--stderr test-\$JOB_ID.log hostname
I get a log file called test-1.log
and a log message
17:40:21.538 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.gridengine.GridEngineUtils - Created job script:%n#!/bin/sh
#$ -S /bin/sh
#$ -N xenon
#$ -l h_rt=00:15:00
#$ -o /dev/null
#$ -e 'test-$JOB_ID.log'
Can you run xenon-cli in -vvv mode?
I'm using xenon-cli within snakemake as follows:
snakemake ... --cluster "xenon -vvv scheduler gridengine --location local:// submit ... --stderr stderr-\$JOB_ID.log --stdout stdout-\$JOB_ID.log" &>smk.log&
My smk.log
seems truncated and does not show the lines with the batch script:
Submitted job 8 with external jobid '18:07:09.721 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.ScriptingScheduler - creating sub scheduler for gridengine adaptor at local://'.
B.t.w. Have you tried xenon-cli with --location local://
instead after login to the node? (also add --stdout ...
)
I can't reproduce your run using merely xenon-cli .
$ xenon --version
Xenon CLI v2.4.0, Xenon Library v2.6.0
$ xenon -vvv scheduler gridengine --location local:// submit --stderr err-\$JOB_ID.log --stdout out-\$JOB_ID.log --max-run-time 1 --working-directory . echo.sh &> xenon.log &
$ ls
echo.sh err-.log out-.log xenon.log
$ cat xenon.log
18:42:09.576 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.ScriptingScheduler - creating sub scheduler for gridengine adaptor at local://
18:42:09.582 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - Creating JobQueueScheduler for Adaptor local with multiQThreads: 4 and pollingDelay: 1000
18:42:09.640 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job
18:42:09.641 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Created Job local-0
18:42:09.641 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job to queue unlimited
18:42:09.641 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Waiting for interactive job to start.
18:42:09.708 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: getJobStatus for job local-0
18:42:09.709 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-0
18:42:09.709 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-0
18:42:09.709 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-0
18:42:09.709 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-0
18:42:09.709 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-0
18:42:09.709 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-0
18:42:09.709 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-0
18:42:09.709 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.RemoteCommandRunner - CommandRunner took 69 ms, executable = qconf, arguments = [-sql], exitcode = 0, stdout:
all.q
test.q
stderr:
18:42:09.711 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job
18:42:09.711 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Created Job local-1
18:42:09.712 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job to queue unlimited
18:42:09.712 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Waiting for interactive job to start.
18:42:09.758 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: getJobStatus for job local-1
18:42:09.758 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-1
18:42:09.758 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-1
18:42:09.758 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-1
18:42:09.758 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-1
18:42:09.758 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-1
18:42:09.759 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-1
18:42:09.759 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-1
18:42:09.759 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.RemoteCommandRunner - CommandRunner took 48 ms, executable = qconf, arguments = [-sq, all.q,test.q], exitcode = 0, stdout:
qname all.q
hostlist @allhosts
seq_no 0,[@n_01_10=1],[@n_11_20=2],[@n_21_30=3],[@n_31_40=4], \
[@n_41_50=5],[@n_51_57=6],[@n_58_61=7],[@n_62_70=8], \
[@n_71_80=9],[@n_81_90=10],[n0065.compute.hpc=11]
load_thresholds np_load_avg=1.05
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list mpi threaded
rerun FALSE
slots 12,[@cores_36=36],[@cores_12=12],[@cores_48=48], \
[@cores_24=24],[@cores_40=40]
tmpdir /tmp
shell /bin/bash
prolog root@/opt/sge/umc_scripts/prolog.sh
epilog root@/opt/sge/umc_scripts/epilog.sh
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state disabled
s_rt INFINITY
h_rt 2400:0:0
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
qname test.q
hostlist n0059.compute.hpc
seq_no 100
load_thresholds np_load_avg=1.05
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype INTERACTIVE
ckpt_list NONE
pe_list make smp mpi threaded
rerun FALSE
slots 1
tmpdir /tmp
shell /bin/bash
prolog root@/opt/sge/umc_scripts/prologtest.sh
epilog root@/opt/sge/umc_scripts/epilog.sh
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists testqusers
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state disabled
s_rt INFINITY
h_rt 2400:0:0
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
stderr:
18:42:09.763 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job
18:42:09.764 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Created Job local-2
18:42:09.764 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job to queue unlimited
18:42:09.764 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Waiting for interactive job to start.
18:42:09.819 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: getJobStatus for job local-2
18:42:09.819 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-2
18:42:09.819 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-2
18:42:09.819 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-2
18:42:09.819 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-2
18:42:09.820 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-2
18:42:09.820 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-2
18:42:09.820 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-2
18:42:09.820 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.RemoteCommandRunner - CommandRunner took 57 ms, executable = qconf, arguments = [-spl], exitcode = 0, stdout:
make
mpi
smp
threaded
stderr:
18:42:09.820 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job
18:42:09.820 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Created Job local-3
18:42:09.820 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job to queue unlimited
18:42:09.820 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Waiting for interactive job to start.
18:42:09.867 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: getJobStatus for job local-3
18:42:09.867 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-3
18:42:09.867 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-3
18:42:09.867 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-3
18:42:09.867 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-3
18:42:09.867 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-3
18:42:09.867 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-3
18:42:09.867 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-3
18:42:09.867 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.RemoteCommandRunner - CommandRunner took 47 ms, executable = qconf, arguments = [-sp, make, -sp, mpi, -sp, smp, -sp, threaded], exitcode = 0, stdout:
pe_name make
slots 1
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
qsort_args NONE
pe_name mpi
slots 250
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
qsort_args NONE
pe_name smp
slots 1
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $pe_slots
control_slaves TRUE
job_is_first_task TRUE
urgency_slots min
accounting_summary TRUE
qsort_args NONE
pe_name threaded
slots 6000
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $pe_slots
control_slaves FALSE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
qsort_args NONE
stderr:
18:42:09.870 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.gridengine.GridEngineSetup - Created setup info, queues = {test.q=QueueInfo [name=test.q, slots=1, parallelEnvironments=[make, smp, mpi, threaded]], all.q=QueueInfo [name=all.q, slots=12, parallelEnvironments=[mpi, threaded]]}, parallel environments = {mpi=ParallelEnvironmentInfo [name=mpi, slots=250, allocationRule=ROUND_ROBIN, ppn=0], smp=ParallelEnvironmentInfo [name=smp, slots=1, allocationRule=PE_SLOTS, ppn=0], threaded=ParallelEnvironmentInfo [name=threaded, slots=6000, allocationRule=PE_SLOTS, ppn=0], make=ParallelEnvironmentInfo [name=make, slots=1, allocationRule=ROUND_ROBIN, ppn=0]}
18:42:09.875 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.gridengine.GridEngineUtils - Created job script:%n#!/bin/sh
#$ -S /bin/sh
#$ -N xenon
#$ -wd '/home/cog/akuzniar/sv-callers/snakemake/sv-calling/test/.'
#$ -l h_rt=00:01:00
#$ -o 'out-.log'
#$ -e 'err-.log'
echo.sh
18:42:09.875 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job
18:42:09.875 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Created Job local-4
18:42:09.875 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Submitting job to queue unlimited
18:42:09.875 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: Waiting for interactive job to start.
18:42:09.960 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: getJobStatus for job local-4
18:42:09.960 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-4
18:42:09.960 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-4
18:42:09.960 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-4
18:42:09.961 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: findJob for job local-4
18:42:09.961 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-4
18:42:09.961 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-4
18:42:09.961 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.JobQueueScheduler - local: cleanupJob for job local-4
18:42:09.961 [main] DEBUG nl.esciencecenter.xenon.adaptors.schedulers.RemoteCommandRunner - CommandRunner took 86 ms, executable = qsub, arguments = [], exitcode = 0, stdout:
Your job 3643979 ("xenon") has been submitted
stderr:
3643979
Hmm, seems the shadow (./gradlew installShadowDist
) dist and normal dist (./gradlew installDist
) behave differently. I was using the normal/non-shadow version.
Seems the application shell scripts are differently
diff build/install/xenon/bin/xenon build/install/xenon-cli-shadow/bin/xenon
```diff
158,162c158,161
< save () {
< for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
< echo " "
< }
< APP_ARGS=$(save "$@")
---
> for s in "${@}" ; do
> s=\"$s\"
> APP_ARGS=$APP_ARGS" "$s
> done
164,165c163,164
< # Collect all arguments for the java command, following the shell quoting and substitution rules
< eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $XENON_OPTS -classpath "\"$CLASSPATH\"" nl.esciencecenter.xenon.cli.Main "$APP_ARGS"
---
> # Collect JVM options
> JVM_OPTS=$DEFAULT_JVM_OPTS" "$JAVA_OPTS" "$XENON_OPTS
172c171
< exec "$JAVACMD" "$@"
---
> eval \"$JAVACMD\" $JVM_OPTS -jar \"$CLASSPATH\" $APP_ARGS
\ No newline at end of file
When using the shadow dist a double escape is required:
./gradlew installShadowDist
./build/install/xenon-cli-shadow/bin/xenon scheduler gridengine \
--location ssh://localhost:10022 --user xenon --password javagat submit \
--stderr err-\\\$JOB_ID.log --stdout out-\\\$JOB_ID.log hostname
Yeah, this double escape worked out on its own but not within the double quotes of the snakemake cmd. I've tried it with single quotes instead and that worked! However, we should make this less cryptic for an end user, I think.
How to write stderr/stdout log files (e.g. in the form
<jobname>.{e,o}<jobID>
) for each job?xenon scheduler gridengine --location ... submit --working-dir .
xenon scheduler gridengine --location ... submit --working-dir . --stderr err.log --stdout out.log
Note: I'm using the latest Xenon v2.3.0