Closed LourensVeen closed 6 years ago
If I remember correctly the stdin/out/err filenames could not be set in torque. Only the defaults can be used.
Either way, Xenon should hide this from the user. Will look into this.
Yes, that's what it says in the other issue. I'm proposing to work around being unable to redirect the standard in/out/err of the job by redirecting the standard in/out/err of the command running inside the job.
I have now worked around this in Cerise (which still uses Xenon 1), by effectively calling bash -c 'cd workdir ; command arguments... >stdout.txt 2>stderr.txt'
instead of using Xenon's setWorkingDirectory()
, setStdout()
and setStderr()
. That seems to fix it at least on the BinAC cluster. Note that we needed the explicit cd
command. Somehow using setWorkingDirectory()
on the bash
command doesn't propagate to the command run inside of the subshell, not sure why.
We should apply this trick in the script generator for Torque.
Would it be OK to fix in the 2.x series, or should we patch a 1.x for you ?
A fix for Xenon 2 would be nice in general I think, but with a working work-around there's no hurry from my side. I'm still planning to upgrade to Xenon 2 anyway; let's not waste time on maintaining 1.x.
The setWorkingDirectory
is correctly passing the integration tests for Torque on 2.x
For 2.x calling setStdout
on a JobDescription for Torque produces an InvalidJobDescriptionException
on submission, which explains that this is not supported by torque.
Although correct, this requires the user to solve the problem which is not what we want...
It seems that setWorkingDirectory
affects the directory in which qsub
is called, but on this cluster at least, the job script is apparently started in the home directory regardless of where qsub
ran. So that's why we needed the additional cd workdir
I think.
If I do the cd workdir
but don't call setWorkingDirectory
, my command runs in workdir
, but the Torque-generated log files <jobname>.o<jobid>
and <jobname>.e<jobid>
end up in the home directory.
This is a bit of an issue I think. From the user's perspective, I/O has been redirected, and those Torque log files should not exist. We don't really want to force the user to clean them up, but we don't want them littering the remote account either. Hmm.
The default behavior for Torque is to use the home directory as workdir and redirect stdout and stderr to the <jobname>.o<jobid>
files.
When the setWorkingDirectory
is called we generate a #PBS -d <workdir>
in the jobscript which should instruct Torque to change the working dir. For 2.x this seems to work correctly in our integration tests. This could be a 1.x bug, I'll see if I can find this.
I'm not sure if we can tell torque not to generate the <jobname>.o<jobid>
files. I remember that we could not convince Torque to change their name, that's why we introduced the error. I'll dig a bit deeper...
It's possible that there's something funny with the default bash configuration on this particular cluster as well. Since I'm starting bash, if there's a cd $HOME
in the .profile
or something, that could also mess things up.
I've added support for setStdout
, setStderr
and setStdin
to the Torque adaptor.
The first two translate to:
#PBS -o </workdir/>out.txt
#PBS -e </workdir/>err.txt
where the </workdir/>
is explicitly added if set using setWorkingDirectory
.
Torque has no direct support for settting stdin (as far as I can tell). So this is implemented by explicitly redirecting to the executable in the job script.
An example, that assumes that /home/xenon
is $HOME, test_workdir
is the workdir and test_in.txt
in the workdir contains the input:
description.setWorkingDirectory("test_workdir");
description.setExecutable("/bin/cat");
description.setStdin("test_in.txt");
description.setStdout("test_out.txt");
This will produce the following job script:
#!/bin/sh
#PBS -S /bin/sh
#PBS -N xenon
#PBS -d /home/xenon/test_workdir
#PBS -o /home/xenon/test_workdir/test_out.txt
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:15:00
/bin/cat< test_in.txt
It seems to work fine in our Torque test setup.
Maybe you could test the above job script on your cluster to see what happens? Just replace /home/xenon/test_workdir
with an existing workdir and ensure the test_in.txt
exists there.
It works and produces two files: test_out.txt
and xenon.e4979691
inside the test_workdir
folder.
OK, so the approach used in 2.x seems to give the expect result.
Marking as closed for now. Reopen if you don't agree
I'm trying to run a CWL runner via Xenon, which writes a description of the results produced on stdout. This works fine with SLURM, but when using Torque, Xenon complains that it cannot set the standard inputs/outputs. See also https://github.com/NLeSC/Xenon/issues/239
It seems simple enough to fix by just adding some redirection operators (i.e. "< stdin.txt >stdout.txt 2>stderr.txt") to the command in the job script that is being generated, in TorqueUtils.generateScriptContent(). The note in the issue above does not say if this was considered, just that it was decided to not support this at all, but well, I need this functionality.
It looks like it would probably work to add those redirection commands to the argument directly, but if Xenon is going to offer an API for this, then it should work.