Closed · Tintest closed this issue 5 years ago
This is definitely possible. The first thing to do is to identify the command lines required to:
I would ask you to describe below in this thread an example command line for each of the above points, including the exact output they produce. Then I will be able to advise you on how to continue.
Thanks for your answer,
A useful addition would be support for the `oarsub --array` option, to parallelize the submission of multiple sample jobs and to compute filenames (`--array-param-file`).
Please format commands using markdown formatting to improve the readability of the text.
Then, please provide a concrete example for each of those commands and, above all, the exact output they produce. That's important because the executor needs to parse the output text to extract the relevant information.
Finally, let me clarify that I'm very happy to accept a pull request for this feature, and I'm ready to advise you on how to implement it. However, I won't be able to implement it myself because I cannot test it.
Sorry, first post on GitHub for me. I tried the following; please tell me if it is correct for you.
First we write the command to be submitted to oarsub or wrap it up in a shell script:
nano test.sh
echo blabla
sleep 10
echo blibli
then submit the job to oar with parameters:
oarsub --resource /nodes=1/core=1,walltime=00:01:10 --directory `pwd` --name bloblo --project epimed "bash test.sh"
oar echoes in the shell:
[ADMISSION RULE] Modify resource description with type constraints
[PROJECT] Adding project constraints: (team='epimed' or team='ciment' or team='visu')
OAR_JOB_ID=7157387
jobs generate 2 files: stderr and stdout
-rw-r--r-- 1 ju4667th l-iab 0 May 21 18:31 OAR.bloblo.7157387.stderr
-rw-r--r-- 1 ju4667th l-iab 14 May 21 18:31 OAR.bloblo.7157387.stdout
more OAR.bloblo.7157387.stdout
blabla
blibli
The oarstat command
oarstat -j 7157387
Job id Name User Submission Date S Queue
---------- -------------- -------------- ------------------- - ----------
7157387 bloblo ju4667th 2018-05-21 18:31:24 T default
The T is for Terminated.
oarstat -fj 7157387
Job_Id: 7157387
job_array_id = 7157387
job_array_index = 1
name = bloblo
project = epimed
owner = ju4667th
state = Terminated
wanted_resources = -l "{type = 'default'}/network_address=1/core=1,walltime=0:1:10"
types =
dependencies =
assigned_resources = 805
assigned_hostnames = luke41
queue = default
command = bash test.sh
exit_code = 0 (0,0,0)
launchingDirectory = /home/ju4667th/analyses/sandbox
stdout_file = OAR.bloblo.7157387.stdout
stderr_file = OAR.bloblo.7157387.stderr
jobType = PASSIVE
properties = ((desktop_computing = 'NO') AND (team='epimed' or team='ciment' or team='visu')) AND visu = 'NO'
reservation = None
walltime = 0:1:10
submissionTime = 2018-05-21 18:31:24
startTime = 2018-05-21 18:31:36
stopTime = 2018-05-21 18:31:47
cpuset_name = ju4667th_7157387
initial_request = oarsub --resource /nodes=1/core=1,walltime=00:01:10 --directory /home/ju4667th/analyses/sandbox --name bloblo --project epimed bash test.sh
message = R=1,W=0:1:10,J=B,N=bloblo,P=epimed (Karma=0.001,quota_ok)
scheduledStart = no prediction
resubmit_job_id = 0
events =
2018-05-21 18:31:48> SWITCH_INTO_TERMINATE_STATE:[bipbip 7157387] Ask to change the job state
Please review the GitHub markdown guide on how to quote code.
Try to format it as below:

Command X:

```
oarsub --xx --yy
```

Output:

```
foo bar
```
thanks, another try:
First we write the command to be submitted to oarsub or wrap it up in a shell script:
```shell
nano test.sh
```

```shell
echo blabla
sleep 10
echo blibli
```
Then submit the job to OAR with parameters:

```shell
oarsub --resource /nodes=1/core=1,walltime=00:01:10 --directory `pwd` --name bloblo --project epimed "bash test.sh"
```
OAR echoes in the shell:

```
[ADMISSION RULE] Modify resource description with type constraints
[PROJECT] Adding project constraints: (team='epimed' or team='ciment' or team='visu')
OAR_JOB_ID=7157387
```
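Since the executor needs to parse this output to obtain the job id, here is a minimal sketch of that parsing step (the output format is assumed to be exactly as echoed above; the `sed` expression is an illustration, not the Nextflow code):

```shell
# Hypothetical parsing step: extract the job id that oarsub prints.
out="$(printf '%s\n' \
  '[ADMISSION RULE] Modify resource description with type constraints' \
  'OAR_JOB_ID=7157387')"
job_id=$(printf '%s\n' "$out" | sed -n 's/^OAR_JOB_ID=\([0-9][0-9]*\)$/\1/p')
echo "$job_id"   # → 7157387
```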
The job generates two files, stderr and stdout:

```
-rw-r--r-- 1 ju4667th l-iab  0 May 21 18:31 OAR.bloblo.7157387.stderr
-rw-r--r-- 1 ju4667th l-iab 14 May 21 18:31 OAR.bloblo.7157387.stdout
```
Output:

```
$ more OAR.bloblo.7157387.stdout
blabla
blibli
```
The oarstat command to show a job:

```
$ oarstat -j 7157387
Job id     Name           User           Submission Date     S Queue
---------- -------------- -------------- ------------------- - ----------
7157387    bloblo         ju4667th       2018-05-21 18:31:24 T default
```
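For the executor's queue-status check, the one-letter state column (S) of this tabular output has to be extracted. A sketch of that parsing, assuming the column layout shown in the sample above (second-to-last field):

```shell
# Hypothetical parsing of the one-letter state column from `oarstat -j` output.
line='7157387    bloblo         ju4667th       2018-05-21 18:31:24 T default'
state=$(printf '%s\n' "$line" | awk '{ print $(NF-1) }')
echo "$state"   # → T
```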
The `oarstat -fj` command gives the full view of a job:

```
$ oarstat -fj 7157387
Job_Id: 7157387
    job_array_id = 7157387
    job_array_index = 1
    name = bloblo
    project = epimed
    owner = ju4667th
    state = Terminated
    wanted_resources = -l "{type = 'default'}/network_address=1/core=1,walltime=0:1:10"
    types =
    dependencies =
    assigned_resources = 805
    assigned_hostnames = luke41
    queue = default
    command = bash test.sh
    exit_code = 0 (0,0,0)
    launchingDirectory = /home/ju4667th/analyses/sandbox
    stdout_file = OAR.bloblo.7157387.stdout
    stderr_file = OAR.bloblo.7157387.stderr
    jobType = PASSIVE
    properties = ((desktop_computing = 'NO') AND (team='epimed' or team='ciment' or team='visu')) AND visu = 'NO'
    reservation = None
    walltime = 0:1:10
    submissionTime = 2018-05-21 18:31:24
    startTime = 2018-05-21 18:31:36
    stopTime = 2018-05-21 18:31:47
    cpuset_name = ju4667th_7157387
    initial_request = oarsub --resource /nodes=1/core=1,walltime=00:01:10 --directory /home/ju4667th/analyses/sandbox --name bloblo --project epimed bash test.sh
    message = R=1,W=0:1:10,J=B,N=bloblo,P=epimed (Karma=0.001,quota_ok)
    scheduledStart = no prediction
    resubmit_job_id = 0
    events =
2018-05-21 18:31:48> SWITCH_INTO_TERMINATE_STATE:[bipbip 7157387] Ask to change the job state
```
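If the executor ever needs the full job state rather than the one-letter code, it can be pulled from a `state = ...` line of this output. A sketch of that extraction, assuming the `key = value` format shown above:

```shell
# Extract the state field from a `oarstat -fj` line (sample taken from above).
line='    state = Terminated'
state=$(printf '%s\n' "$line" | sed -n 's/^[[:space:]]*state = //p')
echo "$state"   # → Terminated
```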
Great, much better!
How to kill a job? Also, is it not possible to define the job submission directives in the script header, as with PBS for example (shown below)?
```shell
#!/bin/bash
#PBS -A <account_no> (only for account based usernames)
#PBS -l walltime=1:00:00
#PBS -l select=1:ncpus=1
#
./my_application
```
Here is the oardel command:

```
$ oarsub --resource /nodes=1/core=1,walltime=00:01:10 --directory `pwd` --name bloblo --project epimed "bash test.sh"
[ADMISSION RULE] Modify resource description with type constraints
[PROJECT] Adding project constraints: (team='epimed' or team='ciment' or team='visu')
OAR_JOB_ID=7159343
$ oardel 7159343
Deleting the job = 7159343 ...REGISTERED.
The job(s) [ 7159343 ] will be deleted in a near future.
```
Yes, it is possible to include directives in the header with the `-S` option:
```shell
nano test2.sh
```

```shell
#!/bin/bash
#OAR -n bloblo
#OAR -l nodes=1,core=1,walltime=00:01:00
#OAR --project epimed
echo blabla
sleep 10
echo blibli
```
and launched as:

```shell
oarsub -S "./test2.sh"
```
Hi, you can see here our first try: https://github.com/nextflow-io/nextflow/compare/master...bzizou:OAR_executor
The problem that we have now is that OAR absolutely requires the submitted batch script to be executable (+x mode). We haven't found a way to change the mode efficiently before submission.
Very good. My suggestion is that the OAR executor specify the job requirements as meta directives in the script header.
The Nextflow executor mechanism needs to create two files for each job: `.command.sh` is the task command as provided by the user in the process definition, and `.command.run` is the launcher script that will manage the execution with the OAR batch scheduler.

To implement the support for OAR, follow these steps:

1. Create a class `nextflow.executor.OarExecutor` that extends `AbstractGridExecutor`, registered under the name `oar`.
2. Implement the `OarExecutor` methods using the appropriate OAR commands and directives. You can use the `SgeExecutor` as a reference.

Following these steps, the implementation should be straightforward given a basic knowledge of Groovy or Java. I'm happy to help or to discuss any problem or detail further while the implementation progresses.
> you can see here our first try
Nice! Please open a pull request, so I can comment on the code.
> The problem that we have now, is that OAR absolutely needs the batch script submitted to be executable (+x mode).
In the `getSubmitCommandLine` you can change the script permissions as shown below:

```groovy
scriptFile.setPermissions(7,0,0)
```
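For reference, `setPermissions(7,0,0)` corresponds to `chmod 700`: read/write/execute for the owner, nothing for group and others. A quick shell check of the resulting mode bits:

```shell
# chmod 700 = owner rwx, no group/other access (what setPermissions(7,0,0) sets).
tmp=$(mktemp)
chmod 700 "$tmp"
ls -l "$tmp" | cut -c1-10   # → -rwx------
rm -f "$tmp"
```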
Hello Paolo,
With @bzizou we added the `setPermissions` call and modified the `parseJobId` function to work correctly with OAR.
There is no error left in the `.nextflow.log` file, but the corresponding `.command.log` file contains:

```
/bin/bash: .command.run: command not found
```
I tried to hardcode a `./` in front of `.command.run` to make it "executable", but got the same result.
Here is the output for a Nextflow OAR process:

```
$ oarstat -fj 7224138
Job_Id: 7224138
    job_array_id = 7224138
    job_array_index = 1
    name = nf-fastq2sorted
    project = epimed
    owner = tintest
    state = Terminated
    wanted_resources = -l "{type = 'default'}/resource_id=1,walltime=2:0:0"
    types =
    dependencies =
    assigned_resources = 737
    assigned_hostnames = luke37
    queue = default
    command = .command.run
    exit_code = 32512 (127,0,0)
    launchingDirectory = /home/tintest/PROJECTS/Test_nextflow_OAR
    stdout_file = /home/tintest/PROJECTS/Test_nextflow_OAR/work/1a/5b6f49cf53fb1f0e866f68cfb4e5ea/.command.log
    stderr_file = /home/tintest/PROJECTS/Test_nextflow_OAR/work/1a/5b6f49cf53fb1f0e866f68cfb4e5ea/.command.log
    jobType = PASSIVE
    properties = ((desktop_computing = 'NO') AND (team='epimed' or team='ciment' or team='visu')) AND visu = 'NO'
    reservation = None
    walltime = 2:0:0
    submissionTime = 2018-05-23 13:51:03
    startTime = 2018-05-23 13:51:11
    stopTime = 2018-05-23 13:51:12
    cpuset_name = tintest_7224138
    initial_request = oarsub -S -n nf-fastq2sorted .command.run; #OAR -n nf-fastq2sorte; #OAR -O /home/tintest/PROJECTS/Test_nextflow_OAR/work/1a/5b6f49cf53fb1f0e866f68cfb4e5ea/.command.lo; #OAR -E /home/tintest/PROJECTS/Test_nextflow_OAR/work/1a/5b6f49cf53fb1f0e866f68cfb4e5ea/.command.lo; #OAR -q defaul; #OAR --project epime
    message = R=1,W=2:0:0,J=B,N=nf-fastq2sorted,P=epimed (Karma=0.000,quota_ok)
    scheduledStart = no prediction
    resubmit_job_id = 0
    events =
2018-05-23 13:51:12> SWITCH_INTO_TERMINATE_STATE:[bipbip 7224138] Ask to change the job state
```
Thank you.
I guess this is the problem:

```
launchingDirectory = /home/tintest/PROJECTS/Test_nextflow_OAR
```

It should be:

```
/home/tintest/PROJECTS/Test_nextflow_OAR/work/1a/5b6f49cf53fb1f0e866f68cfb4e5ea/
```

Have you specified the work `--directory` in the directives?
No, I did not specify a hardcoded `--directory`.
It's weird, because OAR guesses the `.command.log` path right, so it should also be right for the `launchingDirectory`, shouldn't it?
In the OAR manual:

```
-d, --directory=<dir>    Specify the directory where to launch the command (default is current directory)
```

So we have to specify it; where should I do it in the executor code?
As shown here.
So I did specify the `-d` option, but still the same error: it does not find `.command.run`, which is clearly weird.
So I decided to try to force it by specifying the workDir in the `oarsub` call:

```groovy
[ 'oarsub', '-S', '-n', getJobNameFor(task), task.workDir + '/' + scriptFile.getName() ]
```

Which is dirty... but, and I don't know why, the slash is just completely ignored; I got the following command:

```
oarsub -S -n nf-fastq2sorted /home/tintest/PROJECTS/Test_nextflow_OAR/work/95/4e71c80e755f57a5922b66cb220c96.command.run
```

Any idea for a cleaner solution?
Thank you.
This path looks broken:

```
/home/tintest/PROJECTS/Test_nextflow_OAR/work/95/4e71c80e755f57a5922b66cb220c96.command.run
```

There should be a `/` before `.command.run`.
Also, the `-d` option should work. Make sure it's included in the `.command.run` header. Take into consideration that you can debug that script by just changing into the task work dir and submitting the job, i.e.:

```shell
cd /home/tintest/PROJECTS/Test_nextflow_OAR/work/95/4e71c80e755f57a5922b66cb220c96
oarsub -S .command.run
```
Hello,
I know the path looks broken; I may have expressed myself poorly. I tried to add a `/` to the path within the `getSubmitCommandLine` function, by doing this:

```groovy
[ 'oarsub', '-S', '-n', getJobNameFor(task), task.workDir + '/' + scriptFile.getName() ]
```

but it's like the `/` is ignored:

```
oarsub -S -n nf-fastq2sorted /home/tintest/PROJECTS/Test_nextflow_OAR/work/95/4e71c80e755f57a5922b66cb220c96.command.run
```

Well, I removed the `/`, and here is the header of a `.command.run` with the `-d` option set up:

```shell
#OAR -n nf-fastq2sorted
#OAR -O /home/tintest/PROJECTS/Test_nextflow_OAR/work/2d/579ec983a80bee7f4b61067bf55044/.command.log
#OAR -E /home/tintest/PROJECTS/Test_nextflow_OAR/work/2d/579ec983a80bee7f4b61067bf55044/.command.log
#OAR -d /home/tintest/PROJECTS/Test_nextflow_OAR/work/2d/579ec983a80bee7f4b61067bf55044
#OAR -q default
#OAR --project epimed
cd /home/tintest/PROJECTS/Test_nextflow_OAR/work/2d/579ec983a80bee7f4b61067bf55044
# NEXTFLOW TASK: fastq2sortedbam (1)
```

Everything looks fine to me, but still: `/bin/bash: .command.run: command not found`
Thank you.
Have you tried to run the job just using the command `oarsub -S .command.run` from a shell terminal?
I just did. `oarsub -S .command.run` does not seem to work, but `oarsub -S ./.command.run` seems to work (the job is waiting).
Oh, so it looks like an OAR issue. Anyhow, if the latter works, the following should work as well:

```groovy
List<String> getSubmitCommandLine(TaskRun task, Path scriptFile) {
    return [ 'oarsub', '-S', "./${scriptFile.getName()}" ]
}
```

(the job name is specified with a directive in the script, therefore it shouldn't be needed here)
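The reason the `./` prefix matters: the shell resolves a bare command name through `PATH` (which normally does not contain the current directory), while `./name` is an explicit relative path and is executed directly. A minimal demonstration (assuming `.` is not in `PATH`, the usual default):

```shell
cd "$(mktemp -d)"
printf '#!/bin/sh\necho ok\n' > .command.run
chmod +x .command.run
# Bare name: looked up in PATH only, so it is not found.
.command.run 2>/dev/null || echo 'not found'   # → not found
# Explicit relative path: executed directly.
./.command.run   # → ok
```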
OK! It seems to work; now I'm getting some errors related to my poor adaptation of my code to this new cluster.
Thank you a thousand times! I cannot tell you right now that it's working, but it's definitely a great improvement. I'll let you know if I run into other problems; if everything is working, I will open a pull request in a few days 👍
Thank you again.
Any progress on this?
I've been busy on a side project since the beginning of the week. I should be back on OAR next week :)
Great! No hurry, just curious about the status of this.
Hello,
I'm back on OAR.
So the job is correctly scheduled, but Nextflow is sending the OAR options as a single string, given the following syntax in my nextflow config file:

```groovy
process {
    executor = 'oar'
    queue = 'default'
    clusterOptions = '--project epimed -l /core=16,walltime=00:30:00'
}
```

But OAR expects a separate string for each parameter. Could you tell me how to fix that?
Thank you.
Push your code and link it here, please.
The `clusterOptions` value is added here. As you can see, it's added exactly as specified by the user. What you need to do is split that string into tokens and add each of them to the `result` list.

To split that string, use the splitter helper, which takes care to keep together values enclosed within quote characters (however, you need to verify that this is compatible with the syntax expected by OAR).

Another option is to allow `clusterOptions` to be provided as a `List` object, so that you can just append it to the `result` object.
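To illustrate the tokenization being described (a sketch of the expected behavior, not the Nextflow splitter helper itself): splitting on whitespace while keeping quoted values together can be emulated in the shell with `xargs`, which applies the same quoting rules. The `"my project"` value is a made-up example:

```shell
# Split an options string into tokens; the double-quoted value stays as one token.
opts='--project "my project" -l /core=16,walltime=00:30:00'
printf '%s' "$opts" | xargs -n1 printf '%s\n'
# → --project
#   my project
#   -l
#   /core=16,walltime=00:30:00
```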
Here we are,
I finally had some time to finalize this work. I used the `.tokenize()` function with a semicolon separator, because the OAR syntax is quite complicated and the semicolon seems to be a "banned" character.
I removed some of the pre-existing options; everything will be specified through `clusterOptions`, which is more convenient for me.
Tell me if it seems OK to you (https://github.com/Tintest/nextflow/tree/OAR_executor).
I have never done a pull request; which branch should I choose? Thank you for your help.
Nice! It's quite easy: push the latest changes, then on your GitHub fork page you will find a big button "Create a new pull request". As simple as that.
Yes, quite easy indeed, but I can only select Bzizou's nextflow fork. Is that OK? Will he then have to do a pull request as well?
Oh, that's because you have forked another fork, not the main project.
However, when you open the pull request there's a combo box: select base fork: `nextflow-io/nextflow`, base: `master`.
Done !
Well, I'm not sure to whom you have sent it :)
There isn't one in the NF repo: https://github.com/nextflow-io/nextflow/pulls
It looks like you have created it in your own fork: https://github.com/Tintest/nextflow/pull/1
It should be OK now... It was more difficult than expected, because my branch was behind and I'm a GitHub newbie :)
As for #766, I'm closing this because it looks stalled.
Hello,
as discussed by mail, I would like to be able to use Nextflow with the OAR batch scheduler.
With @bzizou we made some changes on the OAR_executor branch, without any success. I hope you will be able to help us.
Thank you.