pjvandehaar opened this issue 6 years ago
I think option 1 is better. One change I would make: replace `bash -c "${commands[${SLURM_ARRAY_TASK_ID}]}"` with `eval ${commands[${SLURM_ARRAY_TASK_ID}]}`.
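For illustration, a minimal self-contained sketch of both variants (the job list and task ID are stand-ins for what Slurm would provide):

```shell
#!/bin/bash
# Hypothetical job list; in a real array job, SLURM_ARRAY_TASK_ID is set
# by Slurm rather than assigned here.
commands=("echo task zero" "echo task one")
SLURM_ARRAY_TASK_ID=1

# Variant A: run the command string in a child shell.
bash -c "${commands[${SLURM_ARRAY_TASK_ID}]}"    # prints: task one

# Variant B: evaluate in the current shell, so functions and variables
# defined earlier in this script stay visible to the command.
eval ${commands[${SLURM_ARRAY_TASK_ID}]}         # prints: task one
```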
Slurm scripts are limited to 4 MB. If thousands of jobs are listed in this manner and the command lines are long, then I recommend saving characters. E.g. instead of `commands`, just use `c` or `j` or any single letter. Make the R script (or Python script) executable to avoid repeating `Rscript`. And so on.
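As a small sketch of the "make the script executable" point (using Python and a placeholder file name, since any interpreter with a shebang works the same way):

```shell
#!/bin/bash
# Give the script a shebang so the kernel knows the interpreter, then
# mark it executable. (job.py is a placeholder name for illustration.)
printf '#!/usr/bin/env python3\nprint("ok")\n' > job.py
chmod +x job.py

# Now each line of the job list can say "./job.py" instead of
# "python3 job.py", saving characters on every line.
./job.py    # prints: ok
```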
Is there a way to generalize the syntax so it is possible to eventually replace SLURM with another scheduler?
Goncalo
I've been recommending putting the commands into a separate file, one per line, then using head/tail to pull lines out like this:
srun $(head -n $SLURM_ARRAY_TASK_ID cmds.txt | tail -n 1)
This can also be done with `sed`, like:
srun $(sed -n ${SLURM_ARRAY_TASK_ID}p cmds.txt)
* I've seen odd results with the `sed` command on some systems.
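A quick sanity check that the head/tail and sed pipelines extract the same line (file name and contents are throwaway examples):

```shell
#!/bin/bash
# Build a throwaway command file, one command per line.
printf 'echo one\necho two\necho three\n' > cmds.txt

N=2   # stand-in for $SLURM_ARRAY_TASK_ID

# head/tail extraction: take the first N lines, then the last of those.
line_a="$(head -n "$N" cmds.txt | tail -n 1)"

# sed extraction: print only line N.
line_b="$(sed -n "${N}p" cmds.txt)"

echo "$line_a"                                   # prints: echo two
[ "$line_a" = "$line_b" ] && echo "pipelines agree"
```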
This is pretty generalized as is but most national sites are moving to Slurm, as is Flux/ARC-TS. Any other scheduling system will have the same requirements for users to adapt to. What is the drive to replace slurm? What would you like to gain from another scheduler?
PS: I have a branch in this repo with array job examples, but I have not completed the docs.
Yes, this syntax is also OK for PBS on Flux (only the names of the environment variables differ).
Also, using head/tail or sed makes it not bash-specific.
There's no drive to replace slurm, but we want anyone to be able to set up a PheWeb instance, and most people who might do that won't be running slurm.
It would be nice to be able to adapt the code minimally to add support for other schedulers as needed.
We are not going to have widespread deployment of pheweb if each instance requires slurm.
Goncalo
So, option 5: run `sarray cmds.txt`, where `sarray` is something like:
#!/bin/bash
cmd_file="$1"
num_jobs="$(grep -v "^#" "$cmd_file" | grep -c .)" # ignore comments and blank lines
sbatch_args="$(grep -oP '^#SBATCH \K.*' "$cmd_file" | tr '\n' ' ')" # collect args from #SBATCH lines in the command file
sbatch_cmd='eval "$(grep -v "^#" '"$cmd_file"' | grep . | head -n $SLURM_ARRAY_TASK_ID | tail -n1)"'
sbatch $sbatch_args --array="1-$num_jobs" --wrap="$sbatch_cmd"
(probably placed in /net/mario/cluster/bin/) and cmds.txt is like:
python3 -c 'print(chr(9) == "\t")'
Rscript a.R
Rscript b.R
#SBATCH --mem=1024
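The counting and #SBATCH extraction used by a script like this can be exercised without submitting anything; note that `grep -oP` assumes GNU grep:

```shell
#!/bin/bash
# Recreate the example cmds.txt from above.
cat > cmds.txt <<'EOF'
python3 -c 'print(chr(9) == "\t")'
Rscript a.R
Rscript b.R
#SBATCH --mem=1024
EOF

# Count runnable lines: drop comments (including #SBATCH) and blanks.
num_jobs="$(grep -v '^#' cmds.txt | grep -c .)"
echo "$num_jobs"     # prints: 3

# Collect scheduler arguments embedded as #SBATCH lines.
sbatch_args="$(grep -oP '^#SBATCH \K.*' cmds.txt | tr '\n' ' ')"
echo "$sbatch_args"  # the collected flags, here --mem=1024
```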
Features that would be needed:
* Handle sbatch output: if the sbatch output looks interesting, print it verbatim.
* `sarray --get 7 cmds.txt` to extract the command for task 7 (equivalent to `cat cmds.txt | grep -v '^#' | grep . | sed -n 7p`).
* `--array=4,7,8` to run a subset that failed previously (e.g. `sarray --get 4,7,8 cmds.txt > cmds2.txt && sarray cmds2.txt`).
* Explain what `--array=1-100%5` does (run tasks 1-100 with at most 5 running at a time).
* Write each task's output to `~/tmp/sarray-output/<job_id>/<task_id>` and print that path.

In that case it is just a matter of developing appropriate abstractions for target platforms to support multiple platforms.
Right - and those abstractions will be easier to write if the SLURM commands are within a small number of functions rather than peppered throughout code.
Goncalo
PS. Btw, are you around? We will need to do a lot of work in terms of a security policy for the TOPMed contract.
The abstraction should be above the batch script, not within it: at the level of whatever generates the batch script. There could be platforms that don't even need a batch script.
I'll be in tomorrow.
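One possible shape for that, sketched in bash with hypothetical names (not an existing API): a single entry point that hides the scheduler, with a "local" backend that needs no batch script:

```shell
#!/bin/bash
# Hypothetical dispatch layer: callers only ever call submit_array;
# each backend's scheduler-specific details live in exactly one place.

submit_array() {   # usage: SCHEDULER=<name> submit_array <cmd_file>
  case "${SCHEDULER:-local}" in
    slurm)
      # Batch-script generation is confined to this branch.
      njobs="$(grep -v '^#' "$1" | grep -c .)"
      sbatch --array="1-$njobs" \
        --wrap='eval "$(grep -v "^#" '"$1"' | grep . | sed -n "${SLURM_ARRAY_TASK_ID}p")"'
      ;;
    local)
      # Platforms with no scheduler need no batch script at all:
      # just run the commands serially.
      grep -v '^#' "$1" | grep . | while IFS= read -r cmd; do
        eval "$cmd"
      done
      ;;
  esac
}
```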
Is `sarray` still around? I should remove those docs. I recommend keeping it simple with head/tail or sed, and explaining how to use different `--array=` values.
I cannot find the `sarray` command, though there are lots of docs still around. `runslurm.pl` may have replaced it.
No, sarray was something I wrote for the biostat cluster before slurm supported array jobs. It really shouldn't be used anymore.
runslurm.pl is something @tpg wrote for the CSG cluster back when they were transitioning from Mosix to slurm, I think but maybe not.
Tried @schelcj's solution with
srun $(head -n $SLURM_ARRAY_TASK_ID cmds.txt | tail -n 1)
and found that sbatch had trouble parsing the command lines. After changing it to
bash -c "$(head -n $SLURM_ARRAY_TASK_ID cmds.txt | tail -n 1)"
it worked just fine.
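A rough illustration of why the change helps, using plain bash in place of srun: an unquoted `$(...)` is word-split but never re-parsed as shell, so quotes inside a command line survive as literal characters, while `bash -c` hands the whole line back to a shell for proper parsing:

```shell
#!/bin/bash
# A command line containing quotes, as it might appear in cmds.txt.
printf "echo 'two words'\n" > cmds.txt

# Word-splitting an unquoted $(...) does NOT re-parse the quotes, so the
# tokens become: echo, 'two, words' -- the quote characters stay literal.
set -- $(head -n 1 cmds.txt | tail -n 1)
echo "$2"    # prints: 'two   (a mangled token)

# bash -c re-parses the whole line, so the quoting works as intended.
bash -c "$(head -n 1 cmds.txt | tail -n 1)"   # prints: two words
```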
Option 1: (used by https://github.com/statgen/SLURM-examples/blob/master/job-array-one-command-per-line)
Option 2:
Option 3:
Option 4: somewhere there's a script (maybe written by Terry?) that does this.