pepkit / divvy

Standardized computing resource configuration
http://divvy.databio.org
BSD 2-Clause "Simplified" License

divvy adapters #47

Closed: nsheff closed this issue 4 years ago

nsheff commented 4 years ago

adapters allow you to use divvy with any source of variables.

divvy was originally part of looper, so the default divvy variables (like {CODE}, etc.) come from looper. removing divvy from looper decoupled the software, but the variables are still tightly coupled. To make divvy more flexible, we need to remove this coupling; divvy adapters do that.

here's a config file with adapters:

adapters:
  code: looper.command
  logfile: looper.logfile
  jobname: looper.jobname
  cores: compute.cores
  time: compute.time
  mem: compute.mem
  docker_args: compute.docker_args
  docker_image: compute.docker_image
  singularity_image: compute.singularity_image
  singularity_args: compute.singularity_args
compute_packages:
  default:
    submission_template: submit_templates/localhost_template.sub
    submission_command: sh
  local:
    submission_template: submit_templates/localhost_template.sub
    submission_command: sh
    adapters:
      custom: custom_adapter_here
  slurm:
    submission_template: submit_templates/slurm_template.sub
    submission_command: sbatch
  singularity:
    submission_template: submit_templates/localhost_singularity_template.sub
    submission_command: sh
    singularity_args: ""
  singularity_slurm:
    submission_template: submit_templates/slurm_singularity_template.sub
    submission_command: sbatch
    singularity_args: ""

adapters are simple variable mappings from one name to another. they can just be straight-up var:var mappings, but they can also include namespaces (on the supply side; divvy variables aren't namespaced).
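To make the mapping concrete, here is a minimal Python sketch of how an adapter table could be resolved against namespaced variables. The function name resolve_adapters and the namespace values are hypothetical, not divvy's API; the sketch only assumes that a dotted source such as compute.cores means "attribute cores of namespace compute", and that a bare name is looked up at the top level.

```python
from typing import Any, Dict


def resolve_adapters(adapters: Dict[str, str],
                     namespaces: Dict[str, Any]) -> Dict[str, Any]:
    """Map divvy variable names to values taken from namespaced sources.

    Hypothetical helper. Each adapter value is either a bare name ("cores"),
    looked up at the top level, or a dotted path ("compute.cores"), looked up
    one level per dot. Adapters whose source path is missing are skipped.
    """
    resolved: Dict[str, Any] = {}
    for divvy_var, source in adapters.items():
        value: Any = namespaces
        for part in source.split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                break  # source path not found; skip this adapter
        else:
            resolved[divvy_var] = value  # loop finished: full path resolved
    return resolved


if __name__ == "__main__":
    # Hypothetical namespaces, loosely modeled on the looper runs in this thread.
    namespaces = {
        "looper": {"command": "sra_convert.py --srr SRR10988638.sra",
                   "job_name": "convert_GSM4289908"},
        "compute": {"cores": 1, "mem": 8000, "time": "00-04:00:00"},
    }
    adapters = {"CODE": "looper.command", "JOBNAME": "looper.job_name",
                "CORES": "compute.cores", "MEM": "compute.memory"}
    print(resolve_adapters(adapters, namespaces))
```

In this sketch MEM is dropped because the hypothetical compute namespace has mem, not memory; a source path that does not exist resolves to nothing rather than raising, which is one plausible way a template variable ends up unfilled.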

This system would allow us to include a 'divvy-looper' adapter. this adapter could be modified either for a universal divvy config, or for a particular compute package, which would enable divvy templates to be used with multiple variable sources.
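The per-package variant could behave like a dictionary merge. A sketch, assuming (not confirmed by divvy's documentation) that a compute package's adapters block is layered on top of the top-level one, with package entries winning on conflicting keys; effective_adapters is a hypothetical helper that works on the config after YAML parsing.

```python
from typing import Any, Dict


def effective_adapters(config: Dict[str, Any], package: str) -> Dict[str, str]:
    """Merge the top-level adapters with those of one compute package.

    Assumption: a package-level entry overrides a top-level entry with the
    same key; keys present in only one place are kept as-is. An unknown
    package simply yields the top-level adapters.
    """
    merged: Dict[str, str] = dict(config.get("adapters") or {})
    pkg = (config.get("compute_packages") or {}).get(package) or {}
    merged.update(pkg.get("adapters") or {})
    return merged


if __name__ == "__main__":
    # A trimmed-down version of the config above, as a parsed YAML mapping.
    config = {
        "adapters": {"code": "looper.command", "cores": "compute.cores"},
        "compute_packages": {
            "local": {"submission_command": "sh",
                      "adapters": {"custom": "custom_adapter_here"}},
            "slurm": {"submission_command": "sbatch"},
        },
    }
    print(effective_adapters(config, "local"))  # top-level adapters plus "custom"
    print(effective_adapters(config, "slurm"))  # top-level adapters only
```

Layering keeps one shared looper adapter set while letting a single package remap or add variables, which is the flexibility described above.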

under this system, looper would simply provide divvy with all available namespaces, the same as it does for command templates, and the adapters would convert these into divvy variables. the advantage is that divvy templates become useful beyond looper. it also simplifies what looper has to do: nothing.

divvy should ship with looper adapters, something like the above example.

what do you think @stolarczyk ?

nsheff commented 4 years ago

While testing looper, I can't figure out how to use the new adapters on Rivanna. I need an example.

nsheff commented 4 years ago

I put these adapters into my divvy config file:

adapters:
  code: looper.command
  jobname: looper.jobname
  cores: compute.cores
  logfile: compute.logfile
  time: compute.time
  mem: compute.memory
  docker_args: compute.docker_args
  docker_image: compute.docker_image
  singularity_image: compute.singularity_image
  singularity_args: compute.singularity_args

It correctly populated the {CODE} variable, but none of the others:

#!/bin/bash
#SBATCH --job-name='{JOBNAME}'
#SBATCH --output='{LOGFILE}'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='standard'
#SBATCH -m block
#SBATCH --ntasks=1
#SBATCH --open-mode=append

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

cmd="/home/ns5bc/code/sra_convert/sra_convert.py --srr /project/shefflab/data/sra/SRR8435075.sra /project/shefflab/data/sra/SRR8435076.sra /project/shefflab/data/sra/SRR8435077.sra /project/shefflab/data/sra/SRR8435078.sra -O /project/shefflab/processed/paqc/results_pipeline --verbosity 4 --logdev"

y=`echo "$cmd" | sed -e 's/^/srun /'`
eval "$y"
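This output is consistent with a renderer that substitutes only the names it was given and leaves every other {NAME} placeholder untouched. A minimal sketch of that behavior, assuming plain, case-sensitive {NAME} substitution; populate_template is hypothetical and divvy's real implementation may differ in detail.

```python
import re
from typing import Any, Dict

# Matches {NAME} placeholders but not bash-style ${NAME} expansions.
PLACEHOLDER = re.compile(r"(?<!\$)\{([A-Za-z_][A-Za-z0-9_]*)\}")


def populate_template(template: str, variables: Dict[str, Any]) -> str:
    """Fill {NAME} placeholders from `variables`.

    Lookups are exact and case-sensitive; a placeholder with no matching key
    is left in the output unchanged rather than raising an error.
    """
    def _fill(match) -> str:
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return PLACEHOLDER.sub(_fill, template)


if __name__ == "__main__":
    template = ("#SBATCH --job-name='{JOBNAME}'\n"
                "#SBATCH --mem='{MEM}'\n"
                'cmd="{CODE}"')
    # Hypothetical resolved variables; note the lowercase "mem" key.
    variables = {"JOBNAME": "convert_GSM4289908", "mem": 8000, "CODE": "sra_convert.py"}
    print(populate_template(template, variables))
```

With a lowercase mem key, {MEM} survives verbatim, just like {MEM}, {CORES} and {TIME} in the script above.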
stolarczyk commented 4 years ago

it's because divvy looks for the exact keys used in the template, which are uppercase:

  CODE: looper.command
  LOGFILE: looper.log_file
  JOBNAME: looper.job_name
  CORES: compute.cores
  TIME: compute.time
  MEM: compute.mem
  DOCKER_ARGS: compute.docker_args
  DOCKER_IMAGE: compute.docker_image
  SINGULARITY_IMAGE: compute.singularity_image
  SINGULARITY_ARGS: compute.singularity_args
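A quick way to catch this kind of mismatch before submitting is to compare the template's placeholders with the adapter keys. A sketch, assuming placeholders are written as {NAME} like in the templates in this thread; check_adapter_keys is a hypothetical helper, not part of divvy.

```python
import re
from typing import Iterable, Set, Tuple

# {NAME} placeholders as seen in the templates above; bash ${NAME} is ignored.
PLACEHOLDER = re.compile(r"(?<!\$)\{([A-Za-z_][A-Za-z0-9_]*)\}")


def check_adapter_keys(template: str,
                       adapter_keys: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare template placeholders with adapter keys (case-sensitive).

    Returns (placeholders no adapter will fill, adapter keys no placeholder uses).
    """
    placeholders = set(PLACEHOLDER.findall(template))
    keys = set(adapter_keys)
    return placeholders - keys, keys - placeholders


if __name__ == "__main__":
    template = ("#SBATCH --job-name='{JOBNAME}'\n"
                "#SBATCH --mem='{MEM}'\n"
                "#SBATCH --time='{TIME}'")
    unfilled, unused = check_adapter_keys(template, ["jobname", "mem", "TIME"])
    print(sorted(unfilled))  # ['JOBNAME', 'MEM']
    print(sorted(unused))    # ['jobname', 'mem']
```

Reporting both directions matters: a lowercase key shows up once as an unused adapter and once as an unfilled placeholder, which points straight at a case mismatch.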
nsheff commented 4 years ago

got it! the lowercase code key did work, though...

nsheff commented 4 years ago

great, those looper variables are working for me now. But the compute namespace isn't working yet; is that expected?

nsheff commented 4 years ago

I've added an adapter version here: https://github.com/pepkit/divcfg/blob/master/uva_rivanna_adapters.yaml

I'll integrate it into the main config later (it should be backwards compatible).

stolarczyk commented 4 years ago

> the compute namespace is not working yet, is that expected?

it works for me in looper, hmm... maybe we're doing something differently? How are you testing it?

nsheff commented 4 years ago
DIVCFG=/project/shefflab/rivanna_config/divcfg/uva_rivanna_adapters.yaml looper run paqc.yaml --amendments sra_convert -d
cat /project/shefflab/processed/paqc/submission/convert_ATAC-seq_Suspension_rep3.sub
#!/bin/bash
#SBATCH --job-name='convert_ATAC-seq_Suspension_rep3'
#SBATCH --output='/project/shefflab/processed/paqc/submission/convert_ATAC-seq_Suspension_rep3.log'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='standard'
#SBATCH -m block
#SBATCH --ntasks=1
#SBATCH --open-mode=append

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

cmd="/home/ns5bc/code/sra_convert/sra_convert.py --srr /project/shefflab/data/sra/SRR8435075.sra /project/shefflab/data/sra/SRR8435076.sra /project/shefflab/data/sra/SRR8435077.sra /project/shefflab/data/sra/SRR8435078.sra -O /project/shefflab/processed/paqc/results_pipeline --logdev"

y=`echo "$cmd" | sed -e 's/^/srun /'`
eval "$y"
nsheff commented 4 years ago

https://github.com/databio/paqc

stolarczyk commented 4 years ago

have you specified size_dependent_variables as a TSV in the compute section of the sra_convert pipeline interface (piface)?

stolarczyk commented 4 years ago

I didn't make it backwards compatible; only the TSV way is supported now.

stolarczyk commented 4 years ago

> have you specified size_dependent_variables as a TSV in the compute section of the sra_convert piface?

worked for me this way:

[mjs5kd@udc-ba36-36 paqc](master): echo $DIVCFG
/project/shefflab/rivanna_config/divcfg/uva_rivanna_adapters.yaml
[mjs5kd@udc-ba36-36 paqc](master): looper run paqc.yaml --amendments sra_convert -d --limit 1
Command: run (Looper version: 0.12.6-dev)
Using amendments: sra_convert
Finding pipelines for protocol(s): *
Known protocols: *
## [1 of 17] GSM4289908 (*)
Writing script to /project/shefflab/processed/paqc/submission/convert_GSM4289908.sub
Job script (n=1; 0.00 Gb): /project/shefflab/processed/paqc/submission/convert_GSM4289908.sub
Dry run, not submitted

Looper finished
Samples valid for job generation: 1 of 1
Successful samples: 1 of 1
Commands submitted: 1 of 1
Jobs submitted: 1
Dry run. No jobs were actually submitted.
[mjs5kd@udc-ba36-36 paqc](master): c /project/shefflab/processed/paqc/submission/convert_GSM4289908.sub
#!/bin/bash
#SBATCH --job-name='convert_GSM4289908'
#SBATCH --output='/project/shefflab/processed/paqc/submission/convert_GSM4289908.log'
#SBATCH --mem='8000'
#SBATCH --cpus-per-task='1'
#SBATCH --time='00-04:00:00'
#SBATCH --partition='standard'
#SBATCH -m block
#SBATCH --ntasks=1
#SBATCH --open-mode=append

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

cmd="sra_convert.py --srr /project/shefflab/data/sra/SRR10988638.sra "

y=`echo "$cmd" | sed -e 's/^/srun /'`
eval "$y"
[mjs5kd@udc-ba36-36 paqc](master): c ${CODE}/sra_convert/pipeline_interface_convert.yaml
protocol_mapping:
  "*": convert

pipelines:
  convert:
    name: convert
    path: sra_convert.py
    # required_input_files: SRR_files
    arguments:
      "--srr": SRR_files
    command_template: >
      {pipeline.path} --srr {sample.SRR_files}
    compute:
      bulker_crate: databio/sra_convert
      size_dependent_variables: resources.tsv
[mjs5kd@udc-ba36-36 paqc](master): c ${CODE}/sra_convert/resources.tsv 
max_file_size   cores   mem     time
NaN             1       8000    00-04:00:00
0.05            2       12000   00-08:00:00
0.5             4       16000   00-12:00:00
1               8       16000   00-24:00:00
10              16      32000   02-00:00:00
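To see how one row of this table could end up as the compute.cores, compute.mem and compute.time values that the adapters map, here is a sketch of parsing the table and picking a row by input size. The selection rule is an assumption, not taken from looper's source: the NaN row acts as the default when the size is unknown or zero (which reproduces the dry run above, where a 0.00 Gb input got 1 core, 8000 and 00-04:00:00), otherwise the row with the smallest max_file_size at or above the input size wins. parse_resources and choose_resources are hypothetical names.

```python
import math
from typing import Dict, List, Optional

RESOURCES = """\
max_file_size cores mem time
NaN 1 8000 00-04:00:00
0.05 2 12000 00-08:00:00
0.5 4 16000 00-12:00:00
1 8 16000 00-24:00:00
10 16 32000 02-00:00:00
"""


def parse_resources(text: str) -> List[Dict[str, str]]:
    """Parse a resources table whose columns are separated by tabs or spaces."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def choose_resources(rows: List[Dict[str, str]],
                     file_size_gb: Optional[float]) -> Dict[str, str]:
    """Pick one resource row for an input of the given size in GB.

    Assumed rule (looper's actual rule may differ):
      * the row whose max_file_size is NaN is the default, used when the
        size is unknown (None) or zero;
      * otherwise the row with the smallest max_file_size >= file_size_gb;
      * inputs larger than every threshold get the largest row.
    """
    if not rows:
        raise ValueError("resources table is empty")
    default = next((r for r in rows if math.isnan(float(r["max_file_size"]))), None)
    sized = sorted((r for r in rows if not math.isnan(float(r["max_file_size"]))),
                   key=lambda r: float(r["max_file_size"]))
    if file_size_gb is None or file_size_gb <= 0 or not sized:
        return default if default is not None else sized[0]
    for row in sized:
        if file_size_gb <= float(row["max_file_size"]):
            return row
    return sized[-1]


if __name__ == "__main__":
    rows = parse_resources(RESOURCES)
    for size in (0.0, 0.3, 50.0):
        print(size, choose_resources(rows, size))
```

Splitting on any whitespace rather than strictly on tabs is a convenience choice here so the sketch also accepts the space-aligned copy of the table shown above.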
nsheff commented 4 years ago

perfect, can you push those changes to sra_convert?

I got mixed up between the adapter changes and the compute changes :)

nsheff commented 4 years ago

nm, I got it. it works! thanks.