.sub file environment variables do not have a value

pepkit / looper

A job submitter for Portable Encapsulated Projects

http://looper.databio.org

BSD 2-Clause "Simplified" License


.sub file environment variables do not have a value #271

Closed. aaron-gu closed this issue 4 months ago.

aaron-gu commented 4 years ago

I saw that divvy is being used to generate the .sub files for a looper job submission. However, I could not easily find anything in the vignettes describing how to set variables such as {MEM} and {CORES}. It would be nice if these variables had default values without any configuration, or if the vignettes described how to set them.

nsheff commented 4 years ago

How are you trying to use them? With divvy, you set them with -c, for example -c mem=8000 cores=1.

http://divvy.databio.org/en/latest/cli/
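To see why an unset variable breaks submission: the template holds placeholders such as {MEM}, each key=value pair passed with -c fills the matching upper-case placeholder, and anything left unfilled stays in the script as literal text. The sketch below mimics that substitution in plain shell. fill_template is a hypothetical helper written for illustration, not part of divvy, and it assumes simple values with no |, & or backslash characters.

```shell
#!/bin/sh
# Hypothetical helper for illustration only (not part of divvy or looper).
# Reads a submission template on stdin and replaces each {KEY} placeholder
# with the value from a key=value argument; keys are upper-cased so that
# mem=8000 fills {MEM}. Placeholders with no matching argument are left
# untouched. Values must not contain |, & or backslashes (sed specials).
fill_template() {
  sed_script=""
  for pair in "$@"; do
    key=$(printf '%s' "${pair%%=*}" | tr '[:lower:]' '[:upper:]')
    value=${pair#*=}
    sed_script="${sed_script}s|{${key}}|${value}|g;"
  done
  sed -e "$sed_script"
}

# Demo: {TIME} has no value, so it reaches the script verbatim.
printf "#SBATCH --mem='{MEM}'\n#SBATCH --cpus-per-task='{CORES}'\n#SBATCH --time='{TIME}'\n" |
  fill_template mem=8000 cores=1
# #SBATCH --mem='8000'
# #SBATCH --cpus-per-task='1'
# #SBATCH --time='{TIME}'
```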

aaron-gu commented 4 years ago

I am just using looper run project_config.yaml

nsheff commented 4 years ago

OK, looper should default to the localhost template, which doesn't have those variables, so that doesn't make sense to me. Can you be more specific about what you're trying to do? Also, try the above.

aaron-gu commented 4 years ago

I set up a PEP project for my bedshift code to generate 100 samples for every parameter combination. I followed the PEP and looper tutorials pretty smoothly until I ran the looper job, which failed with the error: sbatch: error: invalid memory constraint {MEM}

Also, I'm not sure how to run the divvy command with looper, since many .sub files are generated.

aaron-gu commented 4 years ago

Here's an example of a .sub file:

#!/bin/bash
#SBATCH --job-name='bedshift_run_add1'
#SBATCH --output='looper_output/submission/bedshift_run_add1.log'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='standard'
#SBATCH -m block
#SBATCH --ntasks=1
#SBATCH --open-mode=append

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

cmd="/sfs/qumulo/qhome/ag5ym/databio/bedshift_paper/pep_project/bedshift.sh /project/shefflab/resources/regions/LOLACore/hg19/encode_tfbs/regions/wgEncodeAwgTfbsUwHek293CtcfUniPk.narrowPeak 0.1 0.0 0.0 100 "

y=`echo "$cmd" | sed -e 's/^/srun /'`
eval "$y"
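The literal '{MEM}', '{CORES}' and '{TIME}' values above are what sbatch rejects. Because looper writes one .sub file per sample, a quick pre-submission check is to scan the submission folder for placeholders that were never filled. A minimal sketch, assuming the scripts sit in looper_output/submission/ (inferred from the log path in this example) and that placeholders are upper-case names in braces; check_sub_files is a hypothetical helper, not a looper command.

```shell
#!/bin/sh
# Hypothetical pre-submission check (not a looper command): print every
# {PLACEHOLDER} still present in the generated .sub files and return 1 if
# any were found. The default directory is an assumption based on the
# log path in the example above.
check_sub_files() {
  dir=${1:-looper_output/submission}
  status=0
  for f in "$dir"/*.sub; do
    [ -e "$f" ] || continue   # no .sub files: the glob stayed literal
    left=$(grep -o '{[[:upper:]_][[:upper:]_]*}' "$f" | sort -u | paste -s -d ' ' -)
    if [ -n "$left" ]; then
      echo "$f: unfilled $left"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_sub_files looper_output/submission || echo "fix compute settings first"
```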

nsheff commented 4 years ago

Ah, I see. You're on Rivanna, so we set the looper default to submit jobs to SLURM.

There are lots of things you can do:

- To test, try running with a local template via looper's --package option; divvy list shows the available templates.
- If you want to use the slurm template, then of course you must provide all of that template's variables. You can do that as I mentioned above: looper run -c cores=1 mem=4000
- Really, you should provide these variables in your pipeline interface, using the compute section: http://looper.databio.org/en/latest/pipeline-interface-specification/#compute
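To find out which variables a template expects, you can list its placeholders and compare them with the keys you plan to pass. A rough sketch, assuming the template text is piped in on stdin; missing_vars is a hypothetical helper, not part of divvy. Placeholders that looper or your divvy configuration fill automatically (such as the job name and log path, which were already filled in the .sub file above) would also be listed, so read the output as a checklist rather than a verdict.

```shell
#!/bin/sh
# Hypothetical helper for illustration only (not part of divvy or looper).
# Reads a submission template on stdin and prints, in lower case, each
# placeholder that none of the key=value arguments would fill; these are
# the keys you would still need to pass with -c or set under compute:.
missing_vars() {
  provided=""
  for pair in "$@"; do
    provided="$provided $(printf '%s' "${pair%%=*}" | tr '[:lower:]' '[:upper:]')"
  done
  grep -o '{[[:upper:]_][[:upper:]_]*}' | tr -d '{}' | sort -u |
    while read -r name; do
      case " $provided " in
        *" $name "*) ;;   # already supplied
        *) printf '%s\n' "$name" | tr '[:upper:]' '[:lower:]' ;;
      esac
    done
}

# Demo with the resource lines from the .sub file above:
printf "#SBATCH --mem='{MEM}'\n#SBATCH --cpus-per-task='{CORES}'\n#SBATCH --time='{TIME}'\n" |
  missing_vars mem=4000 cores=1
# time
```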

For example, you could just add this to your interface:

compute:
  mem: 4000
  cores: 1
  time: "01:00:00"  # example value; the slurm template also has a {TIME} placeholder

aaron-gu commented 4 years ago

Got it, thanks! Is there a way to make that section of the documentation easier to find? I went through the docs in this order: Introduction > Defining a Project > Running on a cluster, and then I followed the links to divvy to try to solve the issue.

donaldcampbelljr commented 4 months ago

I added a bit of clarification on this to the docs for the upcoming release.

donaldcampbelljr commented 4 months ago

Solved with the v1.8.1 release.