Closed: Luke-ebbis closed this issue 5 months ago.
So I found that this issue is similar to #52. After changing the `gres` resource to `slurm_extra="'--gres=gpu:a100:1'"`, it seems to work:
```python
## colabfold:
## Calculate the protein structures of the fasta files using Colabfold.
##
rule colabfold:
    conda:
        "envs/fold.yml"
    input:
        fasta="results/data/{protein_complex}/subunits/fasta-{grouping}/{fasta_file}.fasta",
        cuda="results/checkpoints/setup_cuda"
    resources:
        mem_mb=32000,
        slurm_extra="'--gres=gpu:a100:1'",
        constraint="gpu",
        tasks="1",
        cpus_per_task=10,
        mem="30G",
        slurm_partition='gpu1',
        runtime='1440',
        slurm_account="mpmp_gpu",
    # threads: 10
    params:
        number_of_models=config['colabfold']['number_of_models']
    output:
        directory("results/data/{protein_complex}/subunits/processed-{grouping}/{fasta_file}")
    shell:
        """
        echo "Starting colabfold, please see {output}/log.txt for the log"
        colabfold_batch {input.fasta} {output} --num-models {params.number_of_models}
        """
```
See:

```console
$ sacct -j 11248616
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
11248616     7c0303fa-+       gpu1   mpmp_gpu         20    RUNNING      0:0
11248616.ba+      batch              mpmp_gpu         20    RUNNING      0:0
11248616.ex+     extern              mpmp_gpu         20    RUNNING      0:0
```
Could this be made clearer in the documentation? (Where can I open a PR to update the docs?)
Again, thanks for providing this great software!
Erm, nice that it is working for you. Thank you for your offer; you can always fork and open a merge request, like with any other repository. Note, however, that `sbatch` is not an MPI starter and that `slurm_extra` is a documented feature (although, as I concede, some documentation changes are already pending and this needs a better description).
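To illustrate the MPI point: the `mpi` resource is meant to name the launcher that the rule's shell command invokes itself (e.g. `srun` or `mpiexec`), not the submission command. A rough sketch of that pattern, with a made-up rule and program:

```python
rule calc_pi:
    output:
        "pi.calc",
    resources:
        tasks=10,      # number of MPI ranks to request from Slurm
        mpi="srun",    # launcher substituted into the shell command below
    shell:
        "{resources.mpi} -n {resources.tasks} calc-pi-mpi > {output}"
```

Setting `mpi="sbatch"` in `default-resources`, as in the configuration below, therefore does not turn a job into an MPI job; it only matters for rules that actually use `{resources.mpi}` in their shell command.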
One hint, if you allow me: strive to build portable workflows and put your `resources` in a workflow profile instead (see the sketch below). I would be really pleased to find a workflow for structure prediction in the Snakemake workflow catalogue.
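A minimal sketch of what that could look like, assuming a Snakemake ≥ 8 workflow profile at `profiles/default/config.yaml` and reusing the values from the rule above purely as an example:

```yaml
executor: slurm
jobs: 100
default-resources:
  slurm_account: "mpmp_gpu"
  slurm_partition: "gpu1"
  runtime: 1440
set-resources:
  colabfold:
    mem_mb: 32000
    cpus_per_task: 10
    constraint: "gpu"
    slurm_extra: "'--gres=gpu:a100:1'"
```

With the cluster-specific settings in the profile, the rule itself only needs to describe what it computes, which keeps the workflow portable across clusters.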
I have been trying to run my Snakemake workflow on an HPC cluster, but I cannot figure out how to run it in a non-interactive session with a GPU.
The rule I submitted is, I think, akin to the jobscript that my HPC support provides as an example (see the example jobscript under Files). When I remove the `gres` option, the rule is submitted as an interactive session.
The submission details are as follows:
Does anyone know why `constraint` and `gres` conflict?
Files
Example jobscript
```sh
#!/bin/bash -l
# Provided by:
# https://docs.mpcdf.mpg.de/doc/computing/raven-user-guide#batch-jobs-using-gpus
# Standard output and error:
#SBATCH -o ./job.out.%j
#SBATCH -e ./job.err.%j
# Initial working directory:
#SBATCH -D ./
# Job name
#SBATCH -J test_gpu
#
#SBATCH --ntasks=1
#SBATCH --constraint="gpu"
#
# --- default case: use a single GPU on a shared node ---
#SBATCH --gres=gpu:a100:1
#SBATCH --cpus-per-task=18
#SBATCH --mem=125000
#
# --- uncomment to use 2 GPUs on a shared node ---
# #SBATCH --gres=gpu:a100:2
# #SBATCH --cpus-per-task=36
# #SBATCH --mem=250000
#
# --- uncomment to use 4 GPUs on a full node ---
# #SBATCH --gres=gpu:a100:4
# #SBATCH --cpus-per-task=72
# #SBATCH --mem=500000
#
#SBATCH --mail-type=none
#SBATCH --mail-user=userid@example.mpg.de
#SBATCH --time=12:00:00

module purge
module load intel/21.2.0 impi/2021.2 cuda/11.2

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun ./cuda_executable
```
Slurm configuration
```yaml
executor: slurm
latency-wait: 60
jobname: "{rule}.{jobid}"
jobs: 100
default-resources:
  - mem_mb=2000
  - runtime='1440'
  - tasks=2
  - mpi="sbatch"
  - slurm_partition="gpu1"
  - disk_mb=5000
```