HPC mode is not working
I am executing a Slurm script with a configuration set to HPC mode, and it crashes. Logs are attached.
As a test, I changed the mode from HPC to Local; with that single change, the same config file worked.
What could be the problem with using HPC mode together with Slurm?
SLURM script
#!/bin/bash
#SBATCH --partition=Tucana
#SBATCH --job-name=config-test3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=500gb
#SBATCH --time=48:00:00
#SBATCH --output=%x_%j.out
echo "======================================================"
echo "Start Time : $(date)"
echo "Submit Dir : $SLURM_SUBMIT_DIR"
echo "Job ID/Name : $SLURM_JOBID / $SLURM_JOB_NAME"
echo "Num Tasks : $SLURM_NTASKS total [$SLURM_NNODES nodes @ $SLURM_CPUS_ON_NODE CPUs/node]"
echo "Hostname : $HOSTNAME"
echo "======================================================"
echo ""
# Go to the working dir:
cd ${SLURM_SUBMIT_DIR}
# Load required modules:
module load anaconda3/2022.10
module list
# Get to the HADDOCK 3 dir:
HADDOCK_DIR="/users/aduchenk/software/haddock3/haddock3"
cd ${HADDOCK_DIR}
pwd
# Prepare HADDOCK env:
conda activate haddock3
wait
# Move to the examples dir and choose the example config:
cd ${HADDOCK_DIR}/examples/docking-antibody-antigen
haddock3 docking-antibody-antigen-ranairCDR-clt-full-hpc.cfg
echo ""
echo "======================================================"
echo "End Time : $(date)"
[Non-working] Cfg file for HPC mode (docking-antibody-antigen-ranairCDR-clt-full-hpc.cfg)
# ====================================================================
# Protein-protein docking example with NMR-derived ambiguous interaction restraints
# directory in which the scoring will be done
run_dir = "run1-ranairCDR-cltsel-full"
# compute mode
mode = "hpc"
# batch system
batch_type = "slurm"
# queue name
queue = "short"
# in which queue the jobs should run, if nothing is defined
# it will take the system's default
# queue = "short"
# concatenate models inside each job, concat = 5 each .job will produce 5 models
concat = 5
# Limit the number of concurrent submissions to the queue
queue_limit = 4
molecules = [
"data/4G6K_fv.pdb",
"data/4I1B-matched.pdb"
]
ERROR in Log
[2023-12-07 14:04:25,819 libutil ERROR] list index out of range
Traceback (most recent call last):
File "/users/aduchenk/software/haddock3/haddock3/src/haddock/libs/libutil.py", line 335, in log_error_and_exit
yield
File "/users/aduchenk/software/haddock3/haddock3/src/haddock/clis/cli.py", line 185, in main
workflow.run()
File "/users/aduchenk/software/haddock3/haddock3/src/haddock/libs/libworkflow.py", line 43, in run
step.execute()
File "/users/aduchenk/software/haddock3/haddock3/src/haddock/libs/libworkflow.py", line 152, in execute
self.module.run() # type: ignore
File "/users/aduchenk/software/haddock3/haddock3/src/haddock/modules/base_cns_module.py", line 61, in run
self._run()
File "/users/aduchenk/software/haddock3/haddock3/src/haddock/modules/topology/topoaa/__init__.py", line 215, in _run
engine.run()
File "/users/aduchenk/software/haddock3/haddock3/src/haddock/libs/libhpc.py", line 181, in run
worker.run()
File "/users/aduchenk/software/haddock3/haddock3/src/haddock/libs/libhpc.py", line 103, in run
self.job_id = int(p.stdout.decode("utf-8").split()[-1])
IndexError: list index out of range
[2023-12-07 14:04:25,821 libutil ERROR] list index out of range
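The IndexError comes from haddock3 parsing the Slurm job ID out of sbatch's stdout: on success, sbatch prints `Submitted batch job <id>`, and `split()[-1]` takes the last token. If sbatch writes nothing to stdout (for example because the submission was rejected by the controller, or sbatch is not reachable from where haddock3 runs), the split list is empty and indexing `[-1]` raises exactly this error. A minimal sketch of that parse (`parse_job_id` is a hypothetical helper for illustration, not haddock3's actual function name):

```python
def parse_job_id(sbatch_stdout: str) -> int:
    """Extract the numeric job ID from sbatch output such as
    'Submitted batch job 12345'. Raises on empty output, which is
    what surfaces as 'list index out of range' in the traceback."""
    tokens = sbatch_stdout.split()
    if not tokens:
        raise RuntimeError("sbatch produced no stdout; check its stderr instead")
    return int(tokens[-1])

print(parse_job_id("Submitted batch job 12345"))  # → 12345
```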
For testing, I tried the same config file, changing only the mode from HPC to Local:
[Working] Cfg file for Local mode
# ====================================================================
# Protein-protein docking example with NMR-derived ambiguous interaction restraints
# directory in which the scoring will be done
run_dir = "run1-ranairCDR-cltsel-full"
# compute mode
mode = "local"
# batch system
batch_type = "slurm"
# queue name
queue = "short"
# in which queue the jobs should run, if nothing is defined
# it will take the system's default
# queue = "short"
# concatenate models inside each job, concat = 5 each .job will produce 5 models
concat = 5
# Limit the number of concurrent submissions to the queue
queue_limit = 250
[Working] Log file
[2023-12-07 13:29:51,776 __init__ INFO] [topoaa] Running CNS Jobs n=2
[2023-12-07 13:29:51,776 libutil INFO] Selected 2 cores to process 2 jobs, with 64 maximum available cores.
[2023-12-07 13:29:51,776 libparallel INFO] Using 2 cores
[2023-12-07 13:29:55,459 libparallel INFO] >> /4G6K_fv.inp completed 50%
[2023-12-07 13:29:55,459 libparallel INFO] >> /4I1B-matched.inp completed 100%
[2023-12-07 13:29:55,459 libparallel INFO] 2 tasks finished
[2023-12-07 13:29:55,459 __init__ INFO] [topoaa] CNS jobs have finished
[2023-12-07 13:29:55,473 base_cns_module INFO] Module [topoaa] finished.
[2023-12-07 13:29:55,473 __init__ INFO] [topoaa] took 4 seconds
[2023-12-07 13:29:56,500 base_cns_module INFO] Running [rigidbody] module
[2023-12-07 13:29:56,502 __init__ INFO] [rigidbody] crossdock=true
[2023-12-07 13:29:56,502 __init__ INFO] [rigidbody] Preparing jobs...
[2023-12-07 13:30:41,902 __init__ INFO] [rigidbody] Running CNS Jobs n=10000
[2023-12-07 13:30:41,903 libutil INFO] Selected 8 cores to process 10000 jobs, with 64 maximum available cores.
[2023-12-07 13:30:41,938 libparallel INFO] Using 8 cores