prehensilecode / alphafold_singularity

Singularity recipe for AlphaFold
GNU General Public License v3.0

run_singularity.py does not terminate even if >1 GPU is requested #26

Closed: prehensilecode closed this issue 1 year ago

prehensilecode commented 1 year ago

run_singularity.py does this:

# Check Slurm environment if available
if os.environ['SLURM_GPUS_ON_NODE']:
    ngpus_requested = int(os.environ['SLURM_GPUS_ON_NODE'])
    if ngpus_requested > 1:
        logging.fatal(f'No. of GPUs requested is > 1: {ngpus_requested}')

However, all this does is log a CRITICAL message, and the script continues executing:

SLURM_GPUS_ON_NODE=4
SLURM_JOB_GPUS=0,1,2,3
SLURM_STEP_GPUS=
ALPHAFOLD_DIR=/ifs/opt/alphafold/2.3.1
ALPHAFOLD_DATADIR=/beegfs/AlphaFoldDatabases-2-3-1
CRITICAL:absl:No. of GPUs requested is > 1: 4
I0228 09:32:23.923020 23456247859008 run_singularity.py:137] Binding /ifs/sysadmin/Testing/AlphaFold -> /mnt/fasta_path_0
I0228 09:32:23.923094 23456247859008 run_singularity.py:137] Binding /beegfs/AlphaFoldDatabases-2-3-1/uniref90 -> /mnt/uniref90_database_path
...

prehensilecode commented 1 year ago

The absl.logging.fatal() call must be inside the app context.

Fixed by https://github.com/prehensilecode/alphafold_singularity/commit/d64e4697c855f0e2be1f146a3dcf5cca23b4ca9f
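
For illustration only, here is a minimal sketch of a GPU check that terminates regardless of how logging is set up. This is not the change made in the commit linked above. The helper names requested_gpus and check_gpu_request and the max_gpus parameter are hypothetical, and the sketch uses only the standard library rather than absl. It also reads SLURM_GPUS_ON_NODE with .get(), on the assumption that the script may be run outside Slurm, where the variable is unset.

```python
import logging
import os
import sys


def requested_gpus(env=os.environ):
    """Return the GPU count Slurm reports for this node, or None outside Slurm."""
    # .get() avoids a KeyError when the variable is absent (e.g. not under Slurm).
    value = env.get('SLURM_GPUS_ON_NODE')
    if not value:
        return None
    return int(value)


def check_gpu_request(env=os.environ, max_gpus=1):
    """Exit with a non-zero status if more than max_gpus GPUs were requested."""
    ngpus = requested_gpus(env)
    if ngpus is not None and ngpus > max_gpus:
        logging.critical('No. of GPUs requested is > %d: %d', max_gpus, ngpus)
        # sys.exit raises SystemExit, so termination does not depend on
        # how the logging library is configured.
        sys.exit(1)
    return ngpus
```

Calling check_gpu_request() early in the script would stop the run before any bind mounts are set up. The explicit sys.exit(1) is the point of the sketch: the log call only reports the problem, and the exit is a separate step.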

But I will still try to get this working with multiple GPUs.

prehensilecode commented 1 year ago

Works with 2 and 4 GPUs.