piercelab / tcrmodel2

Apache License 2.0

Title: Slow Compile Times and Unexpected Hangup in AlphaFold TCRModel2 Pipeline #12

Closed: hotwa closed this issue 4 months ago.

hotwa commented 9 months ago

Description

I am running into performance problems and an unexpected hangup while running protein structure predictions with the AlphaFold-based TCRModel2 pipeline. Model compilation takes exceptionally long, and the program terminates unexpectedly during execution.
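Part of the total runtime is simply the number of model runs queued. The `Have 25 models` line in the log below lists five `multimer_v3` models with five predictions each, while the later `test_clsI_6kzw_pmhc_oc` run lists only `pred_0` for each model. The short sketch below reproduces that naming to make the 5 x 5 versus 5 x 1 arithmetic explicit. `multimer_model_names` is a hypothetical helper for illustration, not a TCRmodel2 function.

```python
from itertools import product


def multimer_model_names(num_models: int = 5, preds_per_model: int = 5) -> list[str]:
    """Hypothetical helper: enumerate names in the format seen in the log,
    e.g. 'model_1_multimer_v3_pred_0' (model index outer, prediction inner)."""
    return [
        f"model_{m}_multimer_v3_pred_{p}"
        for m, p in product(range(1, num_models + 1), range(preds_per_model))
    ]


# First run (test_clsI_6kzw): 5 models x 5 predictions per model.
full_run = multimer_model_names(5, 5)
# Second run (test_clsI_6kzw_pmhc_oc): 5 models x 1 prediction per model.
single_pred_run = multimer_model_names(5, 1)

print(len(full_run), len(single_pred_run))  # 25 5
```

Assuming each entry is a separate model run, the first job does five times as much model work as the second, which matters a great deal once JAX has fallen back to CPU as the log reports.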

Environment

Container: Singularity SIF file (default version, CUDA 11.2)
Hardware: NVIDIA T4 GPU, 12-core CPU
OS: Ubuntu 20.04
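The host has a T4, yet the log reports `failed call to cuInit: CUDA_ERROR_UNKNOWN` followed by `No GPU/TPU found, falling back to CPU` from JAX's `xla_bridge.py`. A quick way to confirm this on any run is to scan the log for those messages. Below is a minimal sketch, assuming the log lines match the excerpt further down; the pattern names and the `scan_log`, `cuinit_error_codes`, and `scan_file` helpers are hypothetical, not part of TCRmodel2.

```python
import re
from typing import Iterable

# Hypothetical symptom patterns, copied from messages in run_tcrmodel2.log.
SYMPTOMS = {
    "cuda_init_failed": re.compile(r"failed call to cuInit: (\w+)"),
    "cuda_backend_unavailable": re.compile(r"Unable to initialize backend 'cuda'"),
    "cpu_fallback": re.compile(r"No GPU/TPU found, falling back to CPU"),
}


def scan_log(lines: Iterable[str]) -> dict[str, int]:
    """Count how many lines match each GPU-related symptom."""
    counts = {name: 0 for name in SYMPTOMS}
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def cuinit_error_codes(lines: Iterable[str]) -> set[str]:
    """Collect the CUDA error codes reported by failed cuInit calls."""
    codes = set()
    for line in lines:
        match = SYMPTOMS["cuda_init_failed"].search(line)
        if match:
            codes.add(match.group(1))
    return codes


def scan_file(path: str = "run_tcrmodel2.log") -> dict[str, int]:
    """Read a log file from disk and scan it (not called below)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return scan_log(fh)


# Usage on three lines taken verbatim from the log excerpt.
sample = [
    "2023-12-21 22:07:29.123810: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error",
    "I1221 22:07:40.934333 139908727452608 xla_bridge.py:353] Unable to initialize backend 'cuda': FAILED_PRECONDITION: No visible GPU devices.",
    "W1221 22:07:40.935809 139908727452608 xla_bridge.py:360] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)",
]
print(scan_log(sample))
print(cuinit_error_codes(sample))
```

A non-zero `cpu_fallback` count means the model ran on CPU rather than the T4, which would plausibly account for very long `jit_apply_fn` compile and execution times on its own.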

Steps to Reproduce

  1. Started the TCRModel2 pipeline with the following command:

nohup ./run_tcrmodel2_singularity.sh > run_tcrmodel2.log 2>&1 &

  2. Observed the following key issues in the log output:

     - Exceptionally long compile times for the module jit_apply_fn.
     - Failure to initialize CUDA (cuInit reports CUDA_ERROR_UNKNOWN).
     - JAX (xla_bridge.py) unable to find any GPU/TPU device, falling back to CPU.
     - The program terminating unexpectedly (Hangup).

Excerpt from Logs

era@era-forzengly-yfln:/mnt/mydrive/11/tcrmodel2/singularity$ cat run_tcrmodel2.log 
nohup: ignoring input
INFO:    Converting SIF file to temporary sandbox...
WARNING: underlay of /usr/bin/nvidia-smi required more than 50 (335) bind mounts
/opt/conda/lib/python3.10/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/Bio/Data/SCOPData.py:18: BiopythonDeprecationWarning: The 'Bio.Data.SCOPData' module will be deprecated in a future release of Biopython in favor of 'Bio.Data.PDBData.
  warnings.warn(
2023-12-21 22:07:27.500098: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:root:Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
2023-12-21 22:07:29.123810: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
I1221 22:07:37.379446 139908727452608 templates.py:857] Using precomputed obsolete pdbs /mnt/mydrive/11/tcrmodel2/prepare/alphafold_db/pdb_mmcif/obsolete.dat.
I1221 22:07:40.921154 139908727452608 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
2023-12-21 22:07:40.933594: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
I1221 22:07:40.934333 139908727452608 xla_bridge.py:353] Unable to initialize backend 'cuda': FAILED_PRECONDITION: No visible GPU devices.
I1221 22:07:40.934664 139908727452608 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I1221 22:07:40.935444 139908727452608 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1221 22:07:40.935597 139908727452608 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
W1221 22:07:40.935809 139908727452608 xla_bridge.py:360] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I1221 22:08:02.038108 139908727452608 run_alphafold_tcrmodel2.3.py:649] Have 25 models: ['model_1_multimer_v3_pred_0', 'model_1_multimer_v3_pred_1', 'model_1_multimer_v3_pred_2', 'model_1_multimer_v3_pred_3', 'model_1_multimer_v3_pred_4', 'model_2_multimer_v3_pred_0', 'model_2_multimer_v3_pred_1', 'model_2_multimer_v3_pred_2', 'model_2_multimer_v3_pred_3', 'model_2_multimer_v3_pred_4', 'model_3_multimer_v3_pred_0', 'model_3_multimer_v3_pred_1', 'model_3_multimer_v3_pred_2', 'model_3_multimer_v3_pred_3', 'model_3_multimer_v3_pred_4', 'model_4_multimer_v3_pred_0', 'model_4_multimer_v3_pred_1', 'model_4_multimer_v3_pred_2', 'model_4_multimer_v3_pred_3', 'model_4_multimer_v3_pred_4', 'model_5_multimer_v3_pred_0', 'model_5_multimer_v3_pred_1', 'model_5_multimer_v3_pred_2', 'model_5_multimer_v3_pred_3', 'model_5_multimer_v3_pred_4']
I1221 22:08:02.039376 139908727452608 run_alphafold_tcrmodel2.3.py:666] Using random seed 2513294239402930 for the data pipeline
I1221 22:08:02.041227 139908727452608 run_alphafold_tcrmodel2.3.py:257] Predicting test_clsI_6kzw
I1221 22:08:02.101885 139908727452608 pipeline_multimer_custom_templates.py:221] Running monomer pipeline on chain A: TCRa
I1221 22:08:02.102357 139908727452608 pipeline_custom_templates.py:251] input_sequence is AQEVTQIPAALSVPEGENLVLNCSFTDSAIYNLQWFRQDPGKGLTSLLLIQSSQREQTSGRLNASLDKSSGRSTLYIAASQPGDSATYLCAVTNQAGTALIFGKGTTLSVSS
I1221 22:08:02.122048 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpf53l0hj1/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp27z6mrqd.fasta /opt/tcrmodel2/data/databases/uniref90.tcrmhc.fasta"
I1221 22:08:02.287916 139908727452608 utils.py:36] Started Jackhmmer (uniref90.tcrmhc.fasta) query
I1221 22:08:06.427876 139908727452608 utils.py:40] Finished Jackhmmer (uniref90.tcrmhc.fasta) query in 4.138 seconds
I1221 22:08:06.824249 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpmx7t5x70/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp27z6mrqd.fasta /opt/tcrmodel2/data/databases/mgnify.fasta"
I1221 22:08:06.845409 139908727452608 utils.py:36] Started Jackhmmer (mgnify.fasta) query
I1221 22:08:06.894666 139908727452608 utils.py:40] Finished Jackhmmer (mgnify.fasta) query in 0.049 seconds
I1221 22:08:07.789217 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmp80wd0569/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp27z6mrqd.fasta /opt/tcrmodel2/data/databases/small_bfd.tcrmhc.fasta"
I1221 22:08:07.811394 139908727452608 utils.py:36] Started Jackhmmer (small_bfd.tcrmhc.fasta) query
I1221 22:08:08.469960 139908727452608 utils.py:40] Finished Jackhmmer (small_bfd.tcrmhc.fasta) query in 0.658 seconds
I1221 22:08:08.801025 139908727452608 pipeline_custom_templates.py:315] Uniref90 MSA size: 10000 sequences.
I1221 22:08:08.801297 139908727452608 pipeline_custom_templates.py:316] BFD MSA size: 318 sequences.
I1221 22:08:08.801355 139908727452608 pipeline_custom_templates.py:317] MGnify MSA size: 1 sequences.
I1221 22:08:08.802547 139908727452608 pipeline_custom_templates.py:430] Final (deduplicated) MSA size: 9951 sequences.
I1221 22:08:08.804389 139908727452608 pipeline_custom_templates.py:432] Total number of templates (NB: this can include bad templates and is later filtered to top 4, and mock/no templates would also show number of templates as 1): 1.
I1221 22:08:08.810806 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpnf22ofyv/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp27z6mrqd.fasta /opt/tcrmodel2/data/databases/uniprot.tcrmhc.fasta"
I1221 22:08:08.835481 139908727452608 utils.py:36] Started Jackhmmer (uniprot.tcrmhc.fasta) query
I1221 22:08:13.313538 139908727452608 utils.py:40] Finished Jackhmmer (uniprot.tcrmhc.fasta) query in 4.477 seconds
I1221 22:08:16.172923 139908727452608 pipeline_multimer_custom_templates.py:221] Running monomer pipeline on chain B: TCRb
I1221 22:08:16.173460 139908727452608 pipeline_custom_templates.py:251] input_sequence is NAGVTQTPKFQVLKTGQSMTLQCSQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSIRGSRGEQFFGPGTRLTVL
I1221 22:08:16.173929 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmplq0d3qzg/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp2eyta8qs.fasta /opt/tcrmodel2/data/databases/uniref90.tcrmhc.fasta"
I1221 22:08:16.200209 139908727452608 utils.py:36] Started Jackhmmer (uniref90.tcrmhc.fasta) query
I1221 22:08:18.687906 139908727452608 utils.py:40] Finished Jackhmmer (uniref90.tcrmhc.fasta) query in 2.487 seconds
I1221 22:08:18.893774 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpp7p4u7pi/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp2eyta8qs.fasta /opt/tcrmodel2/data/databases/mgnify.fasta"
I1221 22:08:18.915514 139908727452608 utils.py:36] Started Jackhmmer (mgnify.fasta) query
I1221 22:08:18.980895 139908727452608 utils.py:40] Finished Jackhmmer (mgnify.fasta) query in 0.065 seconds
I1221 22:08:19.840876 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmp3cgp63qm/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp2eyta8qs.fasta /opt/tcrmodel2/data/databases/small_bfd.tcrmhc.fasta"
I1221 22:08:19.873059 139908727452608 utils.py:36] Started Jackhmmer (small_bfd.tcrmhc.fasta) query
I1221 22:08:20.277238 139908727452608 utils.py:40] Finished Jackhmmer (small_bfd.tcrmhc.fasta) query in 0.404 seconds
I1221 22:08:20.707345 139908727452608 pipeline_custom_templates.py:315] Uniref90 MSA size: 10000 sequences.
I1221 22:08:20.707688 139908727452608 pipeline_custom_templates.py:316] BFD MSA size: 219 sequences.
I1221 22:08:20.707754 139908727452608 pipeline_custom_templates.py:317] MGnify MSA size: 1 sequences.
I1221 22:08:20.708897 139908727452608 pipeline_custom_templates.py:430] Final (deduplicated) MSA size: 9958 sequences.
I1221 22:08:20.709067 139908727452608 pipeline_custom_templates.py:432] Total number of templates (NB: this can include bad templates and is later filtered to top 4, and mock/no templates would also show number of templates as 1): 1.
I1221 22:08:20.715440 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpkea2w870/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp2eyta8qs.fasta /opt/tcrmodel2/data/databases/uniprot.tcrmhc.fasta"
I1221 22:08:20.748240 139908727452608 utils.py:36] Started Jackhmmer (uniprot.tcrmhc.fasta) query
I1221 22:08:23.579051 139908727452608 utils.py:40] Finished Jackhmmer (uniprot.tcrmhc.fasta) query in 2.829 seconds
I1221 22:08:25.728175 139908727452608 pipeline_multimer_custom_templates.py:221] Running monomer pipeline on chain C: Peptide
I1221 22:08:25.728458 139908727452608 pipeline_custom_templates.py:251] input_sequence is RLPAKAPLL
I1221 22:08:25.728700 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpc2hi3guc/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp10m96om2.fasta /opt/tcrmodel2/data/databases/uniref90.tcrmhc.fasta"
I1221 22:08:25.751008 139908727452608 utils.py:36] Started Jackhmmer (uniref90.tcrmhc.fasta) query
I1221 22:08:25.993752 139908727452608 utils.py:40] Finished Jackhmmer (uniref90.tcrmhc.fasta) query in 0.242 seconds
I1221 22:08:25.995324 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmps_xnbn5u/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp10m96om2.fasta /opt/tcrmodel2/data/databases/mgnify.fasta"
I1221 22:08:26.025289 139908727452608 utils.py:36] Started Jackhmmer (mgnify.fasta) query
I1221 22:08:26.082523 139908727452608 utils.py:40] Finished Jackhmmer (mgnify.fasta) query in 0.057 seconds
I1221 22:08:26.084461 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpuyejlagd/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp10m96om2.fasta /opt/tcrmodel2/data/databases/small_bfd.tcrmhc.fasta"
I1221 22:08:26.112579 139908727452608 utils.py:36] Started Jackhmmer (small_bfd.tcrmhc.fasta) query
I1221 22:08:26.153376 139908727452608 utils.py:40] Finished Jackhmmer (small_bfd.tcrmhc.fasta) query in 0.040 seconds
I1221 22:08:26.155044 139908727452608 pipeline_custom_templates.py:315] Uniref90 MSA size: 1 sequences.
I1221 22:08:26.155337 139908727452608 pipeline_custom_templates.py:316] BFD MSA size: 1 sequences.
I1221 22:08:26.155493 139908727452608 pipeline_custom_templates.py:317] MGnify MSA size: 1 sequences.
I1221 22:08:26.156236 139908727452608 pipeline_custom_templates.py:430] Final (deduplicated) MSA size: 1 sequences.
I1221 22:08:26.156452 139908727452608 pipeline_custom_templates.py:432] Total number of templates (NB: this can include bad templates and is later filtered to top 4, and mock/no templates would also show number of templates as 1): 1.
I1221 22:08:26.157336 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpu54jdzsq/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp10m96om2.fasta /opt/tcrmodel2/data/databases/uniprot.tcrmhc.fasta"
I1221 22:08:26.187705 139908727452608 utils.py:36] Started Jackhmmer (uniprot.tcrmhc.fasta) query
I1221 22:08:26.635069 139908727452608 utils.py:40] Finished Jackhmmer (uniprot.tcrmhc.fasta) query in 0.447 seconds
I1221 22:08:26.637317 139908727452608 pipeline_multimer_custom_templates.py:221] Running monomer pipeline on chain D: MHCa
I1221 22:08:26.637658 139908727452608 pipeline_custom_templates.py:251] input_sequence is SHSLKYFHTSVSRPGRGEPRFISVGYVDDTQFVRFDNDAASPRMVPRAPWMEQEGSEYWDRETRSARDTAQIFRVNLRTLRGYYNQSEAGSHTLQWMHGCELGPDGRFLRGYEQFAYDGKDYLTLNEDLRSWTAVDTAAQISEQKSNDASEAEHQRAYLEDTCVE
I1221 22:08:26.638065 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpu0vljxfi/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp5pe1w30i.fasta /opt/tcrmodel2/data/databases/uniref90.tcrmhc.fasta"
I1221 22:08:26.666882 139908727452608 utils.py:36] Started Jackhmmer (uniref90.tcrmhc.fasta) query
I1221 22:08:29.045351 139908727452608 utils.py:40] Finished Jackhmmer (uniref90.tcrmhc.fasta) query in 2.377 seconds
I1221 22:08:29.274103 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmp2bemt9xn/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp5pe1w30i.fasta /opt/tcrmodel2/data/databases/mgnify.fasta"
I1221 22:08:29.295174 139908727452608 utils.py:36] Started Jackhmmer (mgnify.fasta) query
I1221 22:08:29.424894 139908727452608 utils.py:40] Finished Jackhmmer (mgnify.fasta) query in 0.129 seconds
I1221 22:08:30.718371 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmp9vdyuyzx/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp5pe1w30i.fasta /opt/tcrmodel2/data/databases/small_bfd.tcrmhc.fasta"
I1221 22:08:30.748173 139908727452608 utils.py:36] Started Jackhmmer (small_bfd.tcrmhc.fasta) query
I1221 22:08:31.107003 139908727452608 utils.py:40] Finished Jackhmmer (small_bfd.tcrmhc.fasta) query in 0.358 seconds
I1221 22:08:31.722106 139908727452608 pipeline_custom_templates.py:315] Uniref90 MSA size: 10000 sequences.
I1221 22:08:31.722631 139908727452608 pipeline_custom_templates.py:316] BFD MSA size: 129 sequences.
I1221 22:08:31.722707 139908727452608 pipeline_custom_templates.py:317] MGnify MSA size: 1 sequences.
I1221 22:08:31.723972 139908727452608 pipeline_custom_templates.py:430] Final (deduplicated) MSA size: 8992 sequences.
I1221 22:08:31.724149 139908727452608 pipeline_custom_templates.py:432] Total number of templates (NB: this can include bad templates and is later filtered to top 4, and mock/no templates would also show number of templates as 1): 1.
I1221 22:08:31.733274 139908727452608 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmp2vsus3jy/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp5pe1w30i.fasta /opt/tcrmodel2/data/databases/uniprot.tcrmhc.fasta"
I1221 22:08:31.764023 139908727452608 utils.py:36] Started Jackhmmer (uniprot.tcrmhc.fasta) query
I1221 22:08:42.606158 139908727452608 utils.py:40] Finished Jackhmmer (uniprot.tcrmhc.fasta) query in 10.841 seconds
/opt/conda/lib/python3.10/site-packages/Bio/Data/SCOPData.py:18: BiopythonDeprecationWarning: The 'Bio.Data.SCOPData' module will be deprecated in a future release of Biopython in favor of 'Bio.Data.PDBData.
  warnings.warn(
2023-12-21 22:08:58.625898: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:root:Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
2023-12-21 22:09:00.275497: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
I1221 22:09:01.001135 140130015409088 templates.py:857] Using precomputed obsolete pdbs /mnt/mydrive/11/tcrmodel2/prepare/alphafold_db/pdb_mmcif/obsolete.dat.
I1221 22:09:01.747588 140130015409088 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
2023-12-21 22:09:01.753994: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
I1221 22:09:01.754484 140130015409088 xla_bridge.py:353] Unable to initialize backend 'cuda': FAILED_PRECONDITION: No visible GPU devices.
I1221 22:09:01.754760 140130015409088 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I1221 22:09:01.755181 140130015409088 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1221 22:09:01.755288 140130015409088 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
W1221 22:09:01.755374 140130015409088 xla_bridge.py:360] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I1221 22:09:10.139897 140130015409088 run_alphafold_tcrmodel2.3.py:649] Have 5 models: ['model_1_multimer_v3_pred_0', 'model_2_multimer_v3_pred_0', 'model_3_multimer_v3_pred_0', 'model_4_multimer_v3_pred_0', 'model_5_multimer_v3_pred_0']
I1221 22:09:10.140918 140130015409088 run_alphafold_tcrmodel2.3.py:666] Using random seed 520655745173572164 for the data pipeline
I1221 22:09:10.142350 140130015409088 run_alphafold_tcrmodel2.3.py:257] Predicting test_clsI_6kzw_pmhc_oc
I1221 22:09:10.144767 140130015409088 pipeline_multimer_custom_templates.py:221] Running monomer pipeline on chain A: TCRa
I1221 22:09:10.144991 140130015409088 pipeline_custom_templates.py:251] input_sequence is AQEVTQIPAALSVPEGENLVLNCSFTDSAIYNLQWFRQDPGKGLTSLLLIQSSQREQTSGRLNASLDKSSGRSTLYIAASQPGDSATYLCAVTNQAGTALIFGKGTTLSVSS
I1221 22:09:10.145443 140130015409088 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpk469i7a1/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmpuchwfeys.fasta /opt/tcrmodel2/data/databases/uniref90.tcrmhc.fasta"
I1221 22:09:10.175758 140130015409088 utils.py:36] Started Jackhmmer (uniref90.tcrmhc.fasta) query
I1221 22:09:13.943617 140130015409088 utils.py:40] Finished Jackhmmer (uniref90.tcrmhc.fasta) query in 3.766 seconds
I1221 22:09:15.618740 140130015409088 pipeline_custom_templates.py:226] Uniref90 MSA size: 10000 sequences. This is for templates, not MSA construction)
I1221 22:09:16.837728 140130015409088 hmmbuild.py:121] Launching subprocess ['/opt/conda/bin/hmmbuild', '--hand', '--amino', '/tmp/tmpo0htiisg/output.hmm', '/tmp/tmpo0htiisg/query.msa']
I1221 22:09:16.885152 140130015409088 utils.py:36] Started hmmbuild query
I1221 22:09:17.139352 140130015409088 hmmbuild.py:128] hmmbuild stdout:
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.3.2 (Nov 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file:             /tmp/tmpo0htiisg/query.msa
# output HMM file:                  /tmp/tmpo0htiisg/output.hmm
# input alignment is asserted as:   protein
# model architecture construction:  hand-specified by RF annotation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     query                    1   578   112     0.42  0.593 

# CPU time: 0.20u 0.00s 00:00:00.20 Elapsed: 00:00:00.20

stderr:

I1221 22:09:17.139782 140130015409088 utils.py:40] Finished hmmbuild query in 0.254 seconds
I1221 22:09:17.141631 140130015409088 hmmsearch.py:103] Launching sub-process ['/opt/conda/bin/hmmsearch', '--noali', '--cpu', '8', '--F1', '0.1', '--F2', '0.1', '--F3', '0.1', '--incE', '100', '-E', '100', '--domE', '100', '--incdomE', '100', '-A', '/tmp/tmpbh0xiwlm/output.sto', '/tmp/tmpbh0xiwlm/query.hmm', '/opt/tcrmodel2/data/databases/pdb_seqres.txt']
I1221 22:09:17.161583 140130015409088 utils.py:36] Started hmmsearch (pdb_seqres.txt) query
I1221 22:09:25.898700 140130015409088 utils.py:40] Finished hmmsearch (pdb_seqres.txt) query in 8.736 seconds
I1221 22:09:28.116724 140130015409088 templates.py:940] Searching for template for: AQEVTQIPAALSVPEGENLVLNCSFTDSAIYNLQWFRQDPGKGLTSLLLIQSSQREQTSGRLNASLDKSSGRSTLYIAASQPGDSATYLCAVTNQAGTALIFGKGTTLSVSS
I1221 22:09:29.140182 140130015409088 templates.py:267] Found an exact template match 3vxq_A.
I1221 22:09:29.154926 140130015409088 templates.py:267] Found an exact template match 3vxq_D.
I1221 22:09:29.617204 140130015409088 templates.py:267] Found an exact template match 3vxr_D.
I1221 22:09:30.307771 140130015409088 templates.py:267] Found an exact template match 3vxs_D.
I1221 22:09:31.191043 140130015409088 templates.py:267] Found an exact template match 5eu6_D.
I1221 22:09:31.680226 140130015409088 templates.py:267] Found an exact template match 4ww1_A.
I1221 22:09:32.262928 140130015409088 templates.py:267] Found an exact template match 4ww2_A.
I1221 22:09:32.980961 140130015409088 templates.py:267] Found an exact template match 6px6_D.
I1221 22:09:33.581995 140130015409088 templates.py:267] Found an exact template match 6eh5_A.
I1221 22:09:34.590054 140130015409088 templates.py:267] Found an exact template match 6fr3_A.
I1221 22:09:35.068050 140130015409088 templates.py:267] Found an exact template match 6fr4_A.
I1221 22:09:35.596770 140130015409088 templates.py:267] Found an exact template match 6fr5_A.
I1221 22:09:36.128025 140130015409088 templates.py:267] Found an exact template match 6eh4_D.
I1221 22:09:36.391314 140130015409088 templates.py:267] Found an exact template match 2bnu_A.
I1221 22:09:36.872898 140130015409088 templates.py:267] Found an exact template match 2bnq_D.
I1221 22:09:37.493764 140130015409088 templates.py:267] Found an exact template match 2bnr_D.
I1221 22:09:38.804049 140130015409088 templates.py:267] Found an exact template match 6q3s_D.
I1221 22:09:39.363675 140130015409088 templates.py:267] Found an exact template match 2pye_D.
I1221 22:09:39.595768 140130015409088 templates.py:267] Found an exact template match 2pyf_A.
I1221 22:09:40.457416 140130015409088 templates.py:267] Found an exact template match 5brz_D.
I1221 22:09:41.151616 140130015409088 templates.py:267] Found an exact template match 5bs0_D.
I1221 22:09:41.623939 140130015409088 templates.py:267] Found an exact template match 2f53_D.
I1221 22:09:43.167946 140130015409088 templates.py:267] Found an exact template match 2f54_D.
I1221 22:09:43.177637 140130015409088 templates.py:267] Found an exact template match 2f54_K.
I1221 22:09:43.605382 140130015409088 templates.py:267] Found an exact template match 2p5e_D.
I1221 22:09:44.007126 140130015409088 templates.py:267] Found an exact template match 2p5w_D.
I1221 22:09:44.488784 140130015409088 templates.py:267] Found an exact template match 2p1y_A.
I1221 22:09:44.498771 140130015409088 templates.py:267] Found an exact template match 2p1y_A.
I1221 22:09:44.508666 140130015409088 templates.py:267] Found an exact template match 2p1y_C.
I1221 22:09:44.519163 140130015409088 templates.py:267] Found an exact template match 2p1y_C.
I1221 22:09:44.550393 140130015409088 templates.py:267] Found an exact template match 2p1y_E.
I1221 22:09:44.564932 140130015409088 templates.py:267] Found an exact template match 2p1y_E.
I1221 22:09:44.574763 140130015409088 templates.py:267] Found an exact template match 2p1y_G.
I1221 22:09:44.588398 140130015409088 templates.py:267] Found an exact template match 2p1y_G.
I1221 22:09:48.774513 140130015409088 templates.py:267] Found an exact template match 1bwm_A.
I1221 22:09:48.790600 140130015409088 templates.py:267] Found an exact template match 1bwm_A.
I1221 22:09:50.070279 140130015409088 templates.py:267] Found an exact template match 6dfx_G.
I1221 22:09:50.082986 140130015409088 templates.py:267] Found an exact template match 6dfx_I.
I1221 22:09:50.523041 140130015409088 templates.py:267] Found an exact template match 6ovn_A.
I1221 22:09:50.869076 140130015409088 templates.py:267] Found an exact template match 1ymm_D.
I1221 22:09:52.464181 140130015409088 templates.py:267] Found an exact template match 2wbj_C.
I1221 22:09:52.477079 140130015409088 pipeline_custom_templates.py:430] Final (deduplicated) MSA size: 1 sequences.
I1221 22:09:52.477609 140130015409088 pipeline_custom_templates.py:432] Total number of templates (NB: this can include bad templates and is later filtered to top 4, and mock/no templates would also show number of templates as 1): 20.
I1221 22:09:52.507951 140130015409088 pipeline_multimer_custom_templates.py:221] Running monomer pipeline on chain B: TCRb
I1221 22:09:52.508221 140130015409088 pipeline_custom_templates.py:251] input_sequence is NAGVTQTPKFQVLKTGQSMTLQCSQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSIRGSRGEQFFGPGTRLTVL
I1221 22:09:52.508479 140130015409088 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpsr4r9_pt/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /tmp/tmp6eopczll.fasta /opt/tcrmodel2/data/databases/uniref90.tcrmhc.fasta"
I1221 22:09:52.536574 140130015409088 utils.py:36] Started Jackhmmer (uniref90.tcrmhc.fasta) query
I1221 22:09:55.291531 140130015409088 utils.py:40] Finished Jackhmmer (uniref90.tcrmhc.fasta) query in 2.755 seconds
I1221 22:09:56.456643 140130015409088 pipeline_custom_templates.py:226] Uniref90 MSA size: 10000 sequences. This is for templates, not MSA construction)
I1221 22:09:57.573189 140130015409088 hmmbuild.py:121] Launching subprocess ['/opt/conda/bin/hmmbuild', '--hand', '--amino', '/tmp/tmphtl8p9cu/output.hmm', '/tmp/tmphtl8p9cu/query.msa']
I1221 22:09:57.609564 140130015409088 utils.py:36] Started hmmbuild query
I1221 22:09:57.804909 140130015409088 hmmbuild.py:128] hmmbuild stdout:
# hmmbuild :: profile HMM construction from multiple sequence alignments
# HMMER 3.3.2 (Nov 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# input alignment file:             /tmp/tmphtl8p9cu/query.msa
# output HMM file:                  /tmp/tmphtl8p9cu/output.hmm
# input alignment is asserted as:   protein
# model architecture construction:  hand-specified by RF annotation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# idx name                  nseq  alen  mlen eff_nseq re/pos description
#---- -------------------- ----- ----- ----- -------- ------ -----------
1     query                    1   522   115     0.39  0.588 

# CPU time: 0.18u 0.00s 00:00:00.18 Elapsed: 00:00:00.19

stderr:

I1221 22:09:57.806952 140130015409088 utils.py:40] Finished hmmbuild query in 0.197 seconds
I1221 22:09:57.810156 140130015409088 hmmsearch.py:103] Launching sub-process ['/opt/conda/bin/hmmsearch', '--noali', '--cpu', '8', '--F1', '0.1', '--F2', '0.1', '--F3', '0.1', '--incE', '100', '-E', '100', '--domE', '100', '--incdomE', '100', '-A', '/tmp/tmpajh0p03r/output.sto', '/tmp/tmpajh0p03r/query.hmm', '/opt/tcrmodel2/data/databases/pdb_seqres.txt']
I1221 22:09:57.839715 140130015409088 utils.py:36] Started hmmsearch (pdb_seqres.txt) query
I1221 22:10:07.410263 140130015409088 utils.py:40] Finished hmmsearch (pdb_seqres.txt) query in 9.569 seconds
I1221 22:10:09.135686 140130015409088 templates.py:940] Searching for template for: NAGVTQTPKFQVLKTGQSMTLQCSQDMNHEYMSWYRQDPGMGLRLIHYSVGAGITDQGEVPNGYNVSRSTTEDFPLRLLSAAPSQTSVYFCASSYSIRGSRGEQFFGPGTRLTVL
I1221 22:10:09.523768 140130015409088 templates.py:267] Found an exact template match 3h9s_E.
I1221 22:10:09.992436 140130015409088 templates.py:267] Found an exact template match 3pwp_E.
I1221 22:10:10.708043 140130015409088 templates.py:267] Found an exact template match 3qfj_E.
I1221 22:10:12.579891 140130015409088 templates.py:267] Found an exact template match 3qh3_B.
I1221 22:10:12.591159 140130015409088 templates.py:267] Found an exact template match 3qh3_D.
I1221 22:10:12.923202 140130015409088 templates.py:267] Found an exact template match 1ao7_E.
I1221 22:10:13.329638 140130015409088 templates.py:267] Found an exact template match 2gj6_E.
I1221 22:10:13.688128 140130015409088 templates.py:267] Found an exact template match 3d39_E.
I1221 22:10:14.114186 140130015409088 templates.py:267] Found an exact template match 3d3v_E.
I1221 22:10:14.540390 140130015409088 templates.py:267] Found an exact template match 1bd2_E.
I1221 22:10:15.166651 140130015409088 templates.py:267] Found an exact template match 4ftv_E.
I1221 22:10:15.631337 140130015409088 templates.py:267] Found an exact template match 4grm_B.
I1221 22:10:15.642702 140130015409088 templates.py:267] Found an exact template match 4grm_D.
I1221 22:10:16.135198 140130015409088 templates.py:267] Found an exact template match 1qrn_E.
I1221 22:10:18.026664 140130015409088 templates.py:267] Found an exact template match 1qse_E.
I1221 22:10:18.602971 140130015409088 templates.py:267] Found an exact template match 1qsf_E.
I1221 22:10:18.832795 140130015409088 templates.py:267] Found an exact template match 3rev_B.
I1221 22:10:19.411654 140130015409088 templates.py:267] Found an exact template match 6jxr_n.
I1221 22:10:23.114786 140130015409088 templates.py:267] Found an exact template match 6rpb_E.
I1221 22:10:23.126906 140130015409088 templates.py:267] Found an exact template match 6rpb_J.
I1221 22:10:23.138053 140130015409088 templates.py:267] Found an exact template match 6rpb_O.
I1221 22:10:23.149887 140130015409088 templates.py:267] Found an exact template match 6rpb_T.
I1221 22:10:23.638771 140130015409088 templates.py:267] Found an exact template match 2bnq_E.
I1221 22:10:24.088422 140130015409088 templates.py:267] Found an exact template match 2bnr_E.
I1221 22:10:24.305546 140130015409088 templates.py:267] Found an exact template match 2bnu_B.
I1221 22:10:25.151929 140130015409088 templates.py:267] Found an exact template match 2f54_E.
I1221 22:10:25.165828 140130015409088 templates.py:267] Found an exact template match 2f54_L.
I1221 22:10:25.739937 140130015409088 templates.py:267] Found an exact template match 6q3s_E.
I1221 22:10:26.252640 140130015409088 templates.py:267] Found an exact template match 2f53_E.
I1221 22:10:28.041918 140130015409088 templates.py:267] Found an exact template match 4g9f_E.
I1221 22:10:28.657307 140130015409088 templates.py:267] Found an exact template match 5men_E.
I1221 22:10:29.838440 140130015409088 templates.py:267] Found an exact template match 6uz1_E.
I1221 22:10:29.846818 140130015409088 templates.py:267] Found an exact template match 6uz1_J.
I1221 22:10:30.297336 140130015409088 templates.py:267] Found an exact template match 2pye_E.
I1221 22:10:30.537515 140130015409088 templates.py:267] Found an exact template match 2pyf_B.
I1221 22:10:31.015649 140130015409088 templates.py:267] Found an exact template match 4g8g_E.
I1221 22:10:31.631257 140130015409088 templates.py:267] Found an exact template match 4wwk_B.
I1221 22:10:33.103120 140130015409088 templates.py:267] Found an exact template match 2p5w_E.
I1221 22:10:33.559883 140130015409088 templates.py:267] Found an exact template match 3gsn_B.
I1221 22:10:34.635650 140130015409088 templates.py:267] Found an exact template match 5e9d_E.
I1221 22:10:34.641914 140130015409088 templates.py:267] Found an exact template match 5e9d_J.
I1221 22:10:35.101376 140130015409088 templates.py:267] Found an exact template match 2p5e_E.
I1221 22:10:35.732964 140130015409088 templates.py:267] Found an exact template match 4mnq_E.
I1221 22:10:35.752513 140130015409088 pipeline_custom_templates.py:430] Final (deduplicated) MSA size: 1 sequences.
I1221 22:10:35.752756 140130015409088 pipeline_custom_templates.py:432] Total number of templates (NB: this can include bad templates and is later filtered to top 4, and mock/no templates would also show number of templates as 1): 20.
I1221 22:10:35.860662 140130015409088 pipeline_multimer_custom_templates.py:221] Running monomer pipeline on chain C: pMHC
I1221 22:10:35.861105 140130015409088 pipeline_custom_templates.py:251] input_sequence is RLPAKAPLLSHSLKYFHTSVSRPGRGEPRFISVGYVDDTQFVRFDNDAASPRMVPRAPWMEQEGSEYWDRETRSARDTAQIFRVNLRTLRGYYNQSEAGSHTLQWMHGCELGPDGRFLRGYEQFAYDGKDYLTLNEDLRSWTAVDTAAQISEQKSNDASEAEHQRAYLEDTCVE
I1221 22:10:35.943218 140130015409088 pipeline_custom_templates.py:430] Final (deduplicated) MSA size: 1 sequences.
I1221 22:10:35.943490 140130015409088 pipeline_custom_templates.py:432] Total number of templates (NB: this can include bad templates and is later filtered to top 4, and mock/no templates would also show number of templates as 1): 4.
I1221 22:10:36.029374 140130015409088 run_alphafold_tcrmodel2.3.py:340] Running model model_1_multimer_v3_pred_0 on test_clsI_6kzw_pmhc_oc
I1221 22:10:36.030871 140130015409088 model.py:180] Running predict with shape(feat) = {'aatype': (401,), 'residue_index': (401,), 'seq_length': (), 'msa': (4096, 401), 'num_alignments': (), 'template_aatype': (4, 401), 'template_all_atom_mask': (4, 401, 37), 'template_all_atom_positions': (4, 401, 37, 3), 'asym_id': (401,), 'sym_id': (401,), 'entity_id': (401,), 'deletion_matrix': (4096, 401), 'deletion_mean': (401,), 'all_atom_mask': (401, 37), 'all_atom_positions': (401, 37, 3), 'assembly_num_chains': (), 'entity_mask': (401,), 'num_templates': (), 'cluster_bias_mask': (4096,), 'bert_mask': (4096, 401), 'seq_mask': (401,), 'msa_mask': (4096, 401)}
2023-12-21 22:14:23.367186: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:65] 
********************************
[Compiling module jit_apply_fn] Very slow compile?  If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
********************************
2023-12-21 22:14:56.349872: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:133] The operation took 2m32.983364202s

********************************
[Compiling module jit_apply_fn] Very slow compile?  If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
********************************
I1222 01:51:54.638633 140130015409088 model.py:190] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (401, 401, 64)}, 'experimentally_resolved': {'logits': (401, 37)}, 'masked_msa': {'logits': (508, 401, 22)}, 'num_recycles': (), 'predicted_aligned_error': (401, 401), 'predicted_lddt': {'logits': (401, 50)}, 'structure_module': {'final_atom_mask': (401, 37), 'final_atom_positions': (401, 37, 3)}, 'plddt': (401,), 'aligned_confidence_probs': (401, 401, 64), 'max_predicted_aligned_error': (), 'ptm': (), 'custom_iptm': [()], 'iptm': (), 'ranking_confidence': ()}
I1222 01:51:54.639772 140130015409088 run_alphafold_tcrmodel2.3.py:358] Total JAX model model_1_multimer_v3_pred_0 on test_clsI_6kzw_pmhc_oc predict time (includes compilation time, see --benchmark): 13278.6s
I1222 01:51:55.053872 140130015409088 run_alphafold_tcrmodel2.3.py:340] Running model model_2_multimer_v3_pred_0 on test_clsI_6kzw_pmhc_oc
I1222 01:51:55.054405 140130015409088 model.py:180] Running predict with shape(feat) = {'aatype': (401,), 'residue_index': (401,), 'seq_length': (), 'msa': (4096, 401), 'num_alignments': (), 'template_aatype': (4, 401), 'template_all_atom_mask': (4, 401, 37), 'template_all_atom_positions': (4, 401, 37, 3), 'asym_id': (401,), 'sym_id': (401,), 'entity_id': (401,), 'deletion_matrix': (4096, 401), 'deletion_mean': (401,), 'all_atom_mask': (401, 37), 'all_atom_positions': (401, 37, 3), 'assembly_num_chains': (), 'entity_mask': (401,), 'num_templates': (), 'cluster_bias_mask': (4096,), 'bert_mask': (4096, 401), 'seq_mask': (401,), 'msa_mask': (4096, 401)}
2023-12-22 01:56:10.049535: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:133] The operation took 2m29.56439045s

********************************
[Compiling module jit_apply_fn] Very slow compile?  If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
********************************
I1222 05:30:27.610606 140130015409088 model.py:190] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (401, 401, 64)}, 'experimentally_resolved': {'logits': (401, 37)}, 'masked_msa': {'logits': (508, 401, 22)}, 'num_recycles': (), 'predicted_aligned_error': (401, 401), 'predicted_lddt': {'logits': (401, 50)}, 'structure_module': {'final_atom_mask': (401, 37), 'final_atom_positions': (401, 37, 3)}, 'plddt': (401,), 'aligned_confidence_probs': (401, 401, 64), 'max_predicted_aligned_error': (), 'ptm': (), 'custom_iptm': [()], 'iptm': (), 'ranking_confidence': ()}
I1222 05:30:27.612574 140130015409088 run_alphafold_tcrmodel2.3.py:358] Total JAX model model_2_multimer_v3_pred_0 on test_clsI_6kzw_pmhc_oc predict time (includes compilation time, see --benchmark): 13112.6s
I1222 05:30:28.042386 140130015409088 run_alphafold_tcrmodel2.3.py:340] Running model model_3_multimer_v3_pred_0 on test_clsI_6kzw_pmhc_oc
I1222 05:30:28.043053 140130015409088 model.py:180] Running predict with shape(feat) = {'aatype': (401,), 'residue_index': (401,), 'seq_length': (), 'msa': (4096, 401), 'num_alignments': (), 'template_aatype': (4, 401), 'template_all_atom_mask': (4, 401, 37), 'template_all_atom_positions': (4, 401, 37, 3), 'asym_id': (401,), 'sym_id': (401,), 'entity_id': (401,), 'deletion_matrix': (4096, 401), 'deletion_mean': (401,), 'all_atom_mask': (401, 37), 'all_atom_positions': (401, 37, 3), 'assembly_num_chains': (), 'entity_mask': (401,), 'num_templates': (), 'cluster_bias_mask': (4096,), 'bert_mask': (4096, 401), 'seq_mask': (401,), 'msa_mask': (4096, 401)}
2023-12-22 05:34:41.531971: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:133] The operation took 2m27.164303535s

********************************
[Compiling module jit_apply_fn] Very slow compile?  If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
********************************
INFO:    Cleaning up image...
Hangup
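One way to see where the time went is to pull the timing lines out of the log. The sketch below is a hypothetical helper, not part of TCRModel2. Its regular expressions assume the exact message formats shown above: XLA's `slow_operation_alarm` line ("The operation took ...") and the predict summary from `run_alphafold_tcrmodel2.3.py`. Only compiles slow enough to trigger the alarm are logged, so the compile figures are a lower bound.

```python
import re
import sys

# Matches XLA's slow-compile completion line, e.g. "The operation took 2m32.983364202s".
COMPILE_RE = re.compile(r"The operation took (?:(\d+)m)?([\d.]+)s")
# Matches the per-model summary, e.g. "Total JAX model <name> on <target> predict time (...): 13278.6s".
PREDICT_RE = re.compile(r"Total JAX model (\S+) on \S+ predict time .*?: ([\d.]+)s")


def parse_timings(log_text: str):
    """Return (list of compile seconds, {model_name: predict seconds}) from a log."""
    compiles = []
    for minutes, seconds in COMPILE_RE.findall(log_text):
        # The minutes group is empty when the duration is under a minute.
        compiles.append(int(minutes or 0) * 60 + float(seconds))
    predicts = {name: float(secs) for name, secs in PREDICT_RE.findall(log_text)}
    return compiles, predicts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        compiles, predicts = parse_timings(fh.read())
    # Assumes one logged slow compile per model, in the same order as the predict lines.
    for (name, total), compile_s in zip(predicts.items(), compiles):
        print(f"{name}: compile {compile_s:.0f}s of {total:.0f}s total ({compile_s / total:.1%})")
```

Run against `run_tcrmodel2.log`, the figures for model_1 work out to roughly 153 s of compilation out of a 13278.6 s predict time, about 1.2%. If that holds, most of the runtime was spent executing the model rather than compiling it, which would be consistent with the GPU not being found.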

Request for Assistance:

rui-yin commented 9 months ago

Hi @hotwa ,

Thanks for reaching out. That's too bad you are encountering exceptionally long runtimes and an unexpected hangup. It looks like your Singularity container wasn't able to find the GPU. To use the GPU in Singularity, the CUDA version inside your container should be compatible with the NVIDIA driver installed on your host system. If you don't mind, could you please check which CUDA version your host driver supports? The nvidia-smi command shows this in its header. That way, we can better help you set up the environment correctly.
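A minimal sketch of that check, assuming the container's CUDA version is 11.2 (the default SIF mentioned in the issue). It reads the "CUDA Version: X.Y" field from the `nvidia-smi` header, which is the highest CUDA version the host driver supports, and compares it with the container's version. The helper names are hypothetical. Treat the comparison as a conservative rule of thumb: NVIDIA's CUDA compatibility documentation describes cases (such as minor-version compatibility) where an older driver can still work.

```python
import re
import shutil
import subprocess

CONTAINER_CUDA = (11, 2)  # assumed: CUDA version of the default tcrmodel2 SIF


def driver_cuda_version(smi_text: str):
    """Extract (major, minor) from nvidia-smi's 'CUDA Version: X.Y' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_supports(container, smi_text: str) -> bool:
    """True if the driver's maximum supported CUDA version is >= the container's."""
    host = driver_cuda_version(smi_text)
    return host is not None and host >= container


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        print("host driver supports CUDA up to:", driver_cuda_version(out))
        print("compatible with container CUDA", CONTAINER_CUDA, ":",
              driver_supports(CONTAINER_CUDA, out))
```

Run this on the host, outside the container, since the point is to inspect the host driver.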

Best, Rui

rui-yin commented 4 months ago

It appears that we haven't received any updates on this issue in the past few months, so we will proceed to close it. As mentioned in my previous message, proper GPU configuration is key to reducing the compile time. Please feel free to reopen the issue at any time.
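As a complement to the host-side check, this sketch confirms from inside the container whether JAX, which AlphaFold uses for prediction, actually sees a GPU. It relies only on `jax.default_backend()` and `jax.devices()`; the wrapper function is hypothetical. If it reports `backend=cpu`, prediction would run on the CPU regardless of the host hardware.

```python
import importlib.util


def jax_backend_report() -> str:
    """Describe which backend JAX will use, without failing if JAX is absent."""
    if importlib.util.find_spec("jax") is None:
        return "jax not installed"
    import jax  # imported lazily so the check also runs where JAX is missing

    backend = jax.default_backend()  # e.g. 'gpu' or 'cpu'
    devices = ", ".join(str(d) for d in jax.devices())
    return f"backend={backend}; devices=[{devices}]"


if __name__ == "__main__":
    print(jax_backend_report())
```

Run it with the container's Python interpreter so it reflects the environment the pipeline actually uses.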