kalininalab / alphafold_non_docker

AlphaFold2 non-docker setup
338 stars 119 forks source link

Taking very long time for protein structure prediction #40

Closed adi1bioinfo closed 1 year ago

adi1bioinfo commented 2 years ago

I have installed alphafold2 using the non-docker method on HPC, I am running the script using GPU (V100 with 16 GB of memory). For a sequence of around 200 amino acids, It is taking around 8 hours for structure determination. In Alphafold2 paper (https://www.nature.com/articles/s41586-021-03819-2.pdf), it is quoted that "Representative timings for the neural network using a single model on V100 GPU are 4.8 min with 256 residues, 9.2 min with 384 residues and 18 h at 2,500 residues". I have pasted the outputed logs below. Any idea why it is running very slow in my case although I am using the same GPU?

I0412 09:13:32.372322 140040161777472 tpu_client.py:54] Starting the local TPU driver.
I0412 09:13:32.539564 140040161777472 xla_bridge.py:212] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0412 09:13:32.890057 140040161777472 xla_bridge.py:212] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0412 09:13:38.549945 140040161777472 run_alphafold.py:376] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0412 09:13:38.550210 140040161777472 run_alphafold.py:393] Using random seed 1060063058774185674 for the data pipeline
I0412 09:13:38.550483 140040161777472 run_alphafold.py:161] Predicting A0A016TJD3
I0412 09:13:38.566384 140040161777472 jackhmmer.py:133] Launching subprocess "/home/laddhadi/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpgcj2ht0b/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /scratch/laddhadi/fasta_files/A0A016TJD3.txt /scratch/laddhadi/alphafold_data//uniref90/uniref90.fasta"
I0412 09:13:38.641884 140040161777472 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0412 09:20:02.565351 140040161777472 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 383.923 seconds
I0412 09:20:02.754225 140040161777472 jackhmmer.py:133] Launching subprocess "/home/laddhadi/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpzagxg5ib/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /scratch/laddhadi/fasta_files/A0A016TJD3.txt /scratch/laddhadi/alphafold_data//mgnify/mgy_clusters_2018_12.fa"
I0412 09:20:02.801197 140040161777472 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0412 09:27:44.650721 140040161777472 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 461.849 seconds
I0412 09:27:46.027261 140040161777472 hhsearch.py:85] Launching subprocess "/home/laddhadi/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmp7u0wxikv/query.a3m -o /tmp/tmp7u0wxikv/output.hhr -maxseq 1000000 -d /scratch/laddhadi/alphafold_data//pdb70/pdb70"
I0412 09:27:46.121132 140040161777472 utils.py:36] Started HHsearch query
I0412 09:34:32.186318 140040161777472 utils.py:40] Finished HHsearch query in 406.065 seconds
I0412 09:34:32.942785 140040161777472 hhblits.py:128] Launching subprocess "/home/laddhadi/.conda/envs/alphafold/bin/hhblits -i /scratch/laddhadi/fasta_files/A0A016TJD3.txt -cpu 4 -oa3m /tmp/tmp_07sbvqa/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /scratch/laddhadi/alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /scratch/laddhadi/alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0412 09:34:33.030554 140040161777472 utils.py:36] Started HHblits query
I0409 18:56:30.223437 139865793787712 utils.py:40] Finished HHblits query in 16437.296 seconds
I0409 18:56:30.833186 139865793787712 templates.py:878] Searching for template for: MEAGGVADSLLSGACVLFTLGMFSSGLSDLRHMRMTRSVDNVQFLPFLTTDINNLSWLSYGALKGDGTLIIVNSVGAMLQTLYILVYLHYCPRKRGVLLQTAALLGVLLLGFGYFWLLVPDLEARLQWLGLFCSVFTISMYLSPLADLAKVIQTKSAQHFSFSLTIATLLASASWTLYGFRLKDPYITVPNFPGIVTSFIRLWLFWKYSQKPARNSQLLQT
I0409 18:56:30.946721 139865793787712 templates.py:267] Found an exact template match 5xpd_A.
I0409 18:56:31.633340 139865793787712 templates.py:267] Found an exact template match 5ctg_B.
I0409 18:56:31.643278 139865793787712 templates.py:267] Found an exact template match 5ctg_B.
I0409 18:56:31.652832 139865793787712 templates.py:267] Found an exact template match 5xpd_A.
I0409 18:56:31.664968 139865793787712 templates.py:267] Found an exact template match 5ctg_B.
I0409 18:56:31.674515 139865793787712 templates.py:267] Found an exact template match 5xpd_A.
I0409 18:56:32.157502 139865793787712 templates.py:267] Found an exact template match 4rng_D.
I0409 18:56:32.281505 139865793787712 templates.py:267] Found an exact template match 4x5m_B.
I0409 18:56:32.459235 139865793787712 templates.py:267] Found an exact template match 4qnd_A.
I0409 18:56:32.464134 139865793787712 templates.py:267] Found an exact template match 4qnd_A.
I0409 18:56:32.468961 139865793787712 templates.py:267] Found an exact template match 4rng_D.
I0409 18:56:32.473401 139865793787712 templates.py:267] Found an exact template match 4x5m_B.
I0409 18:56:32.609357 139865793787712 templates.py:267] Found an exact template match 5uhq_A.
I0409 18:56:32.688276 139865793787712 templates.py:267] Found an exact template match 4qnc_B.
I0409 18:56:32.692711 139865793787712 templates.py:267] Found an exact template match 5uhq_A.
I0409 18:56:32.697443 139865793787712 templates.py:267] Found an exact template match 4qnc_B.
I0409 18:56:32.701975 139865793787712 templates.py:267] Found an exact template match 5uhq_D.
I0409 18:56:32.706474 139865793787712 templates.py:267] Found an exact template match 5uhq_D.
I0409 18:56:32.710904 139865793787712 templates.py:718] hit 5j4i_B did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.04524886877828054.
I0409 18:56:32.710995 139865793787712 templates.py:912] Skipped invalid hit 5J4I_B Arginine/agmatine antiporter; AdiC, Transporter, Membrane Protein, Transport; 2.207A {Escherichia coli O157:H7}, error: None, warning: None
I0409 18:56:32.711062 139865793787712 templates.py:718] hit 3ob6_B did not pass prefilter: Proportion of residues aligned to query too small. Align ratio: 0.04072398190045249.
I0409 18:56:32.711109 139865793787712 templates.py:912] Skipped invalid hit 3OB6_B AdiC protein; Amino acid antiporter, Arginine, Membrane; HET: ARG; 3.0A {Escherichia coli}, error: None, warning: None
I0409 18:56:33.124143 139865793787712 pipeline.py:234] Uniref90 MSA size: 5963 sequences.
I0409 18:56:33.124355 139865793787712 pipeline.py:235] BFD MSA size: 1450 sequences.
I0409 18:56:33.124414 139865793787712 pipeline.py:236] MGnify MSA size: 135 sequences.
I0409 18:56:33.124469 139865793787712 pipeline.py:237] Final (deduplicated) MSA size: 7472 sequences.
I0409 18:56:33.124686 139865793787712 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 18.
I0409 18:56:33.153060 139865793787712 run_alphafold.py:190] Running model model_1_pred_0 on sweet_metazoan_F7D9S0
2022-04-09 18:56:37.475985: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-04-09 18:56:37.493497: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
I0409 18:56:43.118809 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'template_aatype': (4, 4, 221), 'template_all_atom_masks': (4, 4, 221, 37), 'template_all_atom_positions': (4, 4, 221, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 508, 221), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 221, 3), 'template_pseudo_beta_mask': (4, 4, 221), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 5120, 221), 'extra_msa_mask': (4, 5120, 221), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 221), 'true_msa': (4, 508, 221), 'extra_has_deletion': (4, 5120, 221), 'extra_deletion_value': (4, 5120, 221), 'msa_feat': (4, 508, 221, 49), 'target_feat': (4, 221, 22)}
2022-04-09 19:00:32.892367: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55] 
********************************
Very slow compile?  If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_apply_fn.149819
********************************
I0409 19:27:21.522277 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (508, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 19:27:21.522848 139865793787712 run_alphafold.py:202] Total JAX model model_1_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1838.4s
I0409 19:27:26.533854 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:27:28.535824 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 19:27:29.179506 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 19:27:49.272111 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:27:53.467640 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5198.28 Efinal -4264.63 Time 19.18 s num residue violations 0 num residue exclusions 0 
I0409 19:27:55.588373 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:27:56.617614 139865793787712 run_alphafold.py:190] Running model model_2_pred_0 on sweet_metazoan_F7D9S0
I0409 19:27:59.180104 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'template_aatype': (4, 4, 221), 'template_all_atom_masks': (4, 4, 221, 37), 'template_all_atom_positions': (4, 4, 221, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 508, 221), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 221, 3), 'template_pseudo_beta_mask': (4, 4, 221), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 1024, 221), 'extra_msa_mask': (4, 1024, 221), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 508, 221), 'true_msa': (4, 508, 221), 'extra_has_deletion': (4, 1024, 221), 'extra_deletion_value': (4, 1024, 221), 'msa_feat': (4, 508, 221, 49), 'target_feat': (4, 221, 22)}
2022-04-09 19:31:45.701855: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55] 
********************************
Very slow compile?  If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_apply_fn__1.149819
********************************
I0409 19:54:31.859675 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (508, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 19:54:31.860479 139865793787712 run_alphafold.py:202] Total JAX model model_2_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1592.7s
I0409 19:54:35.127964 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:54:35.358972 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 19:54:35.695668 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 19:54:50.386341 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:54:50.959881 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5700.67 Efinal -4254.93 Time 13.62 s num residue violations 0 num residue exclusions 0 
I0409 19:54:54.265672 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 19:55:01.368335 139865793787712 run_alphafold.py:190] Running model model_3_pred_0 on sweet_metazoan_F7D9S0
I0409 19:55:03.042567 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 512, 221), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 5120, 221), 'extra_msa_mask': (4, 5120, 221), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 221), 'true_msa': (4, 512, 221), 'extra_has_deletion': (4, 5120, 221), 'extra_deletion_value': (4, 5120, 221), 'msa_feat': (4, 512, 221, 49), 'target_feat': (4, 221, 22)}
2022-04-09 19:58:10.149797: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55] 
********************************
Very slow compile?  If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_apply_fn__2.110442
********************************
I0409 20:22:45.222151 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (512, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 20:22:45.222905 139865793787712 run_alphafold.py:202] Total JAX model model_3_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1662.2s
I0409 20:22:47.995997 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:22:48.224998 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 20:22:48.562712 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 20:23:07.021081 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:23:07.633584 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5711.85 Efinal -4302.03 Time 17.35 s num residue violations 0 num residue exclusions 0 
I0409 20:23:10.685475 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:23:11.315222 139865793787712 run_alphafold.py:190] Running model model_4_pred_0 on sweet_metazoan_F7D9S0
I0409 20:23:13.004297 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 512, 221), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 5120, 221), 'extra_msa_mask': (4, 5120, 221), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 512, 221), 'true_msa': (4, 512, 221), 'extra_has_deletion': (4, 5120, 221), 'extra_deletion_value': (4, 5120, 221), 'msa_feat': (4, 512, 221, 49), 'target_feat': (4, 221, 22)}
I0409 20:50:54.579662 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (512, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 20:50:54.580522 139865793787712 run_alphafold.py:202] Total JAX model model_4_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1661.6s
I0409 20:50:57.246246 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:50:57.478736 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 20:50:57.835792 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 20:51:16.007959 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:51:16.629822 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5922.10 Efinal -4334.91 Time 15.28 s num residue violations 0 num residue exclusions 0 
I0409 20:51:18.662594 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 20:51:19.123664 139865793787712 run_alphafold.py:190] Running model model_5_pred_0 on sweet_metazoan_F7D9S0
I0409 20:51:21.032859 139865793787712 model.py:165] Running predict with shape(feat) = {'aatype': (4, 221), 'residue_index': (4, 221), 'seq_length': (4,), 'is_distillation': (4,), 'seq_mask': (4, 221), 'msa_mask': (4, 512, 221), 'msa_row_mask': (4, 512), 'random_crop_to_size_seed': (4, 2), 'atom14_atom_exists': (4, 221, 14), 'residx_atom14_to_atom37': (4, 221, 14), 'residx_atom37_to_atom14': (4, 221, 37), 'atom37_atom_exists': (4, 221, 37), 'extra_msa': (4, 1024, 221), 'extra_msa_mask': (4, 1024, 221), 'extra_msa_row_mask': (4, 1024), 'bert_mask': (4, 512, 221), 'true_msa': (4, 512, 221), 'extra_has_deletion': (4, 1024, 221), 'extra_deletion_value': (4, 1024, 221), 'msa_feat': (4, 512, 221, 49), 'target_feat': (4, 221, 22)}
2022-04-09 20:54:27.613558: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55] 
********************************
Very slow compile?  If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_apply_fn__4.110442
********************************
I0409 21:14:56.327582 139865793787712 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (221, 221, 64)}, 'experimentally_resolved': {'logits': (221, 37)}, 'masked_msa': {'logits': (512, 221, 23)}, 'predicted_lddt': {'logits': (221, 50)}, 'structure_module': {'final_atom_mask': (221, 37), 'final_atom_positions': (221, 37, 3)}, 'plddt': (221,), 'ranking_confidence': ()}
I0409 21:14:56.328574 139865793787712 run_alphafold.py:202] Total JAX model model_5_pred_0 on sweet_metazoan_F7D9S0 predict time (includes compilation time, see --benchmark): 1415.3s
I0409 21:14:58.848648 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 21:14:59.077749 139865793787712 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0409 21:14:59.415221 139865793787712 amber_minimize.py:68] Restraining 1742 / 3532 particles.
I0409 21:15:20.303997 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 21:15:20.877890 139865793787712 amber_minimize.py:497] Iteration completed: Einit 5946.69 Efinal -4265.18 Time 18.49 s num residue violations 0 num residue exclusions 0 
I0409 21:15:22.856115 139865793787712 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 220 (THR) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0409 21:15:23.404444 139865793787712 run_alphafold.py:271] Final timings for sweet_metazoan_F7D9S0: {'features': 17525.107759714127, 'process_features_model_1_pred_0': 9.964778423309326, 'predict_and_compile_model_1_pred_0': 1838.4048104286194, 'relax_model_1_pred_0': 34.420247316360474, 'process_features_model_2_pred_0': 2.56180739402771, 'predict_and_compile_model_2_pred_0': 1592.6808323860168, 'relax_model_2_pred_0': 22.32586359977722, 'process_features_model_3_pred_0': 1.6738569736480713, 'predict_and_compile_model_3_pred_0': 1662.180498123169, 'relax_model_3_pred_0': 25.30202007293701, 'process_features_model_4_pred_0': 1.6887319087982178, 'predict_and_compile_model_4_pred_0': 1661.5763659477234, 'relax_model_4_pred_0': 24.005033254623413, 'process_features_model_5_pred_0': 1.9084506034851074, 'predict_and_compile_model_5_pred_0': 1415.2962276935577, 'relax_model_5_pred_0': 26.46107578277588}

Components of out file generated by HPC:

SLURM_JOB_NAME = af_g_new1
SLURM_JOB_NODELIST = gpu001
SLURM_JOB_UID = 15585
SLURM_JOB_PARTITION = gpu
SLURM_TASK_PID = 33813
SLURM_CPUS_ON_NODE = 40
SLURM_NTASKS = 40
SLURM_TASK_PID = 33813

timings.json file output

{
    "features": 26155.667644023895,
    "process_features_model_1_pred_0": 9.124100923538208,
    "predict_and_compile_model_1_pred_0": 1900.6409442424774,
    "relax_model_1_pred_0": 35.85753798484802,
    "process_features_model_2_pred_0": 2.5820021629333496,
    "predict_and_compile_model_2_pred_0": 1627.4828066825867,
    "relax_model_2_pred_0": 25.265653133392334,
    "process_features_model_3_pred_0": 1.7214865684509277,
    "predict_and_compile_model_3_pred_0": 1700.9324979782104,
    "relax_model_3_pred_0": 35.66818594932556,
    "process_features_model_4_pred_0": 1.80977201461792,
    "predict_and_compile_model_4_pred_0": 1702.664762020111,
    "relax_model_4_pred_0": 25.467517852783203,
    "process_features_model_5_pred_0": 1.7546741962432861,
    "predict_and_compile_model_5_pred_0": 1457.488024711609,
    "relax_model_5_pred_0": 25.19716238975525
}

Thank you, Aditi

sanjaysrikakulam commented 2 years ago

Hi @adi1bioinfo

Sorry for the delayed response. I do not have SLURM set up to test this. I assume that since this is a shared cluster maybe the GPU is used by several others simultaneously, RAM and storage could also be limiting factors for slow performance. Try with CPU instead of GPU and you may have better performance. I would need more hardware related information to further comment on this.