Open jfbazan opened 1 year ago
Quick update: This error does not appear when ColabFold 1.5.2 is run in monomer mode (AF2 mode set to 'auto' = ptm for monomer); it only happens with the multimer setting (any of the flavors: v1, v2, or v3). Here's the error message returned for multimer-v2:
LinAlgError Traceback (most recent call last)
8 frames
/usr/local/lib/python3.8/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
95
96 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 97 raise LinAlgError("SVD did not converge")
98
99 def _raise_linalgerror_lstsq(err, flag):
LinAlgError: SVD did not converge
Oddly enough, I also get a 'NaN' error when running DeepMind's AF Colab, which runs AF2.3.1, in multimer mode. This time it crashed while running the AMBER relax (which I'd toggled on); the error message is below. Thx again for your expert help, FB
OpenMMException Traceback (most recent call last)
5 frames
/opt/conda/lib/python3.8/site-packages/simtk/openmm/openmm.py in minimize(context, tolerance, maxIterations)
4108 the maximum number of iterations to perform. If this is 0, minimation is continued until the results converge without regard to how many iterations it takes. The default value is 0.
4109 """
-> 4110 return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations)
4111 __swig_destroy__ = _openmm.delete_LocalEnergyMinimizer
4112
OpenMMException: Particle coordinate is nan
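A "Particle coordinate is nan" failure can be caught before the relax step by scanning the unrelaxed structure for non-finite coordinates. The following is a minimal sketch, not part of AlphaFold, ColabFold, or OpenMM: `find_nan_atoms` and `demo_atom_line` are hypothetical helpers, and the column slices assume the standard fixed-width PDB ATOM record layout.

```python
import math


def demo_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a fixed-width PDB ATOM line (hypothetical helper, for illustration only)."""
    return "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    {:>8.3f}{:>8.3f}{:>8.3f}".format(
        serial, name, resname, chain, resseq, x, y, z
    )


def find_nan_atoms(pdb_text):
    """Return serial numbers of atoms whose x, y, or z is NaN, infinite, or unparsable."""
    bad = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        serial = int(line[6:11])
        try:
            # PDB columns 31-38, 39-46, 47-54 hold x, y, z.
            xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
        except ValueError:
            bad.append(serial)
            continue
        if not all(math.isfinite(v) for v in xyz):
            bad.append(serial)
    return bad


if __name__ == "__main__":
    text = "\n".join(
        [
            demo_atom_line(1, "N", "MET", "A", 1, 1.0, 2.0, 3.0),
            demo_atom_line(2, "CA", "MET", "A", 1, float("nan"), 0.0, 0.0),
        ]
    )
    print("atoms with non-finite coordinates:", find_nan_atoms(text))
```

If the list is non-empty, the NaN values were most likely produced by the prediction itself, so skipping relax for that model is probably more useful than retrying the minimizer.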
The issue is that Google Colab upgraded to jax 0.4.4. I've now updated the notebook to downgrade to the older version of jax that DeepMind recommends for local installations.
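To confirm which jax a runtime actually ended up with, a check along these lines may help. It is only a sketch: `version_tuple`, `is_suspect_jax`, and `installed_jax_version` are hypothetical helpers, and treating every version from 0.4.4 upward as suspect is an assumption drawn from this thread, not a documented compatibility boundary.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '0.4.4' or '0.3.25+cuda11' into a comparable tuple of ints."""
    public = version.split("+", 1)[0]  # drop a local build tag if present
    return tuple(int(part) for part in re.findall(r"\d+", public)[:3])


def is_suspect_jax(installed, first_bad="0.4.4"):
    """True if the installed version is at or above the assumed problem version."""
    return version_tuple(installed) >= version_tuple(first_bad)


def installed_jax_version():
    """Return the installed jax version string, or None if jax is not installed."""
    try:
        return metadata.version("jax")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_jax_version()
    if found is None:
        print("jax is not installed in this environment")
    elif is_suspect_jax(found):
        print(f"jax {found} is at or above 0.4.4; the NaN issue may apply")
    else:
        print(f"jax {found} is older than 0.4.4")
```

Running this in a fresh cell after the install step shows whether the downgrade took effect or whether a stale runtime is still holding the newer version.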
Thx for the heads-up on the jax version clash! Running the ColabFold again in a quick test (after reboot of the Colab), it looks like another jax issue pops up in the very early stages of running, right after AF2 weights are downloaded:
KeyError Traceback (most recent call last)
/content/colabfold/batch.py in run(queries, result_dir, num_models, is_complex, num_recycles, recycle_early_stop_tolerance, model_order, num_ensemble, model_type, msa_mode, use_templates, custom_template_path, num_relax, keep_existing_results, rank_by, pair_mode, data_dir, host_url, random_seed, num_seeds, recompile_padding, zip_results, prediction_callback, save_single_representations, save_pair_representations, save_all, save_recycles, use_dropout, use_gpu_relax, stop_at_score, dpi, max_seq, max_extra_seq, use_cluster_profile, feature_dict_callback, **kwargs)
1203 import jax.tools.colab_tpu
-> 1204 jax.tools.colab_tpu.setup_tpu()
1205 logger.info('Running on TPU')
29 frames
KeyError: 'COLAB_TPU_ADDR'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.8/site-packages/OpenSSL/crypto.py in
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
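The first exception in this traceback, `KeyError: 'COLAB_TPU_ADDR'`, suggests the environment variable is simply absent, as it would be on a GPU or CPU runtime. A minimal sketch of checking for it explicitly is shown below; `colab_tpu_address` is a hypothetical helper, not the code in `batch.py`, and it does nothing about the second `AttributeError` from OpenSSL, which is a separate package-version problem.

```python
import os


def colab_tpu_address(env=None):
    """Return the TPU host from COLAB_TPU_ADDR, or None when no TPU is advertised."""
    env = os.environ if env is None else env
    addr = env.get("COLAB_TPU_ADDR")
    if not addr:
        return None
    # The variable is expected to look like 'host:port'; keep only the host part.
    return addr.split(":")[0]


if __name__ == "__main__":
    host = colab_tpu_address()
    if host is None:
        print("No COLAB_TPU_ADDR set; assuming a GPU or CPU runtime")
    else:
        print(f"TPU advertised at {host}")
```

Checking the variable up front avoids raising an exception whose handler then has to import further modules, which is where the OpenSSL error surfaced here.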
Hi all, I don't know if there is any update on this issue, but AF predictions continue to throw this very same error every time I try to run a prediction. Thanks!
Can you try again, but with latest version of the notebook?
I just tried with the notebook that was last modified 7 hours ago (latest commit 26ac916, 7 hours ago: "Next try to pin tensorflow-cpu to 2.11.0"), and the problem is still there.
Tried again this morning (Tue 28th), and sadly I get a jax issue similar to the one from last night's AF2.3.1 multimer run, immediately after it downloads the AF2 weights. Here's the error msg:
KeyError Traceback (most recent call last)
/content/colabfold/batch.py in run(queries, result_dir, num_models, is_complex, num_recycles, recycle_early_stop_tolerance, model_order, num_ensemble, model_type, msa_mode, use_templates, custom_template_path, num_relax, keep_existing_results, rank_by, pair_mode, data_dir, host_url, random_seed, num_seeds, recompile_padding, zip_results, prediction_callback, save_single_representations, save_pair_representations, save_all, save_recycles, use_dropout, use_gpu_relax, stop_at_score, dpi, max_seq, max_extra_seq, use_cluster_profile, feature_dict_callback, **kwargs)
1203 import jax.tools.colab_tpu
-> 1204 jax.tools.colab_tpu.setup_tpu()
1205 logger.info('Running on TPU')
29 frames
KeyError: 'COLAB_TPU_ADDR'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.8/site-packages/OpenSSL/crypto.py in
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
I just deployed a change that should hopefully fix these issues. Please try again.
Just tried again, & without going into AF2 weights download stage, rapidly got the same error msg:
KeyError Traceback (most recent call last)
/content/colabfold/batch.py in run(queries, result_dir, num_models, is_complex, num_recycles, recycle_early_stop_tolerance, model_order, num_ensemble, model_type, msa_mode, use_templates, custom_template_path, num_relax, keep_existing_results, rank_by, pair_mode, data_dir, host_url, random_seed, num_seeds, recompile_padding, zip_results, prediction_callback, save_single_representations, save_pair_representations, save_all, save_recycles, use_dropout, use_gpu_relax, stop_at_score, dpi, max_seq, max_extra_seq, use_cluster_profile, feature_dict_callback, **kwargs)
1203 import jax.tools.colab_tpu
-> 1204 jax.tools.colab_tpu.setup_tpu()
1205 logger.info('Running on TPU')
29 frames
KeyError: 'COLAB_TPU_ADDR'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.8/site-packages/OpenSSL/crypto.py in
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
Did you refresh the notebook and session? Please make sure no runtime was already running and that you completely reloaded the notebook.
Latest multimer run was positive, fixes seem to be holding! Many thx, Milot & Sergey
running smooth so far, thanks!
Just this afternoon (Mon Feb 27), my ColabFold AF2.3.1 multimer runs suddenly started reporting "NaN" (Not-a-Number) values for the pLDDT, pTM and ipTM metrics, and the program then crashes (see error msgs below) at the end of the Model1 run. I've repeated this with a number of different sequences and rebooted the Colab, with no change. Thx in advance for your expert help and advice.
2023-02-27 21:11:34,081 Setting max_seq=508, max_extra_seq=2048
2023-02-27 21:12:16,142 alphafold2_multimer_v3_model_1_seed_000 recycle=0 pLDDT=nan pTM=nan ipTM=nan
2023-02-27 21:12:22,278 alphafold2_multimer_v3_model_1_seed_000 recycle=1 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:28,461 alphafold2_multimer_v3_model_1_seed_000 recycle=2 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:34,688 alphafold2_multimer_v3_model_1_seed_000 recycle=3 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:40,963 alphafold2_multimer_v3_model_1_seed_000 recycle=4 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:47,300 alphafold2_multimer_v3_model_1_seed_000 recycle=5 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:53,677 alphafold2_multimer_v3_model_1_seed_000 recycle=6 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:00,076 alphafold2_multimer_v3_model_1_seed_000 recycle=7 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:06,486 alphafold2_multimer_v3_model_1_seed_000 recycle=8 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:12,943 alphafold2_multimer_v3_model_1_seed_000 recycle=9 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:19,443 alphafold2_multimer_v3_model_1_seed_000 recycle=10 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:25,990 alphafold2_multimer_v3_model_1_seed_000 recycle=11 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:32,512 alphafold2_multimer_v3_model_1_seed_000 recycle=12 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:32,514 alphafold2_multimer_v3_model_1_seed_000 took 113.3s (12 recycles)
LinAlgError Traceback (most recent call last) in
64
65 download_alphafold_params(model_type, Path("."))
---> 66 results = run(
67 queries=queries,
68 result_dir=result_dir,
8 frames
/usr/local/lib/python3.8/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
95
96 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 97 raise LinAlgError("SVD did not converge")
98
99 def _raise_linalgerror_lstsq(err, flag):
LinAlgError: SVD did not converge
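Since every recycle above already reports `pLDDT=nan`, the SVD failure is plausibly a downstream symptom: NaN values from the prediction reach a NumPy routine that calls SVD. The sketch below shows one way to fail earlier with a clearer message. `checked_svd` is a hypothetical wrapper, not something ColabFold provides, and where exactly ColabFold calls SVD is not shown in this traceback.

```python
import numpy as np


def checked_svd(matrix):
    """Run np.linalg.svd only after confirming the input has no NaN or inf values."""
    a = np.asarray(matrix, dtype=float)
    finite = np.isfinite(a)
    if not finite.all():
        n_bad = int((~finite).sum())
        raise ValueError(
            f"{n_bad} non-finite value(s) in SVD input; "
            "the upstream prediction likely produced NaN"
        )
    return np.linalg.svd(a)


if __name__ == "__main__":
    u, s, vt = checked_svd(np.eye(3))
    print("singular values:", s)
    try:
        checked_svd([[1.0, float("nan")], [0.0, 1.0]])
    except ValueError as err:
        print("rejected:", err)
```

A guard like this does not fix the NaN predictions; it only makes the real cause visible at the point where the bad values first enter the linear algebra.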