v-mikhaylov / tfold-release

TFold v1.0
Apache License 2.0

running issue #8

Open nguyenbinhchem opened 2 months ago

nguyenbinhchem commented 2 months ago

I installed TFold and ran it, but I got the following error. Could you let me know how to solve it? Thank you.

$ ./run_alphafold.sh
2024-07-16 10:13:16.461777: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
:219: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
I0716 10:13:17.528547 139808617641792 tfold_run_alphafold.py:151] processing 15 inputs...
I0716 10:13:17.684368 139808617641792 xla_bridge.py:231] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker:
I0716 10:13:17.799987 139808617641792 xla_bridge.py:231] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0716 10:13:19.274816 139808617641792 tfold_run_alphafold.py:172] Have 1 models: ['model_1']
I0716 10:13:19.274942 139808617641792 tfold_run_alphafold.py:176] Using random seed 359071070709738379 for the data pipeline
I0716 10:13:19.275121 139808617641792 tfold_run_alphafold.py:87] Predicting for id 0
Traceback (most recent call last):
  File "/storage/Binh/MHC-pep-binding-predcition/dataset/TFold/tfold-release/tfold_run_alphafold.py", line 196, in
    app.run(main)
  File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/storage/Binh/MHC-pep-binding-predcition/dataset/TFold/tfold-release/tfold_run_alphafold.py", line 186, in main
    predict_structure(sequences=sequences,msas=msas,template_hits=template_hits,renumber_list=renumber_list,
  File "/storage/Binh/MHC-pep-binding-predcition/dataset/TFold/tfold-release/tfold_run_alphafold.py", line 92, in predict_structure
    feature_dict=data_pipeline.process(sequences,msas,template_hits)
  File "/storage/Binh/MHC-pep-binding-predcition/dataset/TFold/tfold-release/tfold_patch/tfold_pipeline.py", line 135, in process
    msa_features=make_msa_features(msas)
  File "/storage/Binh/MHC-pep-binding-predcition/dataset/TFold/tfold-release/tfold_patch/tfold_pipeline.py", line 106, in make_msa_features
    for sequence_index, sequence in enumerate(msa.sequences):
AttributeError: 'tuple' object has no attribute 'sequences'

v-mikhaylov commented 2 months ago

Hi there! Which version of AlphaFold are you using? If it is >2.1.0, that's probably the issue. You can get v2.1.0 here: https://github.com/google-deepmind/alphafold/releases/tag/v2.1.0
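
The AttributeError in the log above says that line 106 of tfold_pipeline.py received a plain tuple where it expected an object with a `sequences` attribute, which is consistent with a mismatch between the wrapper and the installed AlphaFold. The following is only a minimal diagnostic sketch for confirming what the parsed MSA entries look like before `make_msa_features` is called. The helper names `describe_msa_entry` and `check_msas` are hypothetical and are not part of TFold or AlphaFold.

```python
from types import SimpleNamespace
from typing import Any, List, Sequence


def describe_msa_entry(msa: Any) -> str:
    """Classify one MSA entry by the two shapes the traceback hints at."""
    # An object exposing `.sequences` is what the failing loop expects.
    if hasattr(msa, "sequences"):
        return "object with .sequences"
    # A bare tuple is what the AttributeError reports was received.
    if isinstance(msa, tuple):
        return f"plain tuple of length {len(msa)}"
    return f"unexpected type {type(msa).__name__}"


def check_msas(msas: Sequence[Any]) -> List[str]:
    """Return one description per MSA entry, in input order."""
    return [describe_msa_entry(m) for m in msas]


if __name__ == "__main__":
    # Stand-in inputs for illustration only (not real AlphaFold objects).
    tuple_style = (["ACD"], [[0, 0, 0]])
    object_style = SimpleNamespace(sequences=["ACD"])
    for description in check_msas([tuple_style, object_style]):
        print(description)
```

If every entry is reported as a plain tuple, the parser and the pipeline code disagree about the MSA format, and switching to the AlphaFold release the wrapper was written against is the likely remedy.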

nguyenbinhchem commented 2 months ago

Thank you, Victor, for your prompt reply. I reinstalled AlphaFold 2.1, but now I get the following error:

$ ./run_alphafold.sh
2024-07-16 13:32:30.086235: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
:219: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
I0716 13:32:31.230157 139855532943168 tfold_run_alphafold.py:151] processing 15 inputs...
I0716 13:32:31.399425 139855532943168 xla_bridge.py:231] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker:
I0716 13:32:31.510840 139855532943168 xla_bridge.py:231] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0716 13:32:33.003260 139855532943168 tfold_run_alphafold.py:172] Have 1 models: ['model_1']
I0716 13:32:33.003416 139855532943168 tfold_run_alphafold.py:176] Using random seed 7363720492194540564 for the data pipeline
I0716 13:32:33.003640 139855532943168 tfold_run_alphafold.py:87] Predicting for id 0
I0716 13:32:33.452289 139855532943168 templates.py:878] Searching for template for: NYNYLYRLFGSHSMRYFSTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDEETGKVKAHSQTDRENLRIALRYYNQSEAGSHTLQMMFGCDVGSDGRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQITKRKWEAAHVAEQQRAYLEGTCVDGLRRYLENGKETLQRT
I0716 13:32:33.498608 139855532943168 templates.py:267] Found an exact template match 3wlb_0_A.
I0716 13:32:33.552241 139855532943168 templates.py:267] Found an exact template match 4f7t_0_A.
I0716 13:32:33.738360 139855532943168 templates.py:267] Found an exact template match 3vxs_0_A.
I0716 13:32:33.801160 139855532943168 templates.py:267] Found an exact template match 5wwi_0_A.
I0716 13:32:33.808267 139855532943168 tfold_pipeline.py:142] Final (deduplicated) MSA size: 8163 sequences.
I0716 13:32:33.808494 139855532943168 tfold_pipeline.py:143] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 4.
I0716 13:32:33.812415 139855532943168 tfold_run_alphafold.py:97] Running model model_1 on 0 2024-07-16 13:32:35.618007: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1 2024-07-16 13:32:35.618395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:65:00.0 name: Quadro P5000 computeCapability: 6.1 coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 15.88GiB deviceMemoryBandwidth: 269.00GiB/s 2024-07-16 13:32:35.618431: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 2024-07-16 13:32:35.620085: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11 2024-07-16 13:32:35.620174: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11 2024-07-16 13:32:35.620910: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10 2024-07-16 13:32:35.621151: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10 2024-07-16 13:32:35.655092: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11 2024-07-16 13:32:35.655567: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11 2024-07-16 13:32:35.655713: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8 2024-07-16 13:32:35.656454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2024-07-16 13:32:35.702626: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in 
performance-critical operations: AVX2 AVX512F FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-07-16 13:32:35.704157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:65:00.0 name: Quadro P5000 computeCapability: 6.1 coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 15.88GiB deviceMemoryBandwidth: 269.00GiB/s 2024-07-16 13:32:35.704709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2024-07-16 13:32:35.705471: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 2024-07-16 13:32:36.843249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix: 2024-07-16 13:32:36.843285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0 2024-07-16 13:32:36.843294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N 2024-07-16 13:32:36.844382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1140 MB memory) -> physical GPU (device: 0, name: Quadro P5000, pci bus id: 0000:65:00.0, compute capability: 6.1) 2024-07-16 13:32:36.936072: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2299965000 Hz I0716 13:32:37.771018 139855532943168 model.py:165] Running predict with shape(feat) = {'aatype': (4, 191), 'residue_index': (4, 191), 'seq_length': (4,), 'template_aatype': (4, 4, 191), 'template_all_atom_masks': (4, 4, 191, 37), 'template_all_atom_positions': (4, 4, 191, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 191), 'msa_mask': (4, 508, 191), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 191, 3), 'template_pseudo_beta_mask': (4, 4, 191), 'atom14_atom_exists': (4, 191, 14), 
'residx_atom14_to_atom37': (4, 191, 14), 'residx_atom37_to_atom14': (4, 191, 37), 'atom37_atom_exists': (4, 191, 37), 'extra_msa': (4, 5120, 191), 'extra_msa_mask': (4, 5120, 191), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 191), 'true_msa': (4, 508, 191), 'extra_has_deletion': (4, 5120, 191), 'extra_deletion_value': (4, 5120, 191), 'msa_feat': (4, 508, 191, 49), 'target_feat': (4, 191, 22)} Traceback (most recent call last): File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/tfold-release/tfold_run_alphafold.py", line 196, in app.run(main) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/absl/app.py", line 312, in run _run_main(main, args) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main sys.exit(main(argv)) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/tfold-release/tfold_run_alphafold.py", line 186, in main predict_structure(sequences=sequences,msas=msas,template_hits=template_hits,renumber_list=renumber_list, File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/tfold-release/tfold_run_alphafold.py", line 103, in predict_structure prediction_result=model_runner.predict(processed_feature_dict,random_seed=model_random_seed) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/model.py", line 167, in predict result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/transform.py", line 125, in apply_fn out, state = f.apply(params, {}, *args, **kwargs) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/transform.py", line 313, in apply_fn out = f(*args, **kwargs) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/model.py", line 83, in _forward_fn return model( File 
"/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped out = f(*args, **kwargs) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors return bound_method(*args, **kwargs) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/modules.py", line 376, in __call__ _, prev = hk.while_loop( File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/stateful.py", line 610, in while_loop val, state = jax.lax.while_loop(pure_cond_fun, pure_body_fun, init_val) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/stateful.py", line 605, in pure_body_fun val = body_fun(val) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/modules.py", line 369, in get_prev(do_call(x[1], recycle_idx=x[0], File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/modules.py", line 337, in do_call return impl( File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped out = f(*args, **kwargs) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors return bound_method(*args, **kwargs) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/modules.py", line 161, in __call__ representations = evoformer_module(batch0, is_training) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped out = f(*args, **kwargs) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors return bound_method(*args, **kwargs) File 
"/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/modules.py", line 1778, in __call__ template_pair_representation = TemplateEmbedding(c.template, gc)( File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped out = f(*args, **kwargs) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors return bound_method(*args, **kwargs) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/modules.py", line 2073, in __call__ template_pair_representation = mapping.sharded_map(map_fn, in_axes=0)( File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/mapping.py", line 182, in mapped_fn outputs, _ = hk.scan(scan_iteration, outputs, slice_starts) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/stateful.py", line 504, in scan (carry, state), ys = jax.lax.scan( File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/stateful.py", line 487, in stateful_fun carry, out = f(carry, x) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/mapping.py", line 171, in scan_iteration new_outputs = compute_shard(outputs, i, shard_size) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/mapping.py", line 165, in compute_shard slice_out = apply_fun_to_slice(slice_start, slice_size) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/mapping.py", line 138, in apply_fun_to_slice return fun(*input_slice) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/stateful.py", line 567, in mapped_fun out, state = mapped_pure_fun(args, state) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/stateful.py", 
line 558, in pure_fun out = fun(*args) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/modules.py", line 2071, in map_fn return template_embedder(query_embedding, batch, mask_2d, is_training) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 428, in wrapped out = f(*args, **kwargs) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/haiku/_src/module.py", line 279, in run_interceptors return bound_method(*args, **kwargs) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/modules.py", line 1977, in __call__ quaternion=quat_affine.rot_to_quat(rot, unstack_inputs=True), File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/quat_affine.py", line 113, in rot_to_quat _, qs = jnp.linalg.eigh(k) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/_src/numpy/linalg.py", line 313, in eigh v, w = lax_linalg.eigh(a, lower=lower, symmetrize_input=symmetrize_input) jax._src.source_info_util.JaxStackTraceBeforeTransformation: RuntimeError: cuSolver has not been initialized The preceding stack trace is the source of the JAX operation that, once transformed by JAX, triggered the following exception. 
-------------------- The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/tfold-release/tfold_run_alphafold.py", line 196, in app.run(main) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/absl/app.py", line 312, in run _run_main(main, args) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main sys.exit(main(argv)) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/tfold-release/tfold_run_alphafold.py", line 186, in main predict_structure(sequences=sequences,msas=msas,template_hits=template_hits,renumber_list=renumber_list, File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/tfold-release/tfold_run_alphafold.py", line 103, in predict_structure prediction_result=model_runner.predict(processed_feature_dict,random_seed=model_random_seed) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/model.py", line 167, in predict result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback return fun(*args, **kwargs) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/_src/api.py", line 416, in cache_miss out_flat = xla.xla_call( File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/core.py", line 1632, in bind return call_bind(self, fun, *args, **params) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/core.py", line 1623, in call_bind outs = primitive.process(top_trace, fun, tracers, params) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/core.py", line 1635, in process return trace.process_call(self, fun, 
tracers, params) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/core.py", line 627, in process_call return primitive.impl(f, *tracers, **params) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 581, in _xla_call_impl compiled_fun = _xla_callable(fun, device, backend, name, donated_invars, File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/linear_util.py", line 263, in memoized_fun ans = call(fun, *args) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 653, in _xla_callable_uncached return lower_xla_callable(fun, device, backend, name, donated_invars, File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 723, in lower_xla_callable out_nodes = jaxpr_subcomp(ctx, jaxpr, xla_consts, *xla_args) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 471, in jaxpr_subcomp ans = rule(ctx, map(aval, eqn.invars), map(aval, eqn.outvars), File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/_src/lax/control_flow.py", line 360, in _while_loop_translation_rule new_z = xla.jaxpr_subcomp( File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 471, in jaxpr_subcomp ans = rule(ctx, map(aval, eqn.invars), map(aval, eqn.outvars), File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 1217, in f_new return jaxpr_subcomp(ctx, jaxpr, _xla_consts(ctx.builder, consts), File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 471, in jaxpr_subcomp ans = rule(ctx, map(aval, eqn.invars), map(aval, eqn.outvars), File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/_src/lax/control_flow.py", line 360, in 
_while_loop_translation_rule new_z = xla.jaxpr_subcomp( File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 471, in jaxpr_subcomp ans = rule(ctx, map(aval, eqn.invars), map(aval, eqn.outvars), File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 1064, in _xla_call_translation_rule out_nodes = jaxpr_subcomp(sub_ctx, call_jaxpr, (), *args) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 471, in jaxpr_subcomp ans = rule(ctx, map(aval, eqn.invars), map(aval, eqn.outvars), File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 1138, in wrapped ans = f(ctx.builder, *args, **kw) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jax/_src/lax/linalg.py", line 504, in _eigh_cpu_gpu_translation_rule v, w, info = syevd_impl(c, operand, lower=lower) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jaxlib/cusolver.py", line 281, in syevd lwork, opaque = cusolver_kernels.build_syevj_descriptor( jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: cuSolver has not been initialized The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified. 
-------------------- The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/tfold-release/tfold_run_alphafold.py", line 196, in app.run(main) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/absl/app.py", line 312, in run _run_main(main, args) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main sys.exit(main(argv)) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/tfold-release/tfold_run_alphafold.py", line 186, in main predict_structure(sequences=sequences,msas=msas,template_hits=template_hits,renumber_list=renumber_list, File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/tfold-release/tfold_run_alphafold.py", line 103, in predict_structure prediction_result=model_runner.predict(processed_feature_dict,random_seed=model_random_seed) File "/storage/Binh/MHC-pep-binding-predcition/dataset/AF2.1/alphafold-2.1.0/alphafold/model/model.py", line 167, in predict result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat) File "/home/tbnguyen/miniconda3/envs/tfold-env/lib/python3.8/site-packages/jaxlib/cusolver.py", line 281, in syevd lwork, opaque = cusolver_kernels.build_syevj_descriptor( RuntimeError: cuSolver has not been initialized
v-mikhaylov commented 1 month ago

This is an AlphaFold problem that is not related to the TFold wrapper. It could be caused by some combination of GPU, CUDA driver, and conda dependencies (e.g. cudatoolkit). A web search suggests that "cuSolver has not been initialized" is caused by running out of GPU memory (https://github.com/google/jax/issues/5259, https://jax.readthedocs.io/en/latest/gpu_memory_allocation.html), and indeed your log shows the GPU being attached with only ~1 GB even though it has 16 GB. I don't know whether that is the problem:

2024-07-16 13:32:36.844382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1140 MB memory) -> physical GPU (device: 0, name: Quadro P5000, pci bus id: 0000:65:00.0, compute capability: 6.1)

Unfortunately, getting AlphaFold to run can be tricky and I might not be able to help with that.
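
If GPU memory contention between TensorFlow and JAX is indeed the cause, one thing that might be worth trying is to relax how both libraries claim memory before either one is imported. The sketch below sets `XLA_PYTHON_CLIENT_PREALLOCATE` (described on the JAX GPU memory allocation page linked above) and TensorFlow's `TF_FORCE_GPU_ALLOW_GROWTH`. The helper name `configure_gpu_memory_env` is hypothetical, and whether these settings resolve this particular error is an assumption that has not been tested here.

```python
import os
from typing import MutableMapping


def configure_gpu_memory_env(env: MutableMapping[str, str]) -> MutableMapping[str, str]:
    """Set conservative GPU memory options without overriding existing values.

    Must be applied before `jax` or `tensorflow` is imported, because both
    libraries read these variables when they initialise the GPU.
    """
    # Ask JAX/XLA to allocate GPU memory on demand instead of preallocating
    # most of the device at start-up.
    env.setdefault("XLA_PYTHON_CLIENT_PREALLOCATE", "false")
    # Ask TensorFlow to grow its GPU allocation as needed instead of
    # reserving nearly all free memory up front.
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return env


if __name__ == "__main__":
    configure_gpu_memory_env(os.environ)
    # Import jax / tensorflow only after this point.
    print(os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"])
    print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

The same two variables can also be exported in the shell before calling run_alphafold.sh. It is also worth checking with nvidia-smi whether another process is already holding most of the GPU memory.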

nguyenbinhchem commented 1 month ago

Thank you very much for your clarification. I will try to get AlphaFold to run.