kalininalab / alphafold_non_docker

AlphaFold2 non-docker setup

Couldn't get ptxas version string #26

Closed ihbxiongjie closed 1 year ago

ihbxiongjie commented 2 years ago

Hi,

I installed AlphaFold (non-docker) following this repository, and I think everything went smoothly during installation. However, when I run 'bash run_alphafold.sh', a "Couldn't get ptxas version string" error occurs. Is there any way to fix this issue?

2021-11-20 11:04:09.946403: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 184848424 exceeds 10% of free system memory.
2021-11-20 11:04:10.060729: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 184848424 exceeds 10% of free system memory.
2021-11-20 11:04:10.171764: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 184848424 exceeds 10% of free system memory.
I1120 11:04:10.312324 139757326907200 model.py:165] Running predict with shape(feat) = {'aatype': (4, 173), 'residue_index': (4, 173), 'seq_length': (4,), 'template_aatype': (4, 4, 173), 'template_all_atom_masks': (4, 4, 173, 37), 'template_all_atom_positions': (4, 4, 173, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 173), 'msa_mask': (4, 508, 173), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 173, 3), 'template_pseudo_beta_mask': (4, 4, 173), 'atom14_atom_exists': (4, 173, 14), 'residx_atom14_to_atom37': (4, 173, 14), 'residx_atom37_to_atom14': (4, 173, 37), 'atom37_atom_exists': (4, 173, 37), 'extra_msa': (4, 5120, 173), 'extra_msa_mask': (4, 5120, 173), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 173), 'true_msa': (4, 508, 173), 'extra_has_deletion': (4, 5120, 173), 'extra_deletion_value': (4, 5120, 173), 'msa_feat': (4, 508, 173, 49), 'target_feat': (4, 173, 22)}
2021-11-20 11:04:10.349723: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Couldn't invoke ptxas --version
2021-11-20 11:04:10.350581: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Fatal Python error: Aborted

Thread 0x00007f1bc9d33740 (most recent call first):
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 474 in backend_compile
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 863 in compile_or_get_cached
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 921 in from_xla_computation
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 892 in compile
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 759 in _xla_callable_uncached
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 439 in xla_primitive_callable
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 180 in cached
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 187 in wrapper
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 416 in apply_primitive
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 624 in process_primitive
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 272 in bind
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 408 in shift_right_logical
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/prng.py", line 240 in threefry_seed
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/prng.py", line 202 in seed_with_impl
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/random.py", line 122 in PRNGKey
  File "/mnt/mpathb/alphafold2/alphafold/alphafold/model/model.py", line 167 in predict
  File "/mnt/mpathb/alphafold2/alphafold/run_alphafold.py", line 193 in predict_structure
  File "/mnt/mpathb/alphafold2/alphafold/run_alphafold.py", line 403 in main
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
  File "/mnt/mpathb/alphafold2/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312 in run
  File "/mnt/mpathb/alphafold2/alphafold/run_alphafold.py", line 427 in

ihbxiongjie commented 2 years ago

Running nvidia-smi shows:

Sat Nov 20 12:00:24 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:C8:00.0 Off |                    0 |
| N/A   80C    P0   137W / 250W | 40534MiB / 40536MiB  |     100%     Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     88406      C   python                           727MiB  |
+-----------------------------------------------------------------------------+

sanjaysrikakulam commented 2 years ago

Hi @ihbxiongjie

Please refer to this https://github.com/google/jax/discussions/6843 and see if this helps.
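
For context, that discussion is about XLA being unable to locate the CUDA toolkit's ptxas binary. A minimal check along those lines, assuming a toolkit installed under /usr/local/cuda-11.4 (adjust the path to your system), could look like:

# Is ptxas visible to the shell that launches run_alphafold.sh?
which ptxas || echo "ptxas not on PATH"

# If the toolkit lives outside the default search locations, point XLA at it
# explicitly and put its bin directory on PATH before running AlphaFold:
export XLA_FLAGS="--xla_gpu_cuda_data_dir=/usr/local/cuda-11.4"
export PATH="/usr/local/cuda-11.4/bin:$PATH"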

ihbxiongjie commented 2 years ago

To whom it may concern, I finally solved this problem with an additional run of "conda install -c conda-forge cudatoolkit-dev" after the "Install alphafold dependencies" step.
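
A sketch of that extra step, assuming the conda environment from this repo's setup instructions is named alphafold:

conda activate alphafold

# cudatoolkit-dev from conda-forge ships the full CUDA toolkit, including the
# ptxas binary that XLA could not invoke in the error above
conda install -y -c conda-forge cudatoolkit-dev

# sanity check: ptxas should now resolve inside the environment
which ptxas
ptxas --version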

Lsz-20 commented 2 years ago

> To whom it may concern, I finally solved this problem with an additional run of "conda install -c conda-forge cudatoolkit-dev" after the "Install alphafold dependencies" step.

I have the same problem. Here are my versions:
tensorflow=2.5.0
jax=0.2.25
jaxlib=0.1.69+cuda111
nvidia-smi: NVIDIA-SMI 450.51.06, Driver Version: 450.51.06, CUDA Version: 10.2
Any advice would be appreciated.

sanjaysrikakulam commented 2 years ago

Hi @Lsz-20

Did you try what @ihbxiongjie suggested? Maybe you need to update your CUDA drivers?
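
If it helps with debugging, a quick way to compare the driver-supported CUDA version against the toolkit and the jaxlib build (the Python one-liner assumes jaxlib is importable in the active conda environment):

# CUDA version supported by the installed driver (shown in the header)
nvidia-smi | head -n 4

# CUDA toolkit / ptxas versions actually on PATH
nvcc --version
ptxas --version

# jaxlib build in the active environment (should match the local CUDA major version)
python -c "import jaxlib; print(jaxlib.__version__)"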

Lsz-20 commented 2 years ago

> Hi @Lsz-20
>
> Did you try what @ihbxiongjie suggested? Maybe you need to update your CUDA drivers?

It seems that this problem has been solved, but I got another one /(ㄒoㄒ)/~~ Perhaps because of my CUDA drivers? Here are the versions:
NVIDIA-SMI 450.51.06, Driver Version: 450.51.06, CUDA Version: 11.0
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Sep_13_19:13:29_PDT_2021
Cuda compilation tools, release 11.5, V11.5.50
Build cuda_11.5.r11.5/compiler.30411180_0

[two screenshots of the new error attached]

Thanks for your help

sanjaysrikakulam commented 2 years ago

Hi @Lsz-20

This thread might help https://github.com/google/jax/issues/5723
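
In case it is useful, mismatches like this are often resolved by reinstalling jaxlib with a wheel built against the locally installed CUDA release. A sketch, where the version numbers simply mirror the ones quoted earlier in this thread and are only illustrative:

# pick the +cudaXXX tag that matches the CUDA version reported by nvidia-smi/nvcc
pip install --upgrade "jax==0.2.25" "jaxlib==0.1.69+cuda111" \
    -f https://storage.googleapis.com/jax-releases/jax_releases.html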

Lsz-20 commented 2 years ago

> Hi @Lsz-20
>
> This thread might help google/jax#5723

Thanks~ I'll try it.