nrbennet / dl_binder_design

MIT License

Segmentation fault (core dumped) for CUDA12 #60

Closed: danboshuiyan closed this issue 8 months ago

danboshuiyan commented 8 months ago

I have only installed the environment defined in af2_binder_design.yml, and I encountered an issue when running predict.py: Segmentation fault (core dumped). The environment is as follows:

NVIDIA-SMI 535.129.03, Driver Version: 535.129.03, CUDA Version: 12.2
jax 0.4.23
jaxlib 0.4.23+cuda12.cudnn89

How can I solve this problem? Thank you very much!
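Reports like this one are easier to compare when the versions are printed by the environment itself rather than copied by hand. A minimal sketch using the standard-library `importlib.metadata`; the package list is an assumption based on the versions quoted in this thread, and packages installed without Python distribution metadata (possible for some conda builds) will be reported as not found.

```python
from importlib import metadata

# Distributions whose versions are quoted in this thread (assumed list; extend as needed).
PACKAGES = [
    "jax",
    "jaxlib",
    "nvidia-cudnn-cu12",
    "nvidia-cuda-runtime-cu12",
    "biopython",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:26} {version or 'not found'}")
```

The GPU driver and CUDA versions still have to come from `nvidia-smi`; this only covers the Python side.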

ecass777 commented 8 months ago

I am having the same problem. My environment passes the import test and reports that it recognizes the GPU, but as soon as I run predict.py I get a segmentation fault (core dumped).

jxshi commented 8 months ago

I have the same issue. This is the GPU driver version information:

NVIDIA-SMI 535.113.01, Driver Version: 535.113.01, CUDA Version: 12.2

And here is the list of installed packages in the af2_binder_design conda environment:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   2.0.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.9.1           py311h459d7ec_0    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
attrs                     23.2.0             pyh71513ae_0    conda-forge
biopython                 1.81                     pypi_0    pypi
blinker                   1.7.0              pyhd8ed1ab_0    conda-forge
brotli-python             1.1.0           py311hb755f60_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.24.0               hd590300_0    conda-forge
ca-certificates           2023.11.17           hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.3.2              pyhd8ed1ab_0    conda-forge
certifi                   2023.11.17         pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py311hb3a22ac_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
chex                      0.1.85                   pypi_0    pypi
click                     8.1.7           unix_pyh707e725_0    conda-forge
contextlib2               21.6.0             pyhd8ed1ab_0    conda-forge
cryptography              41.0.7          py311hcb13ee4_1    conda-forge
cuda-version              11.8                 h70ddcb2_2    conda-forge
cudatoolkit               11.8.0              h4ba93d1_12    conda-forge
cudnn                     8.8.0.121            hcdd5f01_4    conda-forge
dm-haiku                  0.0.11                   pypi_0    pypi
dm-tree                   0.1.8                    pypi_0    pypi
etils                     1.6.0                    pypi_0    pypi
flatbuffers               23.5.26              h59595ed_1    conda-forge
flax                      0.7.5                    pypi_0    pypi
frozenlist                1.4.1           py311h459d7ec_0    conda-forge
fsspec                    2023.12.2                pypi_0    pypi
gast                      0.5.4              pyhd8ed1ab_0    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
google-auth               2.26.0             pyhca7485f_0    conda-forge
google-auth-oauthlib      1.0.0              pyhd8ed1ab_1    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
grpcio                    1.54.3          py311hcafe171_0    conda-forge
h5py                      3.10.0          nompi_py311hebc2b07_101    conda-forge
hdf5                      1.14.3          nompi_h4f84152_100    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.6                pyhd8ed1ab_0    conda-forge
importlib-metadata        7.0.1              pyha770c72_0    conda-forge
importlib-resources       6.1.1                    pypi_0    pypi
jax                       0.4.23                   pypi_0    pypi
jaxlib                    0.4.23+cuda12.cudnn89          pypi_0    pypi
jmp                       0.0.4                    pypi_0    pypi
keras                     2.13.1             pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libabseil                 20230125.3      cxx17_h59595ed_0    conda-forge
libaec                    1.1.2                h59595ed_1    conda-forge
libblas                   3.9.0           20_linux64_openblas    conda-forge
libcblas                  3.9.0           20_linux64_openblas    conda-forge
libcurl                   8.5.0                hca28451_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_3    conda-forge
libgfortran-ng            13.2.0               h69a702a_3    conda-forge
libgfortran5              13.2.0               ha4646dd_3    conda-forge
libgomp                   13.2.0               h807b86a_3    conda-forge
libgrpc                   1.54.3               hb20ce57_0    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           20_linux64_openblas    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.25          pthreads_h413a1c8_0    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               3.21.12              hfc55251_2    conda-forge
libsqlite                 3.44.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_3    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markdown                  3.5.1              pyhd8ed1ab_0    conda-forge
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.3           py311h459d7ec_1    conda-forge
mdurl                     0.1.2                    pypi_0    pypi
ml-collections            0.1.1              pyhd8ed1ab_0    conda-forge
ml_dtypes                 0.3.1           py311h320fe9a_2    conda-forge
mock                      5.1.0              pyhd8ed1ab_0    conda-forge
msgpack                   1.0.7                    pypi_0    pypi
multidict                 6.0.4           py311h459d7ec_1    conda-forge
nccl                      2.19.4.1             h6103f9b_0    conda-forge
ncurses                   6.4                  h59595ed_2    conda-forge
nest-asyncio              1.5.8                    pypi_0    pypi
numpy                     1.26.3          py311h64a7726_0    conda-forge
nvidia-cublas-cu12        12.3.4.1                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.3.101                 pypi_0    pypi
nvidia-cuda-nvcc-cu12     12.3.107                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.3.107                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.3.101                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.7.29                 pypi_0    pypi
nvidia-cufft-cu12         11.0.12.1                pypi_0    pypi
nvidia-cusolver-cu12      11.5.4.101               pypi_0    pypi
nvidia-cusparse-cu12      12.2.0.103               pypi_0    pypi
nvidia-nccl-cu12          2.19.3                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.3.101                 pypi_0    pypi
oauthlib                  3.2.2              pyhd8ed1ab_0    conda-forge
openssl                   3.2.0                hd590300_1    conda-forge
opt_einsum                3.3.0              pyhc1e730c_2    conda-forge
optax                     0.1.7                    pypi_0    pypi
orbax-checkpoint          0.4.8                    pypi_0    pypi
packaging                 23.2               pyhd8ed1ab_0    conda-forge
pip                       23.3.2             pyhd8ed1ab_0    conda-forge
protobuf                  4.21.12         py311hcafe171_0    conda-forge
pyasn1                    0.5.1              pyhd8ed1ab_0    conda-forge
pyasn1-modules            0.3.0              pyhd8ed1ab_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pygments                  2.17.2                   pypi_0    pypi
pyjwt                     2.8.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 23.3.0             pyhd8ed1ab_0    conda-forge
pyrosetta                 2023.49+release.9891f2c         py311_0    https://conda.graylab.jhu.edu
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.11.7          hab00c5b_1_cpython    conda-forge
python-flatbuffers        23.5.26            pyhd8ed1ab_0    conda-forge
python_abi                3.11                    4_cp311    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
pyyaml                    6.0.1           py311h459d7ec_1    conda-forge
re2                       2023.03.02           h8c504da_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rich                      13.7.0                   pypi_0    pypi
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scipy                     1.11.4                   pypi_0    pypi
setuptools                69.0.3             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
tabulate                  0.9.0                    pypi_0    pypi
tensorboard               2.13.0             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.7.0           py311h63ff55d_1    conda-forge
tensorflow                2.13.1          cuda118py311h878bca4_1    conda-forge
tensorflow-base           2.13.1          cuda118py311h002e3ce_1    conda-forge
tensorflow-estimator      2.13.1          cuda118py311h4a64c31_1    conda-forge
tensorstore               0.1.51                   pypi_0    pypi
termcolor                 2.3.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toolz                     0.12.0                   pypi_0    pypi
typing_extensions         4.5.0              pyha770c72_0    conda-forge
tzdata                    2023d                h0c530f3_0    conda-forge
urllib3                   2.1.0              pyhd8ed1ab_0    conda-forge
werkzeug                  3.0.1              pyhd8ed1ab_0    conda-forge
wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.16.0          py311h459d7ec_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yarl                      1.9.3           py311h459d7ec_0    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

Is there any solution now? Thanks!

Best, Jianxiang
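The package list above mixes CUDA 11.8 packages from conda (`cuda-version`, `cudatoolkit`, and the `cuda118` TensorFlow builds) with CUDA 12 wheels from pip (`jaxlib 0.4.23+cuda12.cudnn89` and the `nvidia-*-cu12` packages). Nobody in the thread identifies this as the cause of the crash, but mixes like this are easy to miss in a long listing. A minimal sketch that groups saved `conda list` output by the CUDA major version each line suggests; the regular expression is a heuristic fitted to this listing, not a conda naming rule, and the script and file names in the usage comment are hypothetical.

```python
import re
import sys

# Heuristic fitted to the listing above: "cuda" or "-cu" followed by a
# two-digit CUDA major, as in "cuda118py311...", "+cuda12.cudnn89" and "-cu12".
CUDA_TAG = re.compile(r"(?:cuda|-cu)(\d{2})")
# Packages whose version field is the CUDA version itself.
CUDA_VERSION_PACKAGES = {"cudatoolkit", "cuda-version"}


def cuda_majors(conda_list_text):
    """Return {package: CUDA major} for lines of `conda list` output that mention one."""
    found = {}
    for line in conda_list_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank lines, short lines and header comments
        name, version, build = fields[:3]
        if name in CUDA_VERSION_PACKAGES:
            found[name] = int(version.split(".")[0])
            continue
        match = CUDA_TAG.search(" ".join((name, version, build)))
        if match:
            found[name] = int(match.group(1))
    return found


def group_by_major(found):
    """Invert {package: major} into {major: sorted package names}."""
    groups = {}
    for name, major in sorted(found.items()):
        groups.setdefault(major, []).append(name)
    return groups


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: conda list > env.txt; python cuda_mix.py env.txt
    with open(sys.argv[1]) as handle:
        groups = group_by_major(cuda_majors(handle.read()))
    for major, names in sorted(groups.items()):
        print(f"CUDA {major}: {', '.join(names)}")
    if len(groups) > 1:
        print("Warning: packages built against more than one CUDA major version.")
```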

nrbennet commented 8 months ago

I'm sorry to hear that you're having environment issues! I would recommend inspecting the core file that is being dumped; it will give you a hint as to where the error is coming from.

My suspicion is that the issue is actually coming from PyRosetta and not from JAX. The import tests do not actually check whether PyRosetta is correctly installed; I will add this test shortly.
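Alongside inspecting the core file, CPython's built-in `faulthandler` module can print the Python-level traceback at the moment of a segmentation fault, which is often enough to see which import or call crashed. Running `python -X faulthandler predict.py` with the script's usual arguments does this directly; the minimal sketch below wraps the same flag in a subprocess so the exit status can also be decoded. The wrapper file name in the comment is hypothetical.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(args, timeout=None):
    """Run `python -X faulthandler <args>` in a child process.

    With faulthandler enabled, CPython writes the Python-level traceback of
    each thread to stderr when the process receives SIGSEGV (or another fatal
    signal such as SIGABRT), then lets the signal terminate the process.
    stdout is not captured, so the script's normal output still streams live.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *args],
        stderr=subprocess.PIPE,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe_returncode(code):
    """Translate a subprocess return code into a short description.

    On POSIX, a child killed by signal N is reported as return code -N.
    """
    if code < 0:
        try:
            return f"killed by {signal.Signals(-code).name}"
        except ValueError:
            return f"killed by signal {-code}"
    return f"exited with status {code}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python run_traced.py predict.py <predict.py's usual arguments>
    code, stderr = run_with_faulthandler(sys.argv[1:])
    print(describe_returncode(code))
    print(stderr, end="")
```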

nrbennet commented 8 months ago

I've added a PyRosetta import test to the import tests. Please run the new test and see whether the issue is with PyRosetta.

jxshi commented 8 months ago

I tried the new import test file and it failed with this error message: Segmentation fault (core dumped). However, when I ran the two imports separately in Python, both of them went through.

#!/usr/bin/env python

# PyRosetta install test
print("/"*200)
print("Testing PyRosetta install. If this script errors before you see a PyRosetta success message then you "
      "have an issue with your PyRosetta install")
print("/"*200)

# Import PyRosetta (wildcard imports, as in the original test) and initialize Rosetta
from pyrosetta import *
from pyrosetta.rosetta import *
init()

print("/"*70)
print("PyRosetta installation was successful!")
print("/"*70)

print("\n")

Could the core dump be caused by an incompatibility between Python packages?

Could you look into this further for us, please?

Best, Jianxiang
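The observation that each import succeeds on its own while the combined test file crashes can be checked systematically: run each step alone, then every ordering of the steps together, each in a fresh interpreter so that a segfault in one run cannot stop the others. A minimal sketch; the two snippets in `SNIPPETS` (importing JAX and listing its devices, importing PyRosetta and calling `pyrosetta.init()`) are assumptions about what the failing test and `predict.py` exercise, not a copy of either.

```python
import itertools
import subprocess
import sys

# Assumed reproduction steps, one per library; adjust to what predict.py does.
SNIPPETS = {
    "jax": "import jax; jax.devices()",
    "pyrosetta": "import pyrosetta; pyrosetta.init()",
}


def run_snippets(snippets, timeout=None):
    """Run the code snippets in order in one fresh interpreter; return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", "\n".join(snippets)],
        timeout=timeout,
    )
    return proc.returncode


def bisect_imports(snippets, timeout=None):
    """Run each snippet alone, then every ordering of all snippets together.

    Each combination gets its own process, so a crash in one run does not
    affect the others. Returns {tuple_of_names: exit_code}.
    """
    names = list(snippets)
    combos = [(name,) for name in names] + list(itertools.permutations(names))
    return {
        combo: run_snippets([snippets[name] for name in combo], timeout)
        for combo in combos
    }
```

In the af2_binder_design environment, `bisect_imports(SNIPPETS)` returns one exit code per combination. If the single-library runs return 0 but a combined run returns -11 (SIGSEGV on Linux), the crash depends on both libraries being loaded in the same process, which is consistent with the reports in this thread.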

nrbennet commented 8 months ago

The version of JAX that conda was installing was incompatible with PyRosetta for some reason. I've added an explicit requirement for a slightly older version of JAX, which fixes the issue.

jxshi commented 8 months ago

Thank you very much for the quick fix. I have pinned biopython to 1.81 to make it work.

Best, Jianxiang
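Both fixes in this thread are version pins: the maintainer added an explicit requirement for an older JAX version, and the reporter pinned biopython to 1.81. A small runtime check can confirm that an environment actually honours such pins before a long run. A minimal sketch; the thread does not say which JAX version was chosen, so any JAX ceiling must come from the repository's current requirements, and `numeric_version` is a deliberately simplified parser (for full PEP 440 handling, `packaging.version.Version` is the usual choice).

```python
def numeric_version(text):
    """Parse a version like '0.4.23+cuda12.cudnn89' into (0, 4, 23).

    Simplified: drops any local '+...' suffix, keeps the leading digits of each
    dot-separated piece, stops at the first piece without digits, and strips
    trailing zeros so that '1.81' and '1.81.0' compare equal.
    """
    parts = []
    for piece in text.split("+", 1)[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def pin_problems(installed, exact=None, at_most=None):
    """Check installed versions against pins and return a list of problems.

    installed: {distribution: version string, or None if not installed}
    exact:     {distribution: required version}, e.g. {"biopython": "1.81"}
    at_most:   {distribution: highest acceptable version}
    """
    problems = []
    for dist, want in (exact or {}).items():
        have = installed.get(dist)
        if have is None or numeric_version(have) != numeric_version(want):
            problems.append(f"{dist}: have {have}, want =={want}")
    for dist, ceiling in (at_most or {}).items():
        have = installed.get(dist)
        if have is None or numeric_version(have) > numeric_version(ceiling):
            problems.append(f"{dist}: have {have}, want <={ceiling}")
    return problems
```

For example, with `from importlib import metadata`, `pin_problems({"biopython": metadata.version("biopython")}, exact={"biopython": "1.81"})` returns an empty list when the biopython pin is honoured; a JAX ceiling can be passed the same way through `at_most`.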