jintuzhang / gnncv

gpu support ? #1

Open chlwjd1234 opened 1 week ago

chlwjd1234 commented 1 week ago

Thanks for the interesting research.

I'd like to try gnn-cv myself. Did you compute the GNN CVs on a GPU? If only the CPU was used, any comment on the computational cost of the simulation examples tested in the preprint would be helpful.

Thanks!

jintuzhang commented 6 days ago

Hi, GPU evaluation of the CVs is supported. For example, see: https://github.com/jintuzhang/gnncv/blob/3d0a788a2a215576a5a65b417dd9b82fb574e5a1/nacl/run_biased_gnn/6A_2layer_1c/1/plumed.inp#L29

chlwjd1234 commented 2 days ago

Thanks.

  1. I'm now struggling with torch dependency issues. Could you tell me which torch versions have been tested, especially for gnncv and the graph-modified mlcolvar? I'm trying to use libtorch 1.13.1 + CUDA 11.7.

  2. Also, when running PLUMED with gnncv, I get an error that says:

    terminate called after throwing an instance of 'PLMD::Plumed::ExceptionError'
      what(): Couldn't find method: 'forward' on class: 'torch.mlcolvar.graph.core.nn.models._torch_mangle_1.GVPModel (of Python compilation unit at: 0x513265c0)'
    Exception raised from getMethod at ../aten/src/ATen/core/class_type.cpp:342 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x6b

But I don't know why it cannot find the forward method at run time. When I check manually, I get:

    from mlcolvar.graph.core.nn.models import GVPModel
    model = GVPModel(1,6,[16,1])
    print(hasattr(model, 'forward'))
    True
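
Note that the `hasattr` check above inspects the Python class, not the exported TorchScript file that PLUMED loads. A minimal sketch of checking the exported file itself, assuming it is a standard zip-based TorchScript archive whose serialized source sits under a `code/` directory. The helper name `files_defining_forward` is hypothetical, and only the Python standard library is used:

```python
import zipfile


def files_defining_forward(archive_path):
    """List the source files inside a TorchScript archive that define `forward`.

    Assumption: the file is a zip archive with serialized module source
    stored as "<root>/code/.../*.py", as written by torch.jit.save.
    """
    hits = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            # Only look at serialized source files, not pickles or debug data.
            if "/code/" in name and name.endswith(".py"):
                source = archive.read(name).decode("utf-8", errors="replace")
                if "def forward(" in source:
                    hits.append(name)
    return sorted(hits)
```

Calling `files_defining_forward("model.ptc")` on the exported model would show which serialized classes carry a `forward` method. If the top-level class named in the error is not among them, the problem is more likely in the export step than in PLUMED.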

jintuzhang commented 2 days ago
  1. Tested PyTorch versions include 2.0.1 and 2.2.2; tested LibTorch versions include 2.0.1, 2.1.2, and 2.2.2. For LibTorch, however, versions 2.1.2 and 2.2.2 are suggested, as earlier versions may have memory issues (within PyG).
  2. First, try setting up a simulation using the uploaded files (e.g., the alanine dipeptide example) to see whether this is an issue with your PLUMED installation. If the provided example runs, then the bug is most likely in your training script.
  3. As a side note, when fitting CVs for new systems, you may want to start with SchNet instead of GVP, since it is generally smoother and easier to train.
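
The version guidance above can be sketched as a small lookup. This is only an illustration of the list in this comment, not an official compatibility matrix, and the names `TESTED_LIBTORCH`, `SUGGESTED_LIBTORCH`, and `libtorch_status` are hypothetical:

```python
# LibTorch versions reported as tested in this thread.
TESTED_LIBTORCH = {"2.0.1", "2.1.2", "2.2.2"}
# Subset suggested because earlier versions may have memory issues within PyG.
SUGGESTED_LIBTORCH = {"2.1.2", "2.2.2"}


def libtorch_status(version):
    """Classify a LibTorch version string against the versions listed above."""
    base = version.split("+")[0]  # drop local tags such as "+cu118"
    if base in SUGGESTED_LIBTORCH:
        return "suggested"
    if base in TESTED_LIBTORCH:
        return "tested"
    return "untested"


for candidate in ("1.13.1", "2.0.1", "2.1.2+cu118"):
    print(candidate, "->", libtorch_status(candidate))
```

Under this reading, the LibTorch 1.13.1 build mentioned earlier in the thread falls outside the tested set.
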
chlwjd1234 commented 1 day ago

Thank you for the comments. I have the same problem with both the SchNet and GVP models. The example NaCl model.ptc also produces the same error.

  1. I wonder whether using an independent libtorch causes the problem. That is, when PyTorch is installed via conda or pip, it comes with its own libtorch, whereas PLUMED is compiled against a separate libtorch downloaded with wget. Is it correct to use them this way?

  2. Could you share the versions of lightning, torch_geometric, and torch_scatter? Also, I wonder whether all of the PyG-related packages (pyg, torch_sparse, torch_cluster, ...) are necessary, or only torch_geometric and torch_scatter.

jintuzhang commented 1 day ago
  1. Using an "independent" LibTorch for PLUMED is correct. However, since the error occurs even with the model file in the repo, the problem must be on the PLUMED side (input script or compilation). Try compiling PLUMED with LibTorch 2.1.2 and see whether this resolves your problem.
  2. My conda environment:

```bash
# Name                Version       Build                         Channel
_libgcc_mutex         0.1           conda_forge                   conda-forge
_openmp_mutex         4.5           2_kmp_llvm                    conda-forge
aiohttp               3.9.5         py311h459d7ec_0               conda-forge
aiosignal             1.3.1         pyhd8ed1ab_0                  conda-forge
ase                   3.22.1        pyhd8ed1ab_1                  conda-forge
astunparse            1.6.3         pyhd8ed1ab_0                  conda-forge
attrs                 23.2.0        pyh71513ae_0                  conda-forge
blas                  1.0           mkl                           conda-forge
blinker               1.8.2         pyhd8ed1ab_0                  conda-forge
blosc                 1.21.6        hef167b5_0                    conda-forge
brotli                1.1.0         hd590300_1                    conda-forge
brotli-bin            1.1.0         hd590300_1                    conda-forge
brotli-python         1.1.0         py311hb755f60_1               conda-forge
bzip2                 1.0.8         h4bc722e_7                    conda-forge
c-ares                1.32.3        h4bc722e_0                    conda-forge
c-blosc2              2.15.1        hc57e6cf_0                    conda-forge
ca-certificates       2024.8.30     hbcca054_0                    conda-forge
certifi               2024.8.30     pyhd8ed1ab_0                  conda-forge
cffi                  1.16.0        py311hb3a22ac_0               conda-forge
cftime                1.6.4         py311h18e1886_0               conda-forge
charset-normalizer    3.3.2         pyhd8ed1ab_0                  conda-forge
click                 8.1.7         unix_pyh707e725_0             conda-forge
colorama              0.4.6         pyhd8ed1ab_0                  conda-forge
contourpy             1.2.1         py311h9547e67_0               conda-forge
cuda-cudart           11.8.89       0                             nvidia
cuda-cupti            11.8.87       0                             nvidia
cuda-libraries        11.8.0        0                             nvidia
cuda-nvrtc            11.8.89       0                             nvidia
cuda-nvtx             11.8.86       0                             nvidia
cuda-runtime          11.8.0        0                             nvidia
cudatoolkit           11.8.0        h4ba93d1_13                   conda-forge
cycler                0.12.1        pyhd8ed1ab_0                  conda-forge
cython                3.0.11        py311h55d416d_3               conda-forge
dill                  0.3.9         pyhd8ed1ab_0                  conda-forge
e3nn                  0.5.1         pyhd8ed1ab_0                  conda-forge
exceptiongroup        1.2.2         pyhd8ed1ab_0                  conda-forge
fftw                  3.3.10        nompi_hf1063bd_110            conda-forge
filelock              3.15.4        pyhd8ed1ab_0                  conda-forge
flask                 3.0.3         pyhd8ed1ab_0                  conda-forge
fonttools             4.53.1        py311h61187de_0               conda-forge
freetype              2.12.1        h267a509_2                    conda-forge
frozenlist            1.4.1         py311h459d7ec_0               conda-forge
fsspec                2024.6.1      pyhff2d567_0                  conda-forge
future                1.0.0         pyhd8ed1ab_0                  conda-forge
gawk                  5.3.1         hcd3d067_0                    conda-forge
gmp                   6.3.0         hac33072_2                    conda-forge
gmpy2                 2.1.5         py311hc4f1f91_1               conda-forge
greenlet              3.1.1         py311hfdbb021_0               conda-forge
gsl                   2.7           he838d99_0                    conda-forge
h2                    4.1.0         pyhd8ed1ab_0                  conda-forge
h5py                  3.12.1        pypi_0                        pypi
hdf4                  4.2.15        h2a13503_7                    conda-forge
hdf5                  1.14.3        nompi_hdf9ad27_105            conda-forge
hpack                 4.0.0         pyh9f0ad1d_0                  conda-forge
hyperframe            6.0.1         pyhd8ed1ab_0                  conda-forge
icu                   75.1          he02047a_0                    conda-forge
idna                  3.7           pyhd8ed1ab_0                  conda-forge
importlib-metadata    8.2.0         pyha770c72_0                  conda-forge
importlib_metadata    8.2.0         hd8ed1ab_0                    conda-forge
iniconfig             2.0.0         pyhd8ed1ab_0                  conda-forge
itsdangerous          2.2.0         pyhd8ed1ab_0                  conda-forge
jax                   0.4.31        pyhd8ed1ab_0                  conda-forge
jaxlib                0.4.30        cpu_py311hb2c720c_0           conda-forge
jinja2                3.1.4         pyhd8ed1ab_0                  conda-forge
joblib                1.4.2         pyhd8ed1ab_0                  conda-forge
keyutils              1.6.1         h166bdaf_0                    conda-forge
kiwisolver            1.4.5         py311h9547e67_1               conda-forge
krb5                  1.21.3        h659f571_0                    conda-forge
lcms2                 2.16          hb7c19ff_0                    conda-forge
ld_impl_linux-64      2.40          hf3520f5_7                    conda-forge
lerc                  4.0.0         h27087fc_0                    conda-forge
libabseil             20240116.2    cxx17_he02047a_1              conda-forge
libaec                1.1.3         h59595ed_0                    conda-forge
libasprintf           0.22.5        he8f35ee_3                    conda-forge
libblas               3.9.0         16_linux64_mkl                conda-forge
libbrotlicommon       1.1.0         hd590300_1                    conda-forge
libbrotlidec          1.1.0         hd590300_1                    conda-forge
libbrotlienc          1.1.0         hd590300_1                    conda-forge
libcblas              3.9.0         16_linux64_mkl                conda-forge
libcublas             11.11.3.6     0                             nvidia
libcufft              10.9.0.58     0                             nvidia
libcufile             1.9.1.3       0                             nvidia
libcurand             10.3.5.147    0                             nvidia
libcurl               8.9.1         hdb1bdb2_0                    conda-forge
libcusolver           11.4.1.48     0                             nvidia
libcusparse           11.7.5.86     0                             nvidia
libdeflate            1.20          hd590300_0                    conda-forge
libedit               3.1.20191231  he28a2e2_2                    conda-forge
libev                 4.33          hd590300_2                    conda-forge
libffi                3.4.2         h7f98852_5                    conda-forge
libgcc                14.1.0        h77fa898_1                    conda-forge
libgcc-ng             14.1.0        h69a702a_1                    conda-forge
libgettextpo          0.22.5        he02047a_3                    conda-forge
libgfortran           3.0.0         1                             conda-forge
libgfortran-ng        14.1.0        h69a702a_0                    conda-forge
libgfortran5          14.1.0        hc5f4f2c_0                    conda-forge
libgrpc               1.62.2        h15f2491_0                    conda-forge
libhwloc              2.11.1        default_hecaa2ac_1000         conda-forge
libiconv              1.17          hd590300_2                    conda-forge
libjpeg-turbo         3.0.0         hd590300_1                    conda-forge
liblapack             3.9.0         16_linux64_mkl                conda-forge
libllvm14             14.0.6        hcd5def8_4                    conda-forge
libnetcdf             4.9.2         nompi_h135f659_114            conda-forge
libnghttp2            1.58.0        h47da74e_1                    conda-forge
libnpp                11.8.0.86     0                             nvidia
libnsl                2.0.1         hd590300_0                    conda-forge
libnvjpeg             11.9.0.86     0                             nvidia
libpng                1.6.43        h2797004_0                    conda-forge
libprotobuf           4.25.3        h08a7969_0                    conda-forge
libre2-11             2023.09.01    h5a48ba9_2                    conda-forge
libsqlite             3.46.0        hde9e2c9_0                    conda-forge
libssh2               1.11.0        h0841786_0                    conda-forge
libstdcxx             14.1.0        hc0a3c3a_1                    conda-forge
libstdcxx-ng          14.1.0        h4852527_1                    conda-forge
libtiff               4.6.0         h1dd3fc0_3                    conda-forge
libuuid               2.38.1        h0b41bf4_0                    conda-forge
libwebp-base          1.4.0         hd590300_0                    conda-forge
libxcb                1.16          hd590300_0                    conda-forge
libxml2               2.12.7        he7c6b58_4                    conda-forge
libzip                1.10.1        h2629f0a_3                    conda-forge
libzlib               1.3.1         h4ab18f5_1                    conda-forge
lightning             2.3.3         pyhd8ed1ab_0                  conda-forge
lightning-utilities   0.11.6        pyhd8ed1ab_0                  conda-forge
llvm-openmp           15.0.7        h0cdce71_0                    conda-forge
llvmlite              0.43.0        py311hbde99c3_0               conda-forge
looseversion          1.3.0         pyhd8ed1ab_0                  conda-forge
lz4-c                 1.9.4         hcb278e6_0                    conda-forge
lzo                   2.10          hd590300_1001                 conda-forge
markdown-it-py        3.0.0         pyhd8ed1ab_0                  conda-forge
markupsafe            2.1.5         py311h459d7ec_0               conda-forge
matplotlib-base       3.9.1         py311hffb96ce_0               conda-forge
matscipy              1.0.0         py311h320fe9a_0               conda-forge
mctc-lib              0.3.2         h3b12eaf_0                    conda-forge
mdtraj                1.10.0        py311h3f233a9_0               conda-forge
mdurl                 0.1.2         pyhd8ed1ab_0                  conda-forge
mkl                   2022.2.1      h84fe81f_16997                conda-forge
ml_dtypes             0.4.0         py311h14de704_1               conda-forge
mpc                   1.3.1         hfe3b2da_0                    conda-forge
mpfr                  4.2.1         h38ae2d0_2                    conda-forge
mpiplus               v0.0.2        pyhd8ed1ab_0                  conda-forge
mpmath                1.3.0         pyhd8ed1ab_0                  conda-forge
multidict             6.0.5         py311h459d7ec_0               conda-forge
munkres               1.1.4         pyh9f0ad1d_0                  conda-forge
ncurses               6.5           h59595ed_0                    conda-forge
netcdf4               1.7.1         nompi_py311h25b3b55_101       conda-forge
networkx              3.3           pyhd8ed1ab_1                  conda-forge
nose                  1.3.7         py_1006                       conda-forge
numba                 0.60.0        py311h4bc866e_0               conda-forge
numexpr               2.8.7         mkl_py311hbaa3ca7_4           conda-forge
numpy                 1.26.4        py311h64a7726_0               conda-forge
ocl-icd               2.3.2         hd590300_1                    conda-forge
ocl-icd-system        1.0.0         1                             conda-forge
openjpeg              2.5.2         h488ebb8_0                    conda-forge
openmm                8.1.2         py311he040c58_2               conda-forge
openmm-plumed         2.0.1         py311h4168a3b_1               conda-forge
openmmtools           0.23.1        pyhd8ed1ab_0                  conda-forge
openpathsampling      1.7.0         pyh707e725_0                  conda-forge
openssl               3.3.2         hb9d3cd8_0                    conda-forge
opt-einsum            3.3.0         hd8ed1ab_2                    conda-forge
opt_einsum            3.3.0         pyhc1e730c_2                  conda-forge
opt_einsum_fx         0.1.4         pyhd8ed1ab_0                  conda-forge
packaging             24.1          pyhd8ed1ab_0                  conda-forge
pandas                2.2.2         py311h14de704_1               conda-forge
pdbfixer              1.9           pyh1a96a4e_0                  conda-forge
pillow                10.4.0        py311h82a398c_0               conda-forge
pip                   24.2          pyhd8ed1ab_0                  conda-forge
pluggy                1.5.0         pyhd8ed1ab_0                  conda-forge
plumed                2.11.0.dev0   pypi_0                        pypi
prettytable           3.12.0        pyhd8ed1ab_0                  conda-forge
psutil                6.0.0         py311h331c9d8_0               conda-forge
pthread-stubs         0.4           h36c2ea0_1001                 conda-forge
py-cpuinfo            9.0.0         pyhd8ed1ab_0                  conda-forge
pycparser             2.22          pyhd8ed1ab_0                  conda-forge
pyg                   2.5.2         py311_torch_2.2.0_cu118       pyg
pygments              2.18.0        pyhd8ed1ab_0                  conda-forge
pymbar                4.0.3         h38be061_1                    conda-forge
pymbar-core           4.0.3         py311h1f0f07a_1               conda-forge
pyparsing             3.1.2         pyhd8ed1ab_0                  conda-forge
pysocks               1.7.1         pyha2e5f31_6                  conda-forge
pytables              3.9.2         py311ha8f287f_3               conda-forge
pytest                8.3.3         pyhd8ed1ab_0                  conda-forge
python                3.11.0        he550d4f_1_cpython            conda-forge
python-dateutil       2.9.0         pyhd8ed1ab_0                  conda-forge
python-hostlist       1.23.0        pypi_0                        pypi
python-tzdata         2024.1        pyhd8ed1ab_0                  conda-forge
python_abi            3.11          4_cp311                       conda-forge
pytorch               2.2.2         py3.11_cuda11.8_cudnn8.7.0_0  pytorch
pytorch-cuda          11.8          h7e8668a_5                    pytorch
pytorch-lightning     2.3.3         pyhd8ed1ab_0                  conda-forge
pytorch-mutex         1.0           cuda                          pytorch
pytorch-scatter       2.1.2         py311_torch_2.2.0_cu118       pyg
pytz                  2024.1        pyhd8ed1ab_0                  conda-forge
pyyaml                6.0.1         py311h459d7ec_1               conda-forge
qhull                 2020.2        h434a139_5                    conda-forge
re2                   2023.09.01    h7f4b329_2                    conda-forge
readline              8.2           h8228510_1                    conda-forge
requests              2.32.3        pyhd8ed1ab_0                  conda-forge
rich                  13.7.1        pyhd8ed1ab_0                  conda-forge
scikit-learn          1.5.1         py311hd632256_0               conda-forge
scipy                 1.14.0        py311h517d4fd_1               conda-forge
setuptools            71.0.4        pyhd8ed1ab_0                  conda-forge
six                   1.16.0        pyh6c4a22f_0                  conda-forge
snappy                1.2.1         ha2e4443_0                    conda-forge
sqlalchemy            2.0.35        py311h9ecbd09_0               conda-forge
svgwrite              1.4.3         pyhd8ed1ab_0                  conda-forge
sympy                 1.13.0        pypyh2585a3b_103              conda-forge
tbb                   2021.12.0     h434a139_3                    conda-forge
threadpoolctl         3.5.0         pyhc1e730c_0                  conda-forge
tk                    8.6.13        noxft_h4845f30_101            conda-forge
toml                  0.10.2        pypi_0                        pypi
toml-f                0.4.2         hd8f1df9_0                    conda-forge
tomli                 2.0.2         pyhd8ed1ab_0                  conda-forge
torch-ema             0.3           pyhd8ed1ab_0                  conda-forge
torchmetrics          1.4.0.post0   pyhd8ed1ab_0                  conda-forge
torchtriton           2.2.0         py311                         pytorch
tqdm                  4.66.4        pyhd8ed1ab_0                  conda-forge
typing-extensions     4.12.2        hd8ed1ab_0                    conda-forge
typing_extensions     4.12.2        pyha770c72_0                  conda-forge
tzdata                2024a         h0c530f3_0                    conda-forge
ujson                 5.10.0        py311hfdbb021_1               conda-forge
urllib3               2.2.2         pyhd8ed1ab_1                  conda-forge
wcwidth               0.2.13        pyhd8ed1ab_0                  conda-forge
werkzeug              3.0.3         pyhd8ed1ab_0                  conda-forge
wheel                 0.43.0        pyhd8ed1ab_1                  conda-forge
xorg-libxau           1.0.11        hd590300_0                    conda-forge
xorg-libxdmcp         1.1.3         h7f98852_0                    conda-forge
xz                    5.2.6         h166bdaf_0                    conda-forge
yaml                  0.2.5         h7f98852_2                    conda-forge
yarl                  1.9.4         py311h459d7ec_0               conda-forge
zipp                  3.19.2        pyhd8ed1ab_0                  conda-forge
zlib                  1.3.1         h4ab18f5_1                    conda-forge
zlib-ng               2.2.1         he02047a_0                    conda-forge
zstandard             0.23.0        py311h5cd10c7_0               conda-forge
zstd                  1.5.6         ha6fb4c9_0                    conda-forge
```
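
To compare a local environment against the listing above, the relevant rows can be pulled out of `conda list` output with a few lines of Python. A minimal sketch, assuming whitespace-separated name, version, build, and channel columns. The function name `parse_conda_list` is hypothetical, and the sample rows are copied from the listing:

```python
def parse_conda_list(text):
    """Map package name to version from `conda list`-style text."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# Name Version Build Channel" header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


# Rows copied from the environment listing in this comment.
sample = """\
# Name  Version  Build  Channel
lightning 2.3.3 pyhd8ed1ab_0 conda-forge
pyg 2.5.2 py311_torch_2.2.0_cu118 pyg
pytorch 2.2.2 py3.11_cuda11.8_cudnn8.7.0_0 pytorch
pytorch-scatter 2.1.2 py311_torch_2.2.0_cu118 pyg
"""

versions = parse_conda_list(sample)
for name in ("lightning", "pyg", "pytorch", "pytorch-scatter"):
    print(name, versions.get(name))
```

Of the PyG-related packages asked about, the listing contains `pyg` and `pytorch-scatter` entries but no `pytorch-sparse` or `pytorch-cluster` entries.
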
chlwjd1234 commented 23 hours ago

Thanks.
One quick question: is your LibTorch the cxx11 ABI build or not? I've been using the cxx11 ABI build of LibTorch 2.1.2 for PLUMED, but I just noticed that PyTorch installed via conda is usually built without the cxx11 ABI.
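
For reference, the ABI setting of an installed PyTorch build can be queried from Python. A minimal sketch, assuming PyTorch may or may not be importable in the active environment. The helper name `pytorch_uses_cxx11_abi` is hypothetical, while `torch.compiled_with_cxx11_abi()` is a PyTorch function:

```python
import importlib.util


def pytorch_uses_cxx11_abi():
    """Return True/False for the installed PyTorch build, or None if absent."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this environment
    import torch

    # Reports whether this PyTorch build was compiled with the cxx11 ABI.
    return torch.compiled_with_cxx11_abi()


print("cxx11 ABI:", pytorch_uses_cxx11_abi())
```

This only describes the Python-side PyTorch. The LibTorch that PLUMED is compiled against is a separate download and has to be checked on its own.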

jintuzhang commented 14 hours ago

Both builds of LibTorch (cxx11 ABI and pre-cxx11 ABI) have been tested. Either should be fine as long as you can compile PLUMED against it.

chlwjd1234 commented 13 hours ago

I'm not sure, but for now I suspect that the namespace mangling applied when saving to TorchScript may be causing the error. Still working on it.