pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

How to fix `_version_cpu.so: undefined symbol` #3240

Closed dongchirua closed 2 years ago

dongchirua commented 2 years ago

❓ Questions & Help

Hi,

I am encountering an issue when running pytorch_geometric on a CPU-only machine. I searched for this issue and found a related question, but it was about a mismatched CUDA version. I therefore double-checked that I installed the CPU version. Below are my dependencies:

  1. pytorch 1.9.0 cpu_py38h4bbe6ce_2 conda-forge
  2. pytorch-cpu 1.9.0 cpu_py38h718b53a_2 conda-forge
  3. pytorch-lightning 1.4.8 pypi_0 pypi
  4. torch-geometric 2.0.1 pypi_0 pypi
  5. torch-scatter 2.0.8 pypi_0 pypi
  6. torch-sparse 0.6.12 pypi_0 pypi
  7. torchmetrics 0.5.1 pypi_0 pypi
  8. torchvision 0.9.1 py38h9e2e28c_1_cpu conda-forge
  9. torchvision-cpu 0.9.1 h718b53a_1 conda-forge

I followed the instructions on the main page, which are Case #1

Case #2: `pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.9.0+cpu.html`

Both cases lead to the same error when I run `from torch_geometric.loader import DataLoader`:

/opt/conda/envs/workspace/lib/python3.8/site-packages/torch_geometric/data/data.py in <module>
      1 from typing import (Optional, Dict, Any, Union, List, Iterable, Tuple,
      2                     NamedTuple, Callable)
----> 3 from torch_geometric.typing import OptTensor
      4 from torch_geometric.deprecation import deprecated
      5 

/opt/conda/envs/workspace/lib/python3.8/site-packages/torch_geometric/typing.py in <module>
      2 
      3 from torch import Tensor
----> 4 from torch_sparse import SparseTensor
      5 
      6 # Types for accessing data ####################################################

/opt/conda/envs/workspace/lib/python3.8/site-packages/torch_sparse/__init__.py in <module>
     13         '_relabel'
     14 ]:
---> 15     torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
     16         f'{library}_{suffix}', [osp.dirname(__file__)]).origin)
     17 

/opt/conda/envs/workspace/lib/python3.8/site-packages/torch/_ops.py in load_library(self, path)
    102             # static (global) initialization code in order to register custom
    103             # operators with the JIT.
--> 104             ctypes.CDLL(path)
    105         self.loaded_libraries.add(path)
    106 

/opt/conda/envs/workspace/lib/python3.8/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    379 
    380         if handle is None:
--> 381             self._handle = _dlopen(self._name, mode)
    382         else:
    383             self._handle = handle

OSError: /opt/conda/envs/workspace/lib/python3.8/site-packages/torch_sparse/_version_cpu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs

I was wondering whether this is related to my CPU, as I'm running it on AWS with an AMD CPU. The output of `lscpu` is:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       48 bits physical, 48 bits virtual
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7571
Stepping:            2
CPU MHz:             2553.484
BogoMIPS:            4399.73
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save

If you want to reproduce my workspace:
Dockerfile: https://github.com/dongchirua/build_personal_workplace/blob/main/cpu/Dockerfile
Envfile: https://github.com/dongchirua/build_personal_workplace/blob/main/cpu/conda_environment.yml

rusty1s commented 2 years ago

Our wheels are built against the officially provided PyTorch wheels. Currently, you are using the PyTorch binary provided by conda-forge (rather than the one provided by `-c pytorch`). Running

conda install pytorch pyg -c pytorch -c pyg -c conda-forge

should fix this.
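
As background on why the origin of the PyTorch binary matters here: the missing symbol in the traceback ends in `RKSs`, which in Itanium C++ name mangling denotes a `const std::string&` parameter using the pre-C++11 libstdc++ string type. A PyTorch build that exports the same function with the newer `std::__cxx11` string type would carry a differently mangled name, so the lookup fails. The following is a rough sketch, not part of PyG or PyTorch: the function name `guess_string_abi` is hypothetical, and the check is a crude substring heuristic rather than a real demangler.

```python
def guess_string_abi(mangled: str) -> str:
    """Guess which libstdc++ std::string ABI a mangled C++ symbol refers to.

    Heuristic only: it looks for well-known substrings and does not parse
    the mangled name properly.
    """
    if not mangled.startswith("_Z"):
        return "unknown"  # not an Itanium-mangled C++ symbol
    if "__cxx11" in mangled:
        return "cxx11"  # std::__cxx11::basic_string (new ABI)
    if "Ss" in mangled:
        return "pre-cxx11"  # "Ss" abbreviates the old std::string
    return "unknown"


# Symbol copied from the OSError in the traceback above.
missing = "_ZN5torch3jit17parseSchemaOrNameERKSs"

# Illustrative assumption: how the same signature would be mangled
# when compiled with the C++11 string ABI.
new_abi = (
    "_ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_string"
    "IcSt11char_traitsIcESaIcEEE"
)

print(guess_string_abi(missing))
print(guess_string_abi(new_abi))
```

On an installed PyTorch, `torch.compiled_with_cxx11_abi()` reports which string ABI that build uses, which can be compared against what the extension library expects.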

dongchirua commented 2 years ago

The issue still remains. Here is the output of `conda env export` inside Docker:

name: workspace
channels:
  - pyg
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_llvm
  - anyio=2.2.0=py38h06a4308_1
  - argon2-cffi=20.1.0=py38h27cfd23_1
  - async_generator=1.10=pyhd3eb1b0_0
  - attrs=21.2.0=pyhd3eb1b0_0
  - babel=2.9.1=pyhd3eb1b0_0
  - backcall=0.2.0=pyhd3eb1b0_0
  - blas=1.0=mkl
  - bleach=4.0.0=pyhd3eb1b0_0
  - bottleneck=1.3.2=py38heb32a55_1
  - brotli=1.0.9=he6710b0_2
  - brotlipy=0.7.0=py38h27cfd23_1003
  - ca-certificates=2021.7.5=h06a4308_1
  - certifi=2021.5.30=py38h06a4308_0
  - cffi=1.14.6=py38h400218f_0
  - charset-normalizer=2.0.4=pyhd3eb1b0_0
  - cpuonly=1.0=0
  - cryptography=3.4.7=py38hd23ed53_0
  - cycler=0.10.0=py38_0
  - dbus=1.13.18=hb2f20db_0
  - debugpy=1.4.1=py38h295c915_0
  - decorator=5.0.9=pyhd3eb1b0_0
  - defusedxml=0.7.1=pyhd3eb1b0_0
  - entrypoints=0.3=py38_0
  - expat=2.4.1=h2531618_2
  - fontconfig=2.13.1=h6c09931_0
  - fonttools=4.25.0=pyhd3eb1b0_0
  - freetype=2.10.4=h5ab3b9f_0
  - future=0.18.2=py38_1
  - glib=2.69.1=h5202010_0
  - googledrivedownloader=0.4=pyhd3deb0d_1
  - gst-plugins-base=1.14.0=h8213a91_2
  - gstreamer=1.14.0=h28cd5cc_2
  - icu=58.2=he6710b0_3
  - idna=3.2=pyhd3eb1b0_0
  - importlib-metadata=4.8.1=py38h06a4308_0
  - importlib_metadata=4.8.1=hd3eb1b0_0
  - ipykernel=6.2.0=py38h06a4308_1
  - ipython=7.27.0=py38hb070fc8_0
  - ipython_genutils=0.2.0=pyhd3eb1b0_1
  - jedi=0.18.0=py38h06a4308_1
  - jinja2=3.0.1=pyhd3eb1b0_0
  - joblib=1.0.1=pyhd3eb1b0_0
  - jpeg=9d=h7f8727e_0
  - json5=0.9.6=pyhd3eb1b0_0
  - jsonschema=3.2.0=pyhd3eb1b0_2
  - jupyter_client=7.0.1=pyhd3eb1b0_0
  - jupyter_core=4.7.1=py38h06a4308_0
  - jupyter_server=1.4.1=py38h06a4308_0
  - jupyterlab=3.1.7=pyhd3eb1b0_0
  - jupyterlab_pygments=0.1.2=py_0
  - jupyterlab_server=2.8.1=pyhd3eb1b0_0
  - kiwisolver=1.3.1=py38h2531618_0
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.35.1=h7274673_9
  - libblas=3.9.0=8_mkl
  - libffi=3.3=he6710b0_2
  - libgcc-ng=11.2.0=h1d223b6_8
  - libgfortran-ng=7.5.0=ha8ba4b0_17
  - libgfortran4=7.5.0=ha8ba4b0_17
  - libpng=1.6.37=hbc83047_0
  - libsodium=1.0.18=h7b6447c_0
  - libstdcxx-ng=9.3.0=hd4cf53a_17
  - libtiff=4.2.0=h85742a9_0
  - libuuid=1.0.3=h1bed415_2
  - libuv=1.40.0=h7b6447c_0
  - libwebp-base=1.2.0=h27cfd23_0
  - libxcb=1.14=h7b6447c_0
  - libxml2=2.9.12=h03d6c58_0
  - llvm-openmp=12.0.1=h4bd325d_1
  - lz4-c=1.9.3=h295c915_1
  - markupsafe=2.0.1=py38h27cfd23_0
  - matplotlib=3.4.2=py38h06a4308_0
  - matplotlib-base=3.4.2=py38hab158f2_0
  - matplotlib-inline=0.1.2=pyhd3eb1b0_2
  - mistune=0.8.4=py38h7b6447c_1000
  - mkl=2020.4=h726a3e6_304
  - mkl-service=2.3.0=py38he904b0f_0
  - mkl_fft=1.3.0=py38h54f3939_0
  - mkl_random=1.1.1=py38h0573a6f_0
  - munkres=1.1.4=py_0
  - nbclassic=0.2.6=pyhd3eb1b0_0
  - nbclient=0.5.3=pyhd3eb1b0_0
  - nbconvert=6.1.0=py38h06a4308_0
  - nbformat=5.1.3=pyhd3eb1b0_0
  - ncurses=6.2=he6710b0_1
  - nest-asyncio=1.5.1=pyhd3eb1b0_0
  - ninja=1.10.2=hff7bd54_1
  - notebook=6.4.3=py38h06a4308_0
  - numexpr=2.7.3=py38hb2eb853_0
  - numpy=1.19.2=py38h54aff64_0
  - numpy-base=1.19.2=py38hfa32c7d_0
  - olefile=0.46=pyhd3eb1b0_0
  - openjpeg=2.4.0=h3ad879b_0
  - openssl=1.1.1l=h7f8727e_0
  - packaging=21.0=pyhd3eb1b0_0
  - pandas=1.3.2=py38h8c16a72_0
  - pandocfilters=1.4.3=py38h06a4308_1
  - parso=0.8.2=pyhd3eb1b0_0
  - pcre=8.45=h295c915_0
  - pexpect=4.8.0=pyhd3eb1b0_3
  - pickleshare=0.7.5=pyhd3eb1b0_1003
  - pillow=8.3.1=py38h2c7a002_0
  - pip=21.2.2=py38h06a4308_0
  - prometheus_client=0.11.0=pyhd3eb1b0_0
  - prompt-toolkit=3.0.17=pyhca03da5_0
  - ptyprocess=0.7.0=pyhd3eb1b0_2
  - pycparser=2.20=py_2
  - pyg=2.0.1=py38_torch_1.9.0_cpu
  - pygments=2.10.0=pyhd3eb1b0_0
  - pyopenssl=20.0.1=pyhd3eb1b0_1
  - pyparsing=2.4.7=pyhd3eb1b0_0
  - pyqt=5.9.2=py38h05f1152_4
  - pyrsistent=0.17.3=py38h7b6447c_0
  - pysocks=1.7.1=py38h06a4308_0
  - python=3.8.11=h12debd9_0_cpython
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - python-louvain=0.15=pyhd3eb1b0_0
  - python_abi=3.8=2_cp38
  - pytorch=1.9.1=py3.8_cpu_0
  - pytorch-cluster=1.5.9=py38_torch_1.9.0_cpu
  - pytorch-cpu=1.6.0=cpu_py38h3369884_1
  - pytorch-scatter=2.0.8=py38_torch_1.9.0_cpu
  - pytorch-sparse=0.6.12=py38_torch_1.9.0_cpu
  - pytorch-spline-conv=1.2.1=py38_torch_1.9.0_cpu
  - pytz=2021.1=pyhd3eb1b0_0
  - pyyaml=5.4.1=py38h27cfd23_1
  - pyzmq=22.2.1=py38h295c915_1
  - qt=5.9.7=h5867ecd_1
  - readline=8.1=h27cfd23_0
  - requests=2.26.0=pyhd3eb1b0_0
  - scikit-learn=0.24.2=py38ha9443f7_0
  - scipy=1.6.2=py38h91f5cce_0
  - send2trash=1.8.0=pyhd3eb1b0_1
  - setuptools=58.0.4=py38h06a4308_0
  - sip=4.19.13=py38he6710b0_0
  - six=1.16.0=pyhd3eb1b0_0
  - sniffio=1.2.0=py38h06a4308_1
  - sqlite=3.36.0=hc218d9a_0
  - swig=4.0.2=h2531618_3
  - terminado=0.9.4=py38h06a4308_0
  - testpath=0.5.0=pyhd3eb1b0_0
  - threadpoolctl=2.2.0=pyh0d69192_0
  - tk=8.6.10=hbc83047_0
  - torchaudio=0.9.1=py38
  - torchvision=0.10.0=py38h9e2e28c_0_cpu
  - tornado=6.1=py38h27cfd23_0
  - tqdm=4.62.2=pyhd3eb1b0_1
  - traitlets=5.0.5=pyhd3eb1b0_0
  - typing_extensions=3.10.0.2=pyh06a4308_0
  - urllib3=1.26.6=pyhd3eb1b0_1
  - wcwidth=0.2.5=pyhd3eb1b0_0
  - webencodings=0.5.1=py38_1
  - wheel=0.37.0=pyhd3eb1b0_1
  - xz=5.2.5=h7b6447c_0
  - yacs=0.1.6=py_0
  - yaml=0.2.5=h7b6447c_0
  - zeromq=4.3.4=h2531618_0
  - zipp=3.5.0=pyhd3eb1b0_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.4.9=haebb681_0
  - pip:
    - absl-py==0.14.0
    - aiohttp==3.7.4.post0
    - async-timeout==3.0.1
    - cachetools==4.2.2
    - chardet==4.0.0
    - crc32c==2.2.post0
    - fsspec==2021.9.0
    - google-auth==1.35.0
    - google-auth-oauthlib==0.4.6
    - grpcio==1.40.0
    - jupyter-tensorboard==0.2.0
    - lightning-bolts==0.4.0
    - markdown==3.3.4
    - multidict==5.1.0
    - networkx==2.6.3
    - oauthlib==3.1.1
    - protobuf==3.18.0
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pydeprecate==0.3.1
    - pytorch-lightning==1.4.8
    - requests-oauthlib==1.3.0
    - rsa==4.7.2
    - tensorboard==2.6.0
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.0
    - tensorboardx==2.4
    - torchmetrics==0.5.1
    - werkzeug==2.0.1
    - yarl==1.6.3
prefix: /opt/conda/envs/workspace

rusty1s commented 2 years ago

Can you try to remove the pytorch-cpu package (pointing to version 1.6.0)?
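
A leftover package like this can be spotted by scanning the `conda env export` output for PyTorch variant packages whose version disagrees with the main `pytorch` package. The snippet below is a minimal sketch under that assumption. The helper name `find_torch_mismatches` and the default list of variant names are hypothetical, and only the `major.minor` part of each version is compared.

```python
def find_torch_mismatches(dependency_lines, variants=("pytorch-cpu",)):
    """Return (name, version) pairs for variant packages whose major.minor
    version differs from that of the main `pytorch` package.

    `dependency_lines` are lines in the `  - name=version=build` format
    printed by `conda env export`.
    """
    versions = {}
    for line in dependency_lines:
        entry = line.strip().lstrip("-").strip()
        parts = entry.split("=")
        if len(parts) >= 2 and parts[1]:
            versions[parts[0]] = parts[1]

    core = versions.get("pytorch")
    if core is None:
        return []

    def major_minor(version):
        return ".".join(version.split(".")[:2])

    return [
        (name, version)
        for name, version in versions.items()
        if name in variants and major_minor(version) != major_minor(core)
    ]


# Lines copied from the environment export above.
sample = [
    "  - pytorch=1.9.1=py3.8_cpu_0",
    "  - pytorch-cluster=1.5.9=py38_torch_1.9.0_cpu",
    "  - pytorch-cpu=1.6.0=cpu_py38h3369884_1",
    "  - pytorch-sparse=0.6.12=py38_torch_1.9.0_cpu",
]

print(find_torch_mismatches(sample))  # [('pytorch-cpu', '1.6.0')]
```

Any package reported this way is a candidate for removal from the environment, as suggested above.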

dongchirua commented 2 years ago

Thanks. I really don't know why this package was there, but after I forced the build to use this env file, it works.