pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

OSError: undefined symbol: _ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_ #4355

Open sleepymalc opened 2 years ago

sleepymalc commented 2 years ago

🐛 Describe the bug

I'm trying to reproduce the SEAL algorithm, and I installed exactly the versions indicated there. But I keep getting the following error:

$ python seal_link_pred.py --dataset ogbl-ppa --num_hops 1 --use_feature --use_edge_weight --eval_steps 5 --epochs 20 --dynamic_train --dynamic_val --dynamic_test --train_percent 5
Traceback (most recent call last):
  File "seal_link_pred.py", line 21, in <module>
    from torch_sparse import coalesce
  File "/home/pbb/anaconda3/envs/graph/lib/python3.8/site-packages/torch_sparse/__init__.py", line 19, in <module>
    torch.ops.load_library(spec.origin)
  File "/home/pbb/anaconda3/envs/graph/lib/python3.8/site-packages/torch/_ops.py", line 105, in load_library
    ctypes.CDLL(path)
  File "/home/pbb/anaconda3/envs/graph/lib/python3.8/ctypes/__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/pbb/anaconda3/envs/graph/lib/python3.8/site-packages/torch_sparse/_version_cpu.so: undefined symbol: _ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_

I believe this error belongs in this repository, since I found related issues here as well. To reproduce it, I did the following.

  1. Create a clean environment via conda
    $ conda create -n graph python==3.8.5
  2. Install PyTorch 1.6.0 with cu102
    $ pip install torch==1.6.0 torchvision==0.7.0

The installation instructions in the official documentation do not include the cu102 suffix, and I followed them strictly.

  3. Install PyTorch Geometric 1.6.1 with cu102
    $ pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
    $ pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
    $ pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
    $ pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
    $ pip install torch-geometric==1.6.1
  4. Install ogb
    $ pip install ogb==1.2.4
  5. Run any algorithm in SEAL.

I believe this is the correct way to install all the required packages. Other related issues say the error may be caused by multiple versions of PyTorch or PyTorch Geometric being installed, but I honestly don't know how to check for that. All the packages I installed should be listed by the command conda list, whose output is given in the environment section below.
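One way to check which versions are actually installed in the active environment is to query the package metadata directly. The following is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is part of the standard library); the package list is taken from the install commands above, and the helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata  # standard library since Python 3.8

# Distribution names taken from the install commands in this thread.
PACKAGES = [
    "torch",
    "torchvision",
    "torch-scatter",
    "torch-sparse",
    "torch-cluster",
    "torch-spline-conv",
    "torch-geometric",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent.

    `installed_versions` is a hypothetical helper, not part of any library.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:20s} {version or 'not installed'}")
```

Run it with the same interpreter that raises the error (for example from inside the activated conda environment), so that it inspects the same site-packages directory. It only reports what that one interpreter sees; it does not detect a second PyTorch installed in another environment.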

Environment

rusty1s commented 2 years ago

Can you try to install via explicit versions (see here)?

$ pip install torch-scatter==2.0.6 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
$ pip install torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
$ pip install torch-cluster==1.5.9 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
$ pip install torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html

sleepymalc commented 2 years ago

I tried this, and now I get the following new error.

$ python seal_link_pred.py --dataset ogbl-ppa --num_hops 1 --use_feature --use_edge_weight --eval_steps 5 --epochs 20 --dynamic_train --dynamic_val --dynamic_test --train_percent 5
Traceback (most recent call last):
  File "seal_link_pred.py", line 21, in <module>
    from torch_sparse import coalesce
  File "/home/pbb/anaconda3/envs/gnn/lib/python3.8/site-packages/torch_sparse/__init__.py", line 14, in <module>
    torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
  File "/home/pbb/anaconda3/envs/gnn/lib/python3.8/site-packages/torch/_ops.py", line 105, in load_library
    ctypes.CDLL(path)
  File "/home/pbb/anaconda3/envs/gnn/lib/python3.8/ctypes/__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /sw/arcts/centos7/cuda/11.0.2/lib64/libcusparse.so.10: version `libcusparse.so.10' not found (required by /home/pbb/anaconda3/envs/gnn/lib/python3.8/site-packages/torch_sparse/_spspmm_cuda.so)

It still seems to be an installation issue?

rusty1s commented 2 years ago

It looks like PyTorch is picking up your local CUDA 11.0 installation, and the resulting version mismatch explains the error. You may need to point explicitly to your CUDA 10.2 libraries by setting LD_LIBRARY_PATH (they may be located in miniconda3/lib or miniconda3/envs/{env}/lib).
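To see which copies of a library the directories on LD_LIBRARY_PATH would offer, and in what order, something like the following sketch may help. It assumes a Linux-style, colon-separated LD_LIBRARY_PATH; `find_library` is a hypothetical helper name. It only inspects that one variable, while the dynamic loader also consults other locations.

```python
import os
from pathlib import Path


def find_library(prefix, search_path=None):
    """List files whose name starts with `prefix` in each directory of a
    search path (defaults to $LD_LIBRARY_PATH), preserving directory order.

    `find_library` is a hypothetical helper written for this thread.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        directory = Path(entry)
        # Skip empty entries and directories that do not exist.
        if entry and directory.is_dir():
            hits.extend(sorted(str(p) for p in directory.glob(prefix + "*")))
    return hits


if __name__ == "__main__":
    # The error above complains about libcusparse.so.10.
    for path in find_library("libcusparse.so"):
        print(path)
```

If the first hit lives under a CUDA 11 directory such as /sw/arcts/centos7/cuda/11.0.2/lib64, as in the traceback above, a directory containing the CUDA 10.2 libraries would need to come earlier in LD_LIBRARY_PATH. The CUDA version PyTorch itself was built against can be read from `torch.version.cuda`.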

sleepymalc commented 2 years ago

Sorry, may I ask what exactly I am supposed to do? In anaconda3/envs/gnn/lib, I have the following.

[image: contents of anaconda3/envs/gnn/lib]

rusty1s commented 2 years ago

Oh, I see. You never installed PyTorch via conda, only via regular pip, where CUDA comes bundled with the wheel. These files should exist if you install via conda (see here):

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

Let me know!

sanketsavla commented 2 years ago

For me, nothing works. Even after running the commands below, I still get the same error:

$ pip install torch-scatter==2.0.6 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
$ pip install torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
$ pip install torch-cluster==1.5.9 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
$ pip install torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html

I also get the same error after running:

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

Here is the list of commands I ran sequentially (see the content of requirements.txt below):

conda create --name collision python=3.6
conda activate collision
pip install -r requirements.txt
pip install torch_geometric
conda install pyg -c pyg
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.11.0+cpu.html
pip install torch==1.8.0
conda install pyg -c pyg
pip install torch==1.4.0
pip install torch==1.6.0
pip install torch-geometric==2.0.3

I then ran the commands recommended above, without much luck.

(collision) sankets@lambda-dual:~/anaconda3/envs/collision/lib$ ls -a
.  libffi.so  libgomp.so  liblzma.so  libncurses.so.6.3  libpython3.6m.so  libssl.so  libtinfow.so.6.3  python3.6
..  libffi.so.6  libgomp.so.1  liblzma.so.5  libncursesw.a  libpython3.6m.so.1  libssl.so.1.1  libtk8.6.so  sqlite3.34.0
engines-1.1  libffi.so.7  libgomp.so.1.0.0  liblzma.so.5.2.5  libncursesw.so  libpython3.6m.so.1.0  libstdc++.so  libtkstub8.6.a  tcl8
itcl4.2.1  libffi.so.7.1.0  libhistory.a  libmenu.a  libncursesw.so.6  libquadmath.so  libstdc++.so.6  libtsan.so  tcl8.6
libasan.so  libform.a  libhistory.so  libmenu.so  libncursesw.so.6.3  libquadmath.so.0  libstdc++.so.6.0.29  libtsan.so.0  tclConfig.sh
libasan.so.6  libform.so  libhistory.so.8  libmenu.so.6  libpanel.a  libquadmath.so.0.0.0  libtcl8.6.so  libtsan.so.0.0.0  tclooConfig.sh
libasan.so.6.0.0  libform.so.6  libhistory.so.8.1  libmenu.so.6.3  libpanel.so  libreadline.a  libtclstub8.6.a  libubsan.so  tdbc1.1.2
libatomic.so  libform.so.6.3  libitm.so  libmenuw.a  libpanel.so.6  libreadline.so  libtinfo.a  libubsan.so.1  tdbcmysql1.1.2
libatomic.so.1  libformw.a  libitm.so.1  libmenuw.so  libpanel.so.6.3  libreadline.so.8  libtinfo.so  libubsan.so.1.0.0  tdbcodbc1.1.2
libatomic.so.1.2.0  libformw.so  libitm.so.1.0.0  libmenuw.so.6  libpanelw.a  libreadline.so.8.1  libtinfo.so.6  libz.a  tdbcpostgres1.1.2
libcrypto.a  libformw.so.6  liblsan.so  libmenuw.so.6.3  libpanelw.so  libsqlite3.so  libtinfo.so.6.3  libz.so  terminfo
libcrypto.so  libformw.so.6.3  liblsan.so.0  libncurses.a  libpanelw.so.6  libsqlite3.so.0  libtinfow.a  libz.so.1  thread2.8.6
libcrypto.so.1.1  libgcc_s.so  liblsan.so.0.0.0  libncurses.so  libpanelw.so.6.3  libsqlite3.so.0.8.6  libtinfow.so  libz.so.1.2.12  tk8.6
libffi.a  libgcc_s.so.1  liblzma.a  libncurses.so.6  libpython3.6m.a  libssl.a  libtinfow.so.6  pkgconfig  tkConfig.sh

Content of requirements.txt:

absl-py==0.7.1
astor==0.7.1
backcall==0.1.0
cycler==0.10.0
decorator==4.4.0
gast==0.2.2
google-pasta==0.1.7
grpcio==1.20.1
h5py==2.9.0
ipython==7.8.0
ipython-genutils==0.2.0
jedi==0.15.1
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==3.1.1
networkx==2.4
numpy==1.16.5
opencv-python==4.1.1.26
pandas==0.23.4
parso==0.5.1
pexpect==4.7.0
pickleshare==0.7.5
Pillow==8.1.1
prompt-toolkit==2.0.10
protobuf==3.7.1
ptyprocess==0.6.0
Pygments==2.4.2
pyparsing==2.4.2
python-dateutil==2.8.0
pytz==2019.3
PyYAML==5.4
scikit-image==0.15.0
scikit-learn==0.21.3
scipy==1.1.0
six==1.12.0
termcolor==1.1.0
tqdm==4.36.1
pytorch-nlp==0.5.0
torch-geometric==1.5.0
traitlets==4.3.3
wcwidth==0.1.7
Werkzeug==0.16.0
wrapt==1.11.2

When trying to run the command below, I get this:

(collision) sankets@lambda-dual:~/sg-collision-prediction/scripts$ python train_sg2vec.py --cache_path /home/sankets/roadscene2vec/examples/sg_extraction_output_1.pkl --batch_size 8
Traceback (most recent call last):
  File "train_sg2vec.py", line 3, in <module>
    from sg_risk_assessment.sg2vec_trainer import SG2VECTrainer
  File "/home/sankets/sg-collision-prediction/sg_risk_assessment/sg2vec_trainer.py", line 16, in <module>
    from sg_risk_assessment.mrgcn import *
  File "/home/sankets/sg-collision-prediction/sg_risk_assessment/mrgcn.py", line 6, in <module>
    from torch_geometric.nn import RGCNConv, TopKPooling, FastRGCNConv
  File "/home/sankets/anaconda3/envs/collision/lib/python3.6/site-packages/torch_geometric/nn/__init__.py", line 3, in <module>
    from .sequential import Sequential
  File "/home/sankets/anaconda3/envs/collision/lib/python3.6/site-packages/torch_geometric/nn/sequential.py", line 10, in <module>
    from torch_geometric.nn.conv.utils.jit import class_from_module_repr
  File "/home/sankets/anaconda3/envs/collision/lib/python3.6/site-packages/torch_geometric/nn/conv/__init__.py", line 6, in <module>
    from .gravnet_conv import GravNetConv
  File "/home/sankets/anaconda3/envs/collision/lib/python3.6/site-packages/torch_geometric/nn/conv/gravnet_conv.py", line 11, in <module>
    from torch_cluster import knn
  File "/home/sankets/anaconda3/envs/collision/lib/python3.6/site-packages/torch_cluster/__init__.py", line 15, in <module>
    f'{library}{suffix}', [osp.dirname(__file__)]).origin)
  File "/home/sankets/anaconda3/envs/collision/lib/python3.6/site-packages/torch/_ops.py", line 105, in load_library
    ctypes.CDLL(path)
  File "/home/sankets/anaconda3/envs/collision/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/sankets/anaconda3/envs/collision/lib/python3.6/site-packages/torch_cluster/_version_cuda.so: undefined symbol: _ZN3c106detail12infer_schema20make_function_schemaENS_8ArrayRefINS1_11ArgumentDefEEES4_

rusty1s commented 2 years ago

Why do you run

pip install torch==1.8.0
pip install torch==1.4.0
pip install torch==1.6.0

sequentially?

sleepymalc commented 2 years ago

I ended up installing pytorch_geometric without torch-spline-conv.

sanketsavla commented 2 years ago

> Why do you run
>
> pip install torch==1.8.0
> pip install torch==1.4.0
> pip install torch==1.6.0
>
> sequentially?

I was doing trial and error and thought I would keep you informed. Any suggestions?

rusty1s commented 2 years ago

Can you show the concrete steps that lead to this issue (ideally with logs)? In your current setup, you are mixing a lot of versions, e.g., it looks like you installed our wheels for PyTorch 1.11 while installing all kinds of different PyTorch versions afterwards (that's why I am confused). I suggest starting from a fresh conda environment and then simply running:

conda install pytorch==1.6.0 cudatoolkit=10.2 -c pytorch
pip install torch-scatter==2.0.6 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install torch-cluster==1.5.9 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
pip install torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html

You can also try to install PyTorch 1.11 which should result in a smoother installation.
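Since the wheel index URL has to match the installed PyTorch and CUDA versions, it can help to derive the URL from those versions instead of typing it by hand. The sketch below assumes the URL pattern seen in the commands in this thread; `cuda_tag` and `wheel_index_url` are hypothetical helper names, and other PyTorch versions may be hosted under a different base URL (an earlier comment uses https://data.pyg.org/whl/).

```python
import re

# URL pattern copied from the commands in this thread (an assumption for
# any version other than the ones shown there).
WHEEL_INDEX = "https://pytorch-geometric.com/whl/torch-{torch}+{cuda}.html"


def cuda_tag(cuda_version):
    """Turn '10.2' into 'cu102'; None (a CPU-only build) becomes 'cpu'."""
    if cuda_version is None:
        return "cpu"
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def wheel_index_url(torch_version, cuda_version):
    """Build the wheel index URL for a PyTorch version and CUDA version."""
    # Keep only the numeric part, dropping a local suffix such as '+cu101'.
    base = re.match(r"\d+\.\d+\.\d+", torch_version).group(0)
    return WHEEL_INDEX.format(torch=base, cuda=cuda_tag(cuda_version))


if __name__ == "__main__":
    print(wheel_index_url("1.6.0", "10.2"))
```

With PyTorch installed, the two arguments can be taken from `torch.__version__` and `torch.version.cuda`. If the resulting URL differs from the one passed to `pip install -f`, the extension wheels were likely built against a different PyTorch than the one being imported.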