rusty1s / pytorch_sparse

PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations
MIT License

undefined symbol #261

Closed ZhiZhongWan closed 1 year ago

ZhiZhongWan commented 2 years ago

My env:

$ pip list|grep torch
torch                              1.9.1+cu111
torch-cluster                      1.6.0
torch-geometric                    2.0.4
torch-scatter                      2.0.9
torch-sparse                       0.6.14
torch-spline-conv                  1.2.1
torchaudio                         0.9.1
torchvision                        0.10.1+cu111

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0

My CUDA version is 11.1, and the following error occurred:

Traceback (most recent call last):
  File "train_transductive.py", line 13, in <module>
    from bgrl import *
  File "/home/chjiang/bgrl/bgrl/__init__.py", line 4, in <module>
    from .models import GCN, GraphSAGE_GCN
  File "/home/chjiang/bgrl/bgrl/models.py", line 3, in <module>
    from torch_geometric.nn import BatchNorm, GCNConv, LayerNorm, SAGEConv, Sequential
  File "/home/chjiang/anaconda3/lib/python3.8/site-packages/torch_geometric/__init__.py", line 4, in <module>
    import torch_geometric.data
  File "/home/chjiang/anaconda3/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/chjiang/anaconda3/lib/python3.8/site-packages/torch_geometric/data/data.py", line 9, in <module>
    from torch_sparse import SparseTensor
  File "/home/chjiang/anaconda3/lib/python3.8/site-packages/torch_sparse/__init__.py", line 19, in <module>
    torch.ops.load_library(spec.origin)
  File "/home/chjiang/anaconda3/lib/python3.8/site-packages/torch/_ops.py", line 104, in load_library
    ctypes.CDLL(path)
  File "/home/chjiang/anaconda3/lib/python3.8/ctypes/__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/chjiang/anaconda3/lib/python3.8/site-packages/torch_sparse/_convert_cuda.so: undefined symbol: _ZNK2at6Tensor6deviceEv

I also tried torch_sparse==0.6.12, but it did not help.

Could you tell me how to fix this problem?
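
For reference, the mangled name in the OSError above can be decoded to see which PyTorch function the extension failed to find. The sketch below is only an illustration and not part of torch-sparse: `demangle` is a hypothetical helper that shells out to the `c++filt` tool from GNU binutils (assumed to be on PATH) and returns None when the tool is missing. `_ZNK2at6Tensor6deviceEv` decodes to `at::Tensor::device() const`, a symbol that the compiled extension expects the installed PyTorch libraries to export.

```python
import shutil
import subprocess
from typing import Optional


def demangle(symbol: str) -> Optional[str]:
    """Return the demangled C++ name, or None if c++filt is not installed.

    Hypothetical helper for illustration; it relies on the external
    `c++filt` binary rather than on any PyTorch API.
    """
    tool = shutil.which("c++filt")
    if tool is None:
        return None
    result = subprocess.run(
        [tool, symbol], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Symbol copied from the traceback above.
    print(demangle("_ZNK2at6Tensor6deviceEv"))
```

If the decoded name belongs to the `at::` or `c10::` namespaces, the missing symbol comes from PyTorch itself, which suggests that the extension binary was built against a different PyTorch build than the one installed.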

rusty1s commented 2 years ago

How did you install torch-sparse? What's the installation log when running via pip install --verbose?

ZhiZhongWan commented 2 years ago

Without the pip cache the output is very long, so I am posting the output produced with the cache. This is what I got:

$ pip install --verbose torch_sparse==0.6.12
Using pip 22.1.2 from /home/chjiang/anaconda3/envs/bgrl/lib/python3.8/site-packages/pip (python 3.8)
Collecting torch_sparse==0.6.12
  Using cached torch_sparse-0.6.12-cp38-cp38-linux_x86_64.whl
Requirement already satisfied: scipy in /home/chjiang/anaconda3/envs/bgrl/lib/python3.8/site-packages (from torch_sparse==0.6.12) (1.8.1)
Requirement already satisfied: numpy<1.25.0,>=1.17.3 in /home/chjiang/anaconda3/envs/bgrl/lib/python3.8/site-packages (from scipy->torch_sparse==0.6.12) (1.23.1)
Installing collected packages: torch_sparse
Successfully installed torch_sparse-0.6.12

rusty1s commented 2 years ago

Actually, I'll need the full log without caching. Also, what happens if you install from wheel?

pip install torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html

ZhiZhongWan commented 2 years ago

I followed your advice and here's the log I got:

$ pip install --verbose torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html --no-cache-dir
Using pip 22.1.2 from /home/chjiang/anaconda3/envs/bgrl/lib/python3.8/site-packages/pip (python 3.8)
Looking in links: https://data.pyg.org/whl/torch-1.9.0+cu111.html
Collecting torch-sparse==0.6.12
  Downloading https://data.pyg.org/whl/torch-1.9.0%2Bcu111/torch_sparse-0.6.12-cp38-cp38-linux_x86_64.whl (3.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.7/3.7 MB 3.0 MB/s eta 0:00:00
Requirement already satisfied: scipy in /home/chjiang/anaconda3/envs/bgrl/lib/python3.8/site-packages (from torch-sparse==0.6.12) (1.8.1)
Requirement already satisfied: numpy<1.25.0,>=1.17.3 in /home/chjiang/anaconda3/envs/bgrl/lib/python3.8/site-packages (from scipy->torch-sparse==0.6.12) (1.23.1)
Installing collected packages: torch-sparse
Successfully installed torch-sparse-0.6.12

Unfortunately, the error still occurs. Could there be a conflict between torch_sparse and the CUDA version?

rusty1s commented 2 years ago

What's your local CUDA version (nvcc -V)?

ZhiZhongWan commented 2 years ago

The CUDA version is 11.1, everything seems OK...

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0

rusty1s commented 2 years ago

Ah right, you already posted it earlier; I missed it. At this point it is hard to tell why it fails on your end. The error you posted definitely points to a version mismatch between libraries. I recommend confirming that there are no version conflicts by starting from a fresh conda environment in which you only install PyTorch and torch-scatter.
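
One way to look for such a mismatch is to list the installed versions without importing the extension, since the import itself is what fails. The following is a rough sketch under stated assumptions: `installed_versions` and the `PACKAGES` tuple are hypothetical names chosen for this example, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names taken from the `pip list` output earlier in this thread.
PACKAGES = (
    "torch",
    "torch-scatter",
    "torch-sparse",
    "torch-cluster",
    "torch-spline-conv",
    "torch-geometric",
)


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent.

    Only package metadata is read, so no compiled library is loaded.
    """
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:20s} {version}")
```

The reported torch version (including any suffix such as +cu111) can then be compared by hand against the wheel index that was passed to pip.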

ZhiZhongWan commented 2 years ago

I tried, but it still failed. Thanks for your patience. I'll post the procedure so that you or someone else can reproduce this error.

I followed the instructions for building the environment here. After creating the virtual environment, I ran:

pip install torch==1.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
pip install absl-py==0.12.0 tensorboard==2.6.0 ogb

Then the command python3 train_transductive.py --flagfile=config/coauthor-cs.cfg leads to the error.

rusty1s commented 2 years ago

Please specify a version of the packages according to the latest versions in https://data.pyg.org/whl/torch-1.9.0+cu111.html, e.g., torch-sparse==0.6.12.

ZhiZhongWan commented 2 years ago

IT WORKS! I used the following command:

pip install torch-scatter==2.0.9 torch-sparse==0.6.12 torch-cluster==1.5.9 torch-spline-conv==1.2.1 torch-geometric -f https://data.pyg.org/whl/torch-1.9.0+cu111.html

I am still confused about why it fails when no version is specified at first and works after reinstalling with pinned versions, but many thanks for your patience and advice.

rusty1s commented 2 years ago

Without a version identifier, the package will try to install its latest version from source (which seems to fail in your case).
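
The pinned command that worked above can also be assembled programmatically, which makes it harder to forget a version. This is a minimal sketch, not an official installer: `pinned_install_command` is a hypothetical helper, and the versions in the usage example are simply the ones reported as working in this thread for the torch-1.9.0+cu111 index.

```python
import sys
from typing import Dict, List


def pinned_install_command(pins: Dict[str, str], index_url: str) -> List[str]:
    """Build a `pip install` argument list with an exact version per package.

    `-f` (find-links) points pip at the page listing pre-built wheels.
    """
    requirements = [f"{name}=={version}" for name, version in pins.items()]
    return [sys.executable, "-m", "pip", "install", *requirements, "-f", index_url]


if __name__ == "__main__":
    # Versions copied from the command that worked earlier in this thread.
    pins = {
        "torch-scatter": "2.0.9",
        "torch-sparse": "0.6.12",
        "torch-cluster": "1.5.9",
        "torch-spline-conv": "1.2.1",
    }
    url = "https://data.pyg.org/whl/torch-1.9.0+cu111.html"
    print(" ".join(pinned_install_command(pins, url)))
    # To actually run it: subprocess.run(pinned_install_command(pins, url), check=True)
```

Pinning matters because `-f` only adds the wheel page as an extra source; pip still considers newer releases from PyPI, and those may only be available as source distributions.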

qm-intel commented 2 years ago

I am facing a similar issue:

Traceback (most recent call last):
  File "/home/es/PycharmProjects/3-Meta-MGNN-tox/main.py", line 13, in <module>
    from meta_model import Meta_model
  File "/home/es/PycharmProjects/3-Meta-MGNN-tox/meta_model.py", line 6, in <module>
    from model import GNN, GNN_graphpred
  File "/home/es/PycharmProjects/3-Meta-MGNN-tox/model.py", line 3, in <module>
    from torch_geometric.nn import MessagePassing
  File "/home/es/anaconda3/envs/pyg-meta/lib/python3.7/site-packages/torch_geometric/__init__.py", line 4, in <module>
    import torch_geometric.data
  File "/home/es/anaconda3/envs/pyg-meta/lib/python3.7/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/es/anaconda3/envs/pyg-meta/lib/python3.7/site-packages/torch_geometric/data/data.py", line 20, in <module>
    from torch_sparse import SparseTensor
  File "/home/es/anaconda3/envs/pyg-meta/lib/python3.7/site-packages/torch_sparse/__init__.py", line 19, in <module>
    torch.ops.load_library(spec.origin)
  File "/home/es/anaconda3/envs/pyg-meta/lib/python3.7/site-packages/torch/_ops.py", line 255, in load_library
    ctypes.CDLL(path)
  File "/home/es/anaconda3/envs/pyg-meta/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/es/anaconda3/envs/pyg-meta/lib/python3.7/site-packages/torch_sparse/_spmm_cuda.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE

@rusty1s I did the installation in a fresh conda environment:

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
pip install torch-geometric

Here is the information about the virtual environment:

$ python -c 'from torch.utils.collect_env import main; main()'
Collecting environment information...
PyTorch version: 1.12.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17

Python version: 3.7.13 (default, Mar 29 2022, 02:18:16)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.0-50-generic-x86_64-with-debian-bullseye-sid
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 850M
Nvidia driver version: 515.65.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.12.0+cu113
[pip3] torch-geometric==2.1.0.post1
[pip3] torch-scatter==2.0.9
[pip3] torch-sparse==0.6.15
[pip3] torchaudio==0.12.0+cu113
[pip3] torchvision==0.13.0+cu113
[conda] blas                      1.0                         mkl  
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py37h7f8727e_0  
[conda] mkl_fft                   1.3.1            py37hd3c417c_0  
[conda] mkl_random                1.2.2            py37h51133e4_0  
[conda] numpy                     1.21.6                   pypi_0    pypi
[conda] numpy-base                1.21.5           py37ha15fc14_3  
[conda] torch                     1.12.0+cu113             pypi_0    pypi
[conda] torch-geometric           2.1.0.post1              pypi_0    pypi
[conda] torch-scatter             2.0.9                    pypi_0    pypi
[conda] torch-sparse              0.6.15                   pypi_0    pypi
[conda] torchaudio                0.12.0+cu113             pypi_0    pypi
[conda] torchvision               0.13.0+cu113             pypi_0    pypi

How can I resolve this issue?

rusty1s commented 2 years ago

I think the issue is that you are using PyTorch 1.12 while you are installing the wheels for PyTorch 1.11. Can you confirm?
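
A check of this kind can be sketched in a few lines. The helpers below (`split_version`, `matches_index`) are hypothetical names for this example. Comparing only the major.minor part and the CUDA tag is an assumption drawn from this thread, where torch 1.9.1+cu111 worked with the torch-1.9.0+cu111 wheels; it is not a documented rule.

```python
import re
from typing import Tuple


def split_version(version: str) -> Tuple[str, str]:
    """Split '1.12.0+cu113' into ('1.12', 'cu113'); the tag is '' if absent."""
    release, _, tag = version.partition("+")
    major, minor = release.split(".")[:2]
    return f"{major}.{minor}", tag


def matches_index(torch_version: str, index_url: str) -> bool:
    """Return True if the wheel page targets the same major.minor and tag.

    Assumes an unencoded URL ending in 'torch-<version>+<tag>.html'.
    """
    match = re.search(r"torch-([0-9][^/]*)\.html$", index_url)
    if match is None:
        raise ValueError(f"unrecognized wheel index URL: {index_url}")
    return split_version(torch_version) == split_version(match.group(1))


if __name__ == "__main__":
    url = "https://data.pyg.org/whl/torch-1.11.0+cu113.html"
    # Installed version taken from the collect_env output above.
    print(matches_index("1.12.0+cu113", url))
```

With real installations, the first argument would be `torch.__version__`.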

jliellen commented 2 years ago

Hi @rusty1s, I had a very hard time installing torch-sparse==0.6.12. Below is the command I ran in my conda env:

$ pip install torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.14.0+cu116.html

I've also tried the following:

$ pip install --verbose torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html --no-cache-dir

It installed successfully, but when I ran my code it gave me segmentation faults, so I don't think that is the right way to solve the problem.

My env:
- GCC: 7.5.0
- NVCC: Cuda compilation tools, release 11.6, V11.6.112 Build cuda_11.6.r11.6/compiler.30978841_0
- PyTorch: 1.14.0.dev20221011+cu116
- PyTorch CUDA: 11.6

Attached is my error log: error.log.txt

Could you please take a look at it? Thank you so much in advance.

rusty1s commented 2 years ago

There does not exist a torch-1.14.0 version. PyTorch 1.13 wheels will be provided soon. For installations of earlier PyTorch releases such as 1.9.0, you can try the --no-index option for a smoother installation:

pip install --no-index torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html

zerofishnoodles commented 1 year ago

Hi @rusty1s,

I am facing the same issue.

If I use the official torch build, everything runs smoothly. However, if I build the same version of torch from source, the undefined symbol problem (in _version_cuda.so) arises.

Would you mind taking a look at it?

Here is the compile log for PyTorch 1.9.0: compilelog.txt

rusty1s commented 1 year ago

If you build PyTorch from source, you also have to build this extension from source. The pre-compiled wheels assume usage of official PyTorch versions.

github-actions[bot] commented 1 year ago

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?