pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

PyG Dependencies (torch-scatter, torch-sparse, etc.) are incompatible with conda-forge distribution of PyTorch #9191

Closed: alexbarghi-nv closed this issue 7 months ago

alexbarghi-nv commented 7 months ago

🐛 Describe the bug

RAPIDS is trying to expand CI coverage for its GNN libraries, including cugraph-pyg and wholegraph. We are running into an issue where the PyTorch distribution from the conda-forge channel, which we prefer, is not compatible with PyG's dependencies. An example error is below (it occurs when torch-cluster is imported):

2024-04-10T23:45:26.6544343Z     from cugraph_pyg.nn import GATv2Conv as CuGraphGATv2Conv
2024-04-10T23:45:26.6545955Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/cugraph_pyg/nn/__init__.py:14: in <module>
2024-04-10T23:45:26.6547236Z     from .conv import *
2024-04-10T23:45:26.6548627Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/cugraph_pyg/nn/conv/__init__.py:14: in <module>
2024-04-10T23:45:26.6549978Z     from .gat_conv import GATConv
2024-04-10T23:45:26.6551436Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/cugraph_pyg/nn/conv/gat_conv.py:19: in <module>
2024-04-10T23:45:26.6552782Z     from .base import BaseConv
2024-04-10T23:45:26.6554181Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/cugraph_pyg/nn/conv/base.py:21: in <module>
2024-04-10T23:45:26.6555635Z     torch_geometric = import_optional("torch_geometric")
2024-04-10T23:45:26.6557285Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/cugraph/utilities/utils.py:455: in import_optional
2024-04-10T23:45:26.6558688Z     return importlib.import_module(mod)
2024-04-10T23:45:26.6559881Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/importlib/__init__.py:127: in import_module
2024-04-10T23:45:26.6561249Z     return _bootstrap._gcd_import(name[level:], package, level)
2024-04-10T23:45:26.6563170Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/__init__.py:6: in <module>
2024-04-10T23:45:26.6564652Z     import torch_geometric.datasets
2024-04-10T23:45:26.6566196Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/datasets/__init__.py:100: in <module>
2024-04-10T23:45:26.6567765Z     from .explainer_dataset import ExplainerDataset
2024-04-10T23:45:26.6569466Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/datasets/explainer_dataset.py:8: in <module>
2024-04-10T23:45:26.6571031Z     from torch_geometric.explain import Explanation
2024-04-10T23:45:26.6572645Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/explain/__init__.py:3: in <module>
2024-04-10T23:45:26.6574033Z     from .algorithm import *  # noqa
2024-04-10T23:45:26.6575640Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/explain/algorithm/__init__.py:1: in <module>
2024-04-10T23:45:26.6577121Z     from .base import ExplainerAlgorithm
2024-04-10T23:45:26.6578941Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/explain/algorithm/base.py:14: in <module>
2024-04-10T23:45:26.6580556Z     from torch_geometric.nn import MessagePassing
2024-04-10T23:45:26.6582309Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/nn/__init__.py:5: in <module>
2024-04-10T23:45:26.6583856Z     from .to_hetero_with_bases_transformer import to_hetero_with_bases
2024-04-10T23:45:26.6585790Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/nn/to_hetero_with_bases_transformer.py:9: in <module>
2024-04-10T23:45:26.6587428Z     from torch_geometric.nn.conv import MessagePassing
2024-04-10T23:45:26.6589053Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/nn/conv/__init__.py:8: in <module>
2024-04-10T23:45:26.6590462Z     from .gravnet_conv import GravNetConv
2024-04-10T23:45:26.6592049Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_geometric/nn/conv/gravnet_conv.py:12: in <module>
2024-04-10T23:45:26.6593474Z     from torch_cluster import knn
2024-04-10T23:45:26.6594878Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_cluster/__init__.py:18: in <module>
2024-04-10T23:45:26.6596191Z     torch.ops.load_library(spec.origin)
2024-04-10T23:45:26.6597587Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch/_ops.py:852: in load_library
2024-04-10T23:45:26.6599058Z     ctypes.CDLL(path)
2024-04-10T23:45:26.6600015Z /opt/conda/envs/test_cugraph_pyg/lib/python3.9/ctypes/__init__.py:374: in __init__
2024-04-10T23:45:26.6601181Z     self._handle = _dlopen(self._name, mode)
2024-04-10T23:45:26.6603217Z E   OSError: /opt/conda/envs/test_cugraph_pyg/lib/python3.9/site-packages/torch_cluster/_version_cuda.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs

This error does not occur when using PyTorch from the pytorch channel or when building the PyG dependencies from source. However, we can't do source builds in our CI environment. For now, we are reverting to testing with the pytorch channel, but RAPIDS generally prefers the conda-forge version, which is synchronized with our other dependencies.
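A note on reading the missing symbol: the trailing `Ss` in `_ZN5torch3jit17parseSchemaOrNameERKSs` is the Itanium C++ ABI abbreviation for the pre-C++11 libstdc++ `std::string`, whereas a library built with the newer libstdc++ ABI would mangle the same argument with `NSt7__cxx1112basic_string...`. That suggests, though nothing in this thread confirms it, that the `+pt21cu118` binaries expect a PyTorch built with the old string ABI while the conda-forge PyTorch exports the new one. The sketch below is a rough heuristic for classifying such a symbol; the function names are hypothetical and it is not a full demangler.

```python
# Heuristic sketch (assumption: Itanium-mangled symbol names as produced by GCC).
# Not a real demangler; it only looks for the two std::string encodings.

NEW_ABI_MARKER = "NSt7__cxx1112basic_string"  # std::__cxx11::basic_string
OLD_ABI_MARKER = "Ss"                         # pre-C++11 std::string abbreviation


def strip_identifiers(mangled: str) -> str:
    """Drop length-prefixed names (e.g. '5torch') so that letters inside
    identifiers are not mistaken for ABI markers."""
    out = []
    i = 0
    while i < len(mangled):
        if mangled[i].isdigit():
            j = i
            while j < len(mangled) and mangled[j].isdigit():
                j += 1
            # Skip the digits and the identifier of that length.
            i = j + int(mangled[i:j])
        else:
            out.append(mangled[i])
            i += 1
    return "".join(out)


def guess_string_abi(mangled: str) -> str:
    """Return 'cxx11', 'pre-cxx11', or 'unknown' for a mangled symbol."""
    if NEW_ABI_MARKER in mangled:
        return "cxx11"
    if OLD_ABI_MARKER in strip_identifiers(mangled):
        return "pre-cxx11"
    return "unknown"


if __name__ == "__main__":
    # The symbol reported as undefined in the traceback above.
    print(guess_string_abi("_ZN5torch3jit17parseSchemaOrNameERKSs"))
```

To see which ABI an installed PyTorch was built with, `torch.compiled_with_cxx11_abi()` returns a boolean; if it reports `True` while the extension asks for a `pre-cxx11` symbol, a mismatch of this kind is a plausible cause of the `undefined symbol` error.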

Is there any plan to support conda-forge PyTorch? Other libraries like DGL support conda-forge PyTorch, so it would be very helpful to us if PyG could do the same.

Versions

pytorch: 2.1.2
libtorch: 2.1.2
cuda: 11.8
pyg: 2.4.0
pyg-lib: 0.4.0+pt21cu118
torch_cluster: 1.6.3+pt21cu118
torch_scatter: 2.1.2+pt21cu118
torch_sparse: 0.6.18+pt21cu118
torch_spline_conv: 1.2.2+pt21cu118

rusty1s commented 7 months ago

AFAIK, conda-forge packages exist for these dependencies as well, but they are not maintained by us; see https://github.com/conda-forge/torch-scatter-feedstock.

alexbarghi-nv commented 7 months ago

I'll look into that and see if that resolves this. Thanks.

alexbarghi-nv commented 7 months ago

This has been resolved.