pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

`RandomLinkSplit` causes data leakage when using bipartite undirected graph #9425

Open sadrahkm opened 5 months ago

sadrahkm commented 5 months ago

🐛 Describe the bug

I am working on a task with two types of nodes, where the edges only represent associations between the two types, so the graph is bipartite. I want this graph to be undirected so that message passing can be done in both directions. However, I recently noticed that the documentation says the `is_undirected` option doesn't work for bipartite graphs. Did I understand this correctly?

If I am correct, then the example in this blog post would be wrong, because it describes exactly the same situation as mine (an undirected bipartite graph), and `is_undirected=True` cannot be used to avoid data leakage. If so, is there any way to fix this issue?

I would appreciate a clarification, since I believe this is an important problem.

Versions

PyTorch version: 2.2.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 12 (bookworm) (x86_64)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: Could not collect
CMake version: version 3.25.1
Libc version: glibc-2.36

Python version: 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] (64-bit runtime)
Python platform: Linux-6.1.0-21-amd64-x86_64-with-glibc2.36
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A16
GPU 1: NVIDIA A16
GPU 2: NVIDIA A16
GPU 3: NVIDIA A16

Nvidia driver version: 525.147.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
...

rusty1s commented 5 months ago

For heterogeneous graphs, data leakage is prevented by specifying "reverse" edge types:

edge_types=("user", "rates", "movie"),
rev_edge_types=("movie", "rev_rates", "user")

This makes sure that links are eliminated in the reverse edge type as well.