pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

NeighborSampler samples different nodes with disjoint=True and False #9375

Open Barcavin opened 1 month ago

Barcavin commented 1 month ago

🐛 Describe the bug

import argparse

from torch_geometric.datasets import FakeHeteroDataset
from torch_geometric.loader import NeighborLoader
from torch_geometric.seed import seed_everything

parser = argparse.ArgumentParser()
parser.add_argument('--disjoint', type=int, default=0)
args = parser.parse_args()
seed_everything(0)

data = FakeHeteroDataset(1, avg_degree=600)[0]

print(data['v0'].x.mean())
loader = NeighborLoader(
            data,
            # Sample 30 neighbors for each node for 2 iterations
            num_neighbors=[30] * 2,
            # Use a batch size of 128 for sampling training nodes
            batch_size=128,
            input_nodes=('v0',[1]),
            disjoint=args.disjoint,
)
print(next(iter(loader))['v0'].n_id.unique().sum())

Run in the terminal:

python disjoint.py --disjoint=1

tensor(-0.0022)
tensor(72436)

python disjoint.py --disjoint=0

tensor(-0.0022)
tensor(71816)

On the same dataset, the loader samples two different node sets depending on the disjoint setting. I only see this happen on heterogeneous data, not on homogeneous data.
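
For a more direct check than comparing the sums of the unique n_id values, a minimal sketch along these lines (same FakeHeteroDataset setup as above) compares the sampled node sets themselves:

from torch_geometric.datasets import FakeHeteroDataset
from torch_geometric.loader import NeighborLoader
from torch_geometric.seed import seed_everything


def sampled_v0_nodes(disjoint: bool) -> set:
    # Re-seed so both runs build the identical FakeHeteroDataset and sampler state.
    seed_everything(0)
    data = FakeHeteroDataset(1, avg_degree=600)[0]
    loader = NeighborLoader(
        data,
        num_neighbors=[30] * 2,
        batch_size=128,
        input_nodes=('v0', [1]),
        disjoint=disjoint,
    )
    # Global ids of the 'v0' nodes sampled in the first batch.
    return set(next(iter(loader))['v0'].n_id.tolist())


disjoint_nodes = sampled_v0_nodes(True)
joint_nodes = sampled_v0_nodes(False)
print(len(disjoint_nodes), len(joint_nodes), len(disjoint_nodes & joint_nodes))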

Versions

PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-4.14.343-260.564.amzn2.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7R32
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 0
BogoMIPS: 5599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save rdpid
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 512 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 8 MiB (16 instances)
L3 cache: 64 MiB (4 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Vulnerable, RAS-Poisoning: Vulnerable
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pytorch_frame==0.2.2
[pip3] torch==2.0.1
[pip3] torch_geometric==2.5.2
[pip3] torch-hd==5.5.0
[pip3] torch-scatter==2.1.2
[pip3] torch-sparse==0.6.18
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[pip3] triton==2.0.0
[conda] blas 2.121 mkl conda-forge
[conda] blas-devel 3.9.0 21_linux64_mkl conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] libblas 3.9.0 21_linux64_mkl conda-forge
[conda] libcblas 3.9.0 21_linux64_mkl conda-forge
[conda] liblapack 3.9.0 21_linux64_mkl conda-forge
[conda] liblapacke 3.9.0 21_linux64_mkl conda-forge
[conda] mkl 2024.0.0 ha957f24_49657 conda-forge
[conda] mkl-devel 2024.0.0 ha770c72_49657 conda-forge
[conda] mkl-include 2024.0.0 ha957f24_49657 conda-forge
[conda] numpy 1.26.4 py311h64a7726_0 conda-forge
[conda] pyg 2.5.2 py311_torch_2.0.0_cu118 pyg
[conda] pytorch 2.0.1 py3.11_cuda11.8_cudnn8.7.0_0 pytorch
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch
[conda] pytorch-frame 0.2.2 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pytorch-scatter 2.1.2 py311_torch_2.0.0_cu118 pyg
[conda] pytorch-sparse 0.6.18 py311_torch_2.0.0_cu118 pyg
[conda] torch-hd 5.5.0 pypi_0 pypi
[conda] torchaudio 2.0.2 py311_cu118 pytorch
[conda] torchtriton 2.0.0 py311 pytorch
[conda] torchvision 0.15.2 py311_cu118 pytorch

rusty1s commented 3 weeks ago

I would say this is expected. Even though you fix the random seed, sampling is performed differently for disjoint=True and disjoint=False. You can see that the same number of nodes is sampled if you change to num_neighbors=[-1] * 2.
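
A minimal sketch of this check, assuming the same FakeHeteroDataset setup as in the report (with num_neighbors=[-1] * 2, every neighbor is taken at each hop, so the reached nodes no longer depend on how sampling is performed):

from torch_geometric.datasets import FakeHeteroDataset
from torch_geometric.loader import NeighborLoader
from torch_geometric.seed import seed_everything


def full_neighborhood_v0_nodes(disjoint: bool) -> set:
    # Re-seed so both runs build the identical FakeHeteroDataset.
    seed_everything(0)
    data = FakeHeteroDataset(1, avg_degree=600)[0]
    loader = NeighborLoader(
        data,
        # -1 means "take all neighbors", so the sampled subgraph covers the
        # full 2-hop neighborhood of the seed regardless of disjoint.
        num_neighbors=[-1] * 2,
        batch_size=128,
        input_nodes=('v0', [1]),
        disjoint=disjoint,
    )
    return set(next(iter(loader))['v0'].n_id.tolist())


print(len(full_neighborhood_v0_nodes(True)), len(full_neighborhood_v0_nodes(False)))
# Expected: two equal counts, since every neighbor is taken at both hops.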

Barcavin commented 3 weeks ago

Thanks for your reply. However, the sampling seems to be more consistent on homogeneous data. Is that also expected somehow on heterogeneous data?

rusty1s commented 3 weeks ago

What do you mean by "more consistent"?

Barcavin commented 3 weeks ago

If we sample on a homogeneous graph, the sampled data is the same for both disjoint=True and disjoint=False.

To reproduce, running python disjoint.py --disjoint=0 --heter=0 and python disjoint.py --disjoint=1 --heter=0 will give the same result.

disjoint.py:

import argparse

from torch_geometric.datasets import FakeDataset, FakeHeteroDataset
from torch_geometric.loader import NeighborLoader
from torch_geometric.seed import seed_everything

parser = argparse.ArgumentParser()
parser.add_argument('--disjoint', type=int, default=0)
parser.add_argument('--heter', type=int, default=1)
args = parser.parse_args()
seed_everything(0)

if args.heter:
    data = FakeHeteroDataset(1, avg_degree=600)[0]

    print(data['v0'].x.mean())
    loader = NeighborLoader(
                data,
                # Sample 30 neighbors for each node for 2 iterations
                num_neighbors=[30] * 2,
                # Use a batch size of 128 for sampling training nodes
                batch_size=128,
                input_nodes=('v0',[1]),
                disjoint=args.disjoint,
    )
    print(next(iter(loader))['v0'].n_id.unique().sum())
else:
    data = FakeDataset(1, avg_degree=600)[0]

    print(data.x.mean())
    loader = NeighborLoader(
                data,
                # Sample 30 neighbors for each node for 2 iterations
                num_neighbors=[30] * 2,
                # Use a batch size of 128 for sampling training nodes
                batch_size=128,
                input_nodes=[1],
                disjoint=args.disjoint,
    )
    print(next(iter(loader)).n_id.unique().sum())

rusty1s commented 2 weeks ago

Thanks. I don't necessarily think this is a problem, but I will try to find some time to look into why this is the case.