Open theodorju opened 1 year ago
I spent some time looking into this, but I don't have a good answer yet. `scipy.sparse.linalg.eigs`/`scipy.sparse.linalg.eigsh` use ARPACK internally, which is generally not that good at finding small eigenvalues. ARPACK also creates a random starting vector, which prevents determinism. The original code uses `numpy` for the eigenvector computation, see https://github.com/graphdeeplearning/benchmarking-gnns/blob/b6c407712fa576e9699555e1e035d1e327ccae6c/data/CSL.py#L205, which does not have these problems but is generally much slower. I am not sure if we should swap to this by default due to memory constraints. Maybe we can make it an option in the transform?
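A sketch of what such an option could look like, using a hypothetical `method` flag and helper name for illustration (this is not PyG's actual API): the dense `numpy.linalg.eigh` path is deterministic but needs O(n²) memory, while the sparse ARPACK path can be made deterministic by pinning its starting vector `v0`.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def laplacian_eigenvector_pe(adj, k, method="dense"):
    """Return k Laplacian eigenvector PEs (columns), skipping the trivial one.

    `method` is a hypothetical flag for illustration, not PyG's actual API:
    - "dense":  numpy.linalg.eigh -- deterministic, but densifies the matrix.
    - "sparse": scipy/ARPACK -- scalable; made deterministic by pinning v0.
    """
    deg = np.asarray(adj.sum(axis=1)).ravel()
    lap = sp.diags(deg) - adj  # combinatorial Laplacian L = D - A
    if method == "dense":
        _, vecs = np.linalg.eigh(lap.toarray())  # eigenvalues ascending
    elif method == "sparse":
        # A fixed starting vector avoids ARPACK's internal random start.
        v0 = np.random.default_rng(0).standard_normal(lap.shape[0])
        vals, vecs = spla.eigsh(lap, k=k + 1, which="SA", v0=v0)
        vecs = vecs[:, np.argsort(vals)]  # ARPACK order is not guaranteed
    else:
        raise ValueError(f"unknown method {method!r}")
    return vecs[:, 1:k + 1]  # drop the constant 0-eigenvalue eigenvector
```

For non-degenerate spectra the two methods agree up to the usual eigenvector sign ambiguity, which is why the dense path only fixes determinism across repeated calls, not sign conventions across backends.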
Hi @rusty1s, thanks for looking into this. Making it an option in the transform (maybe with a user warning about the execution-time overhead) sounds like a good idea to me.
I agree. Do you wanna give it a try? Otherwise, I can take a look. Just let me know.
Sure, I could give it a try.
Laplacian eigenvector PE is not deterministic, i.e. it produces different encodings when applied to the same graph at different times, even with everything seeded.
Here is code that reproduces it:
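A minimal standalone sketch of the underlying behavior, using scipy directly rather than the PyG transform (an assumption on my part, since the transform relies on `scipy.sparse.linalg.eigsh` internally): seeding numpy does not reach ARPACK's internal random starting vector, whose state persists across calls.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 20
adj = sp.diags([1.0, 1.0], [-1, 1], shape=(n, n), format="csr")  # path graph
deg = np.asarray(adj.sum(axis=1)).ravel()
lap = sp.diags(deg) - adj  # combinatorial Laplacian L = D - A

np.random.seed(0)  # seeding numpy does NOT reach ARPACK:
w1, v1 = spla.eigsh(lap, k=3, which="SA")
np.random.seed(0)  # ARPACK draws its own starting vector internally,
w2, v2 = spla.eigsh(lap, k=3, which="SA")
# ... so v1 and v2 can differ (typically by eigenvector sign flips).

# Pinning v0 restores determinism across calls:
v0 = np.random.default_rng(0).standard_normal(n)
_, a = spla.eigsh(lap, k=3, which="SA", v0=v0)
_, b = spla.eigsh(lap, k=3, which="SA", v0=v0)
assert np.allclose(a, b)
```

Since the path-graph Laplacian has distinct eigenvalues, the two unpinned calls still agree up to column ordering and signs; it is exactly those signs that flip between runs and break downstream reproducibility.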
Environment

- Installation (conda, pip, source): pip
- Any other relevant information (e.g. version of torch-scatter): None