pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

Error when using underscore "_" in node name #8974

Open OliEfr opened 7 months ago

OliEfr commented 7 months ago

🐛 Describe the bug

When creating a heterogeneous graph and the corresponding GNN like this:

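import torch
from torch_geometric.nn import to_hetero, SAGEConv, Linear
from torch_geometric.data import HeteroData
from torch_geometric.loader import DataLoader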

class ModelNetwork(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), 32)
        self.fc1 = Linear(-1, 2)

    def forward(self, x, edge_index):
        x = torch.relu(self.conv1(x, edge_index))
        x = self.fc1(x)
        return x

model = ModelNetwork()

data_list = []
data = HeteroData()
data["n_0"].x = torch.randn(1,8)
data["n_0"].y = torch.randn(1,2)
data["n_1"].x = torch.randn(1,8)
data["n_1"].y = torch.randn(1,2)
data["n_2"].x = torch.randn(1,8)
data["n_2"].y = torch.randn(1,2)
data["n_0", "e01", "n_1"].edge_index = torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
data["n_1", "e12", "n_2"].edge_index = torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
data["n_1", "e10", "n_0"].edge_index = torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
data["n_2", "e21", "n_1"].edge_index = torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
data_list.append(data)
data_list.append(data)
data_list.append(data)

dataloader = DataLoader(data_list, batch_size=1)

model = ModelNetwork()
batch = next(iter(dataloader))
model = to_hetero(model, batch.metadata(), aggr="sum")

and then running a forward pass

with torch.no_grad():
    out = model(data_list[0].x_dict, data_list[0].edge_index_dict)

it throws the following error:

Exception has occurred: TypeError
add(): argument 'input' (position 1) must be Tensor, not NoneType
  File "/home/.../model_learner_mujoco_GNN_hetero.py", line 229, in <module>
    out = model(data_list[0].x_dict, data_list[0].edge_index_dict)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: add(): argument 'input' (position 1) must be Tensor, not NoneType

Traceback (most recent call last):
  File "/home/.../lib/python3.12/site-packages/torch/fx/graph_module.py", line 304, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.../lib/python3.12/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.../lib/python3.12/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<eval_with_key>.1", line 26, in forward
    conv1__n_0 = torch.add(None, None)
                 ^^^^^^^^^^^^^^^^^^^^^
TypeError: add(): argument 'input' (position 1) must be Tensor, not NoneType

Call using an FX-traced Module, line 26 of the traced Module's generated forward function:
    conv1__n_4 = torch.add(conv1__n_11, conv1__n_12);  conv1__n_11 = conv1__n_12 = None
    conv1__n_0 = torch.add(None, None)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    relu__n_0 = torch.relu(conv1__n_0);  conv1__n_0 = None

    relu__n_1 = torch.relu(conv1__n_1);  conv1__n_1 = None

I traced the issue to the underscore "_" in the node names. Please either mention this restriction in the docs or fix it.
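While the cause is unclear, one stopgap is to rewrite the type names before building the graph. The sketch below is plain Python with no PyG dependency. `strip_underscores` and `rename_types` are hypothetical helper names, not part of PyG, and the approach assumes that underscore-free names avoid the error, as reported above.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]


def strip_underscores(name: str) -> str:
    """Hypothetical helper: drop every underscore from a type name."""
    return name.replace("_", "")


def rename_types(
    node_types: List[str], edge_types: List[EdgeType]
) -> Tuple[Dict[str, str], Dict[EdgeType, EdgeType]]:
    """Map each node and edge type to an underscore-free equivalent.

    Raises ValueError if two distinct node types would end up with the same name.
    """
    node_map = {name: strip_underscores(name) for name in node_types}
    if len(set(node_map.values())) != len(node_map):
        raise ValueError("renaming would merge two distinct node types")
    edge_map = {
        (src, rel, dst): (node_map[src], strip_underscores(rel), node_map[dst])
        for src, rel, dst in edge_types
    }
    return node_map, edge_map


# Node and edge types taken from the example above (subset).
node_map, edge_map = rename_types(
    ["n_0", "n_1", "n_2"],
    [("n_0", "e01", "n_1"), ("n_1", "e12", "n_2")],
)
```

The returned dictionaries can then be used as keys when filling a `HeteroData` object, so the original names stay available for bookkeeping.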

Versions

PyTorch version: 2.2.0
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] pytorch-lightning==2.2.0.post0
[pip3] torch==2.2.0
[pip3] torch_geometric==2.5.0
[pip3] torch-tb-profiler==0.4.3
[pip3] torchmetrics==1.3.1
[pip3] torchvision==0.17.0
[pip3] triton==2.2.0
[conda] blas 1.0 mkl
[conda] cpuonly 2.0 0 pytorch
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py312h5eee18b_1
[conda] mkl_fft 1.3.8 py312h5eee18b_0
[conda] mkl_random 1.2.4 py312hdb19cb5_0
[conda] numpy 1.26.3 py312hc5e2394_0
[conda] numpy-base 1.26.3 py312h0da6c21_0
[conda] pyg 2.5.0 py312_torch_2.2.0_cpu pyg
[conda] pytorch 2.2.0 cpu_py312hb9e5694_0
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch
[conda] pytorch-lightning 2.2.0.post0 pypi_0 pypi
[conda] pytorch-mutex 1.0 cpu pytorch
[conda] torch-tb-profiler 0.4.3 pypi_0 pypi
[conda] torchmetrics 1.3.1 pypi_0 pypi
[conda] torchvision 0.17.0 pypi_0 pypi
[conda] triton 2.2.0 pypi_0 pypi

I choose not to disclose other system information.

rusty1s commented 7 months ago

This is definitely unintended, but I cannot reproduce it; the example runs fine on my end. The only difference in setup is that I am running on Python 3.10.

OliEfr commented 7 months ago

That's interesting. I am sure that it is due to the underscores on my end. I will try to provide additional info when I have time. Please leave the issue open.

OliEfr commented 7 months ago

I'm back, thanks for your patience. Indeed, the minimal example I originally provided runs without error. Sorry about that.

However, I ran into more issues when using an underscore "_" in node names. For instance, the following script throws the error I originally reported. When I remove the underscores from the names, it works, and I couldn't trace the error to anything else. I can also share other problems that arise from underscores in names if you want.
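The node names in both examples share the shape `n_<digits>`. The traceback in the original report mentions `conv1__n_11` and `conv1__n_12`, which do not correspond to any declared node type. This suggests, as an unconfirmed assumption, that the FX tracing step treats a trailing `_<number>` as a counter and may renumber it. The plain-Python sketch below only flags names of that shape so they can be tested in isolation; `has_numeric_suffix` and `flag_suspicious` are hypothetical helper names.

```python
import re
from typing import Iterable, List

# Matches names that end in an underscore followed by digits, e.g. "n_0".
_NUMERIC_SUFFIX = re.compile(r"_\d+$")


def has_numeric_suffix(name: str) -> bool:
    """Return True if the name ends in "_<digits>"."""
    return _NUMERIC_SUFFIX.search(name) is not None


def flag_suspicious(names: Iterable[str]) -> List[str]:
    """Keep only the names that end in "_<digits>", preserving input order."""
    return [name for name in names if has_numeric_suffix(name)]


flagged = flag_suspicious(["n_0", "n_1", "n0", "paper", "n_01"])
```

If the assumption holds, names such as `n0` or `node_a` should behave differently from `n_0`, which would narrow the problem down from underscores in general to underscore-plus-digit suffixes.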

Note: If you have any suggestions for improving my code or my use of PyG, let me know. I know this way of creating a graph dataset is unusual, but it is required for my application.

import torch
from torch_geometric.nn import to_hetero, SAGEConv, Linear
from torch_geometric.data import HeteroData
from torch_geometric.loader import DataLoader

class ModelNetwork(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), 32)
        self.fc1 = Linear(32, 7)

    def forward(self, x, edge_index):
        x = torch.relu(self.conv1(x, edge_index))
        x = self.fc1(x)
        return x

data_list_train = []
for i in range(10):  # create ten training graphs
    data = HeteroData()
    for node_j in range(6):  # each graph has six node types with one node each
        data["n_{}".format(node_j)].x = torch.randn((1, 8))  # each node has 8 features
        data["n_{}".format(node_j)].y = torch.randn((1, 2))  # each node has 2 labels
    # create edges. Each node of a different node type starts with index 0 again.
    data["n_0", "n_01", "n_1"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data["n_1", "n_12", "n_2"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data["n_0", "n_03", "n_3"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data["n_3", "n_34", "n_4"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data["n_4", "n_45", "n_5"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data["n_1", "n_10", "n_0"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data["n_2", "n_21", "n_1"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data["n_3", "n_30", "n_0"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data["n_4", "n_43", "n_3"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data["n_5", "n_54", "n_4"].edge_index = (
        torch.tensor([[0, 0]], dtype=torch.long).t().contiguous()
    )
    data_list_train.append(data)

trainloader = DataLoader(data_list_train, batch_size=256)
batch = next(iter(trainloader))

model = ModelNetwork()
model = to_hetero(model, batch.metadata(), aggr="mean")

# Initialize lazy modules
with torch.no_grad():
    out = model(data_list_train[0].x_dict, data_list_train[0].edge_index_dict)

Exception has occurred: TypeError
add(): argument 'input' (position 1) must be Tensor, not NoneType
  File "/home/oliver/phd_repos/test_model_learner/test_removeable.py", line 67, in <module>
    out = model(data_list_train[0].x_dict, data_list_train[0].edge_index_dict)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: add(): argument 'input' (position 1) must be Tensor, not NoneType
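
A known pitfall with `to_hetero` is that node types which never occur as the destination of any edge type are not updated during message passing. Whether that could produce this exact error is not established here, but it is cheap to rule out. The plain-Python sketch below rebuilds the edge types from the script above and checks them; `types_without_incoming_edges` is a hypothetical helper name.

```python
from typing import Iterable, List, Set, Tuple

EdgeType = Tuple[str, str, str]


def types_without_incoming_edges(
    node_types: Iterable[str], edge_types: Iterable[EdgeType]
) -> Set[str]:
    """Return the node types that are never the destination of an edge type."""
    destinations = {dst for _, _, dst in edge_types}
    return set(node_types) - destinations


# Same node and edge types as in the script above.
node_types: List[str] = [f"n_{j}" for j in range(6)]
pairs = [(0, 1), (1, 2), (0, 3), (3, 4), (4, 5)]
edge_types: List[EdgeType] = []
for a, b in pairs:
    edge_types.append((f"n_{a}", f"n_{a}{b}", f"n_{b}"))  # forward edge type
    edge_types.append((f"n_{b}", f"n_{b}{a}", f"n_{a}"))  # reverse edge type

missing = types_without_incoming_edges(node_types, edge_types)
```

Here `missing` is empty: every node type receives messages from at least one edge type, so this cause appears to be ruled out for the script above.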