pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

RandomNodeLoader Unequal number of nodes in each batch #9403

Open GARV-k opened 3 weeks ago

GARV-k commented 3 weeks ago

🐛 Describe the bug

For the following code:

from torch_geometric.data import Data
from torch_geometric.loader import RandomNodeLoader
from torch_geometric.utils import to_dense_adj

for split in range(splits):
    print(f"for loop in split_{split + 1}:")
    data_pass = Data(
        x=data_obj.x,
        edge_index=data_obj.edge_index,
        num_classes=max(data_obj.y).item() + 1,
        num_features=data_obj.x.shape[1],
        y=data_obj.y,
        train_mask=data_obj.train_mask[:, split],
        test_mask=data_obj.test_mask[:, split],
    )

    # loader = GraphSAINTRandomWalkSampler(data_pass, batch_size=batch_size, walk_length=walk_length,
    #                                      num_steps=num_steps, sample_coverage=sample_coverage)
    loader = RandomNodeLoader(data_pass, 10)
    # loader = ShaDowKHopSampler(data_obj, depth=2, num_neighbors=5,
    #                            node_idx=data_obj.train_mask)

    # Usage
    # loader = FixedSizeNodeLoader(data_pass, batch_size=760, shuffle=True)
    print(data_pass.train_mask.sum() + data_pass.test_mask.sum())
    print(data_pass.num_nodes)
    for data in loader:
        num_nodes = data.num_nodes
        break
    for idx, data in enumerate(loader):
        adj_t = to_dense_adj(data.edge_index, data.edge_weight)
        print(f"for {idx +1} batch the shape of adj matrix is" + str(adj_t.shape))

The output is:

Device: cuda:0
Optimization started....
for loop in split_1:
tensor(5168)
7600
for 1 batch the shape of adj matrix istorch.Size([1, 760, 760])
for 2 batch the shape of adj matrix istorch.Size([1, 760, 760])
for 3 batch the shape of adj matrix istorch.Size([1, 751, 751])
for 4 batch the shape of adj matrix istorch.Size([1, 760, 760])
for 5 batch the shape of adj matrix istorch.Size([1, 759, 759])
for 6 batch the shape of adj matrix istorch.Size([1, 760, 760])
for 7 batch the shape of adj matrix istorch.Size([1, 758, 758])
for 8 batch the shape of adj matrix istorch.Size([1, 759, 759])
for 9 batch the shape of adj matrix istorch.Size([1, 758, 758])
for 10 batch the shape of adj matrix istorch.Size([1, 760, 760])

My question: if the total number of nodes is 7600 and the number of batches is 10, then each adjacency matrix should cover 760 nodes, right? Why isn't that the case? Let me know if any other information is required.
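The expectation of 760 nodes per batch can be checked with plain arithmetic. The following is a minimal sketch, assuming the loader shuffles the node ids and cuts them into chunks of ceil(num_nodes / num_parts). `random_node_parts` is a hypothetical helper written for illustration, not part of torch_geometric:

```python
import math
import random


def random_node_parts(num_nodes, num_parts, seed=0):
    """Shuffle node ids and cut them into chunks of ceil(num_nodes / num_parts)."""
    ids = list(range(num_nodes))
    random.Random(seed).shuffle(ids)
    chunk = math.ceil(num_nodes / num_parts)
    return [ids[i:i + chunk] for i in range(0, num_nodes, chunk)]


# Numbers taken from the report: 7600 nodes, 10 parts.
parts = random_node_parts(7600, 10)
print([len(p) for p in parts])  # ten chunks of 760 node ids each
```

Under that assumption every batch holds exactly 760 nodes, so any smaller adjacency shape would have to come from the dense conversion rather than from the partitioning.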

P.S.: There was this warning in the output, although I don't think it affects anything:

home/iplab/.local/lib/python3.10/site-packages/torch_geometric/typing.py:63: UserWarning: An issue occurred while importing 'torch-scatter'. Disabling its usage. Stacktrace: /home/iplab/.local/lib/python3.10/site-packages/torch_scatter/_version_cuda.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev
  warnings.warn(f"An issue occurred while importing 'torch-scatter'. "
/home/iplab/.local/lib/python3.10/site-packages/torch_geometric/typing.py:101: UserWarning: An issue occurred while importing 'torch-sparse'. Disabling its usage. Stacktrace: /home/iplab/.local/lib/python3.10/site-packages/torch_sparse/_version_cuda.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev
  warnings.warn(f"An issue occurred while importing 'torch-sparse'. "

Versions

PyTorch = 2.3, Ubuntu = 22.04

rusty1s commented 2 weeks ago

What does

for data in loader:
    print(data.num_nodes)

return? I would expect an adjacency matrix to report a smaller number of nodes when the batch contains isolated nodes.
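To illustrate the isolated-node explanation: when a dense conversion is given only the edges, it can infer the node count only as the largest index that appears in an edge, plus one. The sketch below uses plain Python rather than torch_geometric, and `inferred_num_nodes` is a hypothetical helper that mimics that inference under this assumption:

```python
def inferred_num_nodes(edge_index):
    """Node count inferred from the edges alone: largest node index + 1."""
    rows, cols = edge_index
    if not rows:
        return 0
    return max(max(rows), max(cols)) + 1


# Hypothetical batch with 5 nodes (ids 0..4) where nodes 3 and 4 have no edges.
num_nodes = 5
edge_index = ([0, 1, 1, 2], [1, 0, 2, 1])

print(num_nodes)                       # 5
print(inferred_num_nodes(edge_index))  # 3
```

If this is what happens in the report, `data.num_nodes` should still be 760 for every batch, and passing `max_num_nodes=data.num_nodes` to `to_dense_adj` should yield a 760 x 760 matrix each time.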