tkipf / gcn

Implementation of Graph Convolutional Networks in TensorFlow
MIT License

PROBLEM WITH CUSTOM DATASET WHILE WORKING WITH GCN #216

Closed AnotherAvenger closed 1 year ago

AnotherAvenger commented 1 year ago

import pandas as pd
import numpy as np
import torch
from torch_geometric.data import Data
from sklearn.preprocessing import LabelEncoder
import torch_geometric.transforms as T
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

Load the data

x = pd.read_csv('tempdataset.csv', index_col=0)
edge_index = pd.read_csv('tempdatasetsourceandtarget.csv', index_col=0)
y = pd.read_excel('y.xlsx')

Drop unnecessary columns from x

x = x.drop(columns=['ID', 'Name', 'Screen Name', 'Location'])

Convert data to numpy arrays

x = x.to_numpy()
edge_index = edge_index.to_numpy()
y = y.to_numpy().squeeze()

Encode edge_index columns as integers

le = LabelEncoder()
row, col = edge_index.T
le.fit(np.concatenate((row, col)))
row = le.transform(row)
col = le.transform(col)

Convert data to tensors

x = torch.tensor(x, dtype=torch.float)
edge_index = torch.tensor(np.stack([row, col], axis=0))
y = torch.tensor(y)

Create a PyTorch Geometric Data object

data = Data(x=x, edge_index=edge_index, y=y)

Data(x=[114, 4], edge_index=[2, 114], y=[114])

Split the data into train, val, and test sets

data = T.RandomNodeSplit(num_val=0.2, num_test=0.4)(data)

Set device

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Define the GCN model

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels, cached=True)
        self.conv2 = GCNConv(hidden_channels, out_channels, cached=True)

    def forward(self, x, edge_index, edge_weight=None):
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv1(x, edge_index, edge_weight).relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index, edge_weight)
        return x

Move model and data to the designated device

model = GCN(data.x.size(-1), 64, data.y.max().item() + 1)
model, data = model.to(device), data.to(device)

Set up the optimizer

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

Define the training and evaluation functions

def train():
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index, data.edge_weight)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return float(loss)

@torch.no_grad()
def test():
    model.eval()
    pred = model(data.x, data.edge_index, data.edge_weight).argmax(dim=-1)

    accs = []
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        accs.append(int((pred[mask] == data.y[mask]).sum()) / int(mask.sum()))
    return accs

Train and evaluate the model

for epoch in range(1, 101):
    loss = train()
    accs = test()
    print(f'Epoch {epoch}: loss = {loss:.4f}, acc = {accs[0]:.4f}, val_acc = {accs[1]:.4f}, test_acc = {accs[2]:.4f}')

I used a dummy dataset to test my GCN model. x contains the node features, edge_index defines the edges between nodes, and y specifies whether the user is a human or a bot. Above is my code, and this is the error I am not able to solve:

RuntimeError: index 114 is out of bounds for dimension 0 with size 114

Any suggestions?
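One plausible reading of this error, not confirmed anywhere in this thread, is that the edge list mentions more distinct node IDs than there are rows in x. Fitting LabelEncoder on the edge endpoints assigns indices 0..k-1 to the k distinct IDs found in the edges, so if k is larger than the 114 feature rows, an index of 114 or more ends up in edge_index. The pure-Python sketch below reproduces that situation on toy data and shows an alternative encoding against the feature table's own IDs. The IDs, the edge list, and all function names here are hypothetical and are not taken from the attached files.

```python
# Hypothetical toy data: three nodes have feature rows, but the edge list
# also mentions a fourth ID (104) that has no row in the feature table.
node_ids = [101, 102, 103]
edges = [(101, 102), (102, 103), (103, 104)]


def encode_like_the_post(edges):
    """Mimic a label encoder fitted on all edge endpoints:
    sorted unique IDs are mapped to 0..k-1."""
    unique_ids = sorted({node for edge in edges for node in edge})
    index_of = {node: i for i, node in enumerate(unique_ids)}
    return [(index_of[s], index_of[t]) for s, t in edges]


def find_out_of_range(encoded_edges, num_nodes):
    """Return encoded edges whose endpoints do not fit into num_nodes rows."""
    return [(s, t) for s, t in encoded_edges if s >= num_nodes or t >= num_nodes]


def encode_against_nodes(edges, node_ids):
    """Map IDs to their row position in the feature table and set aside
    edges that touch an ID without a feature row."""
    index_of = {node: i for i, node in enumerate(node_ids)}
    kept, dropped = [], []
    for s, t in edges:
        if s in index_of and t in index_of:
            kept.append((index_of[s], index_of[t]))
        else:
            dropped.append((s, t))
    return kept, dropped


encoded = encode_like_the_post(edges)
bad = find_out_of_range(encoded, num_nodes=len(node_ids))
kept, dropped = encode_against_nodes(edges, node_ids)

print("encoded from edges only:", encoded)
print("out-of-range edges:", bad)
print("encoded against node table:", kept)
print("edges with unknown IDs:", dropped)
```

With the tensors from the code above, the same check would amount to comparing int(data.edge_index.max()) against data.num_nodes. Whether to drop such edges or to add feature rows for the missing nodes depends on the dataset.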

AnotherAvenger commented 1 year ago

@rusty1s @EdisonLeeeee @LeoGori could you help with a solution?

AnotherAvenger commented 1 year ago

tempdataset_1.csv tempdatasetsourceandtarget_1.csv y.xlsx

This is the dummy dataset I have been using. Please help me with a solution.

rusty1s commented 1 year ago

I answered you in https://github.com/pyg-team/pytorch_geometric/discussions/5519#discussioncomment-4578868. Let's continue the discussion there, and let's close the issue here :)

AnotherAvenger commented 1 year ago

Sir @rusty1s, is there any problem with the dataset I am using?

rusty1s commented 1 year ago

Can we move the discussion to PyG? I answered you there.

AnotherAvenger commented 1 year ago

@rusty1s I am sorry, I forgot.