AnotherAvenger commented 1 year ago

```python
import pandas as pd
import numpy as np
import torch
from torch_geometric.data import Data
from sklearn.preprocessing import LabelEncoder
import torch_geometric.transforms as T
from torch_geometric.nn import GCNConv
import torch.nn.functional as F
```

Load the data

```python
x = pd.read_csv('tempdataset.csv', index_col=0)
edge_index = pd.read_csv('tempdatasetsourceandtarget.csv', index_col=0)
y = pd.read_excel('y.xlsx')
```

Drop unnecessary columns from x

```python
x = x.drop(columns=['ID', 'Name', 'Screen Name', 'Location'])
```

Convert data to numpy arrays

```python
x = x.to_numpy()
edge_index = edge_index.to_numpy()
y = y.to_numpy().squeeze()
```

Encode edge_index columns as integers

```python
le = LabelEncoder()
row, col = edge_index.T
le.fit(np.concatenate((row, col)))
row = le.transform(row)
col = le.transform(col)
```
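`LabelEncoder` numbers only the IDs that occur in the edge file, assigning them 0 to k-1 in sorted order. Those numbers are not tied to the row order of `x`, and if the edge file mentions more than 114 distinct IDs, some indices reach 114 or more, which is consistent with the reported `RuntimeError`. One possible alternative is sketched below. It assumes that the `ID` column of `tempdataset.csv` (the one dropped in the previous step) holds the same identifiers as the two columns of the edge file. It builds the mapping from `x` itself and discards edges whose endpoints have no feature row. `build_edge_index` is a hypothetical helper, not part of pandas or PyG.

```python
import warnings

import numpy as np
import pandas as pd


def build_edge_index(features: pd.DataFrame, edges: pd.DataFrame,
                     id_col: str = "ID") -> np.ndarray:
    """Map raw node IDs in `edges` to row positions of `features`.

    Hypothetical helper. Assumes `features[id_col]` holds one unique ID per
    node and that `edges` has exactly two columns: source ID, target ID.
    """
    # Lookup table: raw node ID -> row position 0..num_nodes-1 in `features`
    id_to_row = pd.Series(np.arange(len(features)),
                          index=features[id_col].to_numpy())
    if not id_to_row.index.is_unique:
        raise ValueError(f"column {id_col!r} contains duplicate IDs")

    # IDs missing from `features` become NaN after the lookup
    src = edges.iloc[:, 0].map(id_to_row)
    dst = edges.iloc[:, 1].map(id_to_row)

    keep = src.notna() & dst.notna()
    if not keep.all():
        warnings.warn(f"dropping {int((~keep).sum())} edge(s) with unknown node IDs")

    return np.stack([src[keep].to_numpy(dtype=np.int64),
                     dst[keep].to_numpy(dtype=np.int64)], axis=0)
```

For example, with feature rows for IDs 10, 20 and 30 and edges (10, 20), (20, 30) and (99, 10), the call returns `[[0, 1], [1, 2]]` and warns that one edge was dropped. In the code above it would have to run before the `x.drop(...)` line, while `x` still has its `ID` column, and its result would then be converted with `torch.tensor(..., dtype=torch.long)`.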

Convert data to tensors

```python
x = torch.tensor(x, dtype=torch.float)
edge_index = torch.tensor(np.stack([row, col], axis=0))
y = torch.tensor(y)
```

Create a PyTorch Geometric Data object

```python
data = Data(x=x, edge_index=edge_index, y=y)
```

Printing `data` gives `Data(x=[114, 4], edge_index=[2, 114], y=[114])`.
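The summary shows 114 nodes (`x=[114, 4]`), so every entry of `edge_index` has to lie between 0 and 113; an index of 114 is exactly what the error message complains about. A small NumPy check along these lines can list the offending edges before training. `find_bad_edges` is a hypothetical name used only for this sketch; it could be called as `find_bad_edges(data.edge_index.cpu().numpy(), data.num_nodes)`.

```python
import numpy as np


def find_bad_edges(edge_index: np.ndarray, num_nodes: int) -> np.ndarray:
    """Return column positions of edges that reference a node outside 0..num_nodes-1."""
    edge_index = np.asarray(edge_index)
    if edge_index.ndim != 2 or edge_index.shape[0] != 2:
        raise ValueError(f"expected shape [2, num_edges], got {edge_index.shape}")
    # An edge is invalid if either endpoint is negative or >= num_nodes
    out_of_range = (edge_index < 0) | (edge_index >= num_nodes)
    return np.flatnonzero(out_of_range.any(axis=0))
```

An empty result means every edge points at an existing row of `x`.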

Split the data into train, val, and test sets

```python
data = T.RandomNodeSplit(num_val=0.2, num_test=0.4)(data)
```
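`T.RandomNodeSplit(num_val=0.2, num_test=0.4)` attaches boolean `train_mask`, `val_mask` and `test_mask` attributes to `data`; with fractional arguments, roughly 20% of the nodes go to validation, 40% to testing and, under the transform's default split mode, the remainder to training. The NumPy sketch below builds comparable masks by hand to make the idea concrete. It is an illustration, not PyG's implementation, and its rounding may differ from the library's.

```python
import numpy as np


def random_node_masks(num_nodes: int, val_frac: float, test_frac: float, seed: int = 0):
    """Disjoint boolean train/val/test masks over num_nodes nodes (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)  # shuffled node indices
    num_val = round(num_nodes * val_frac)
    num_test = round(num_nodes * test_frac)

    val_mask = np.zeros(num_nodes, dtype=bool)
    test_mask = np.zeros(num_nodes, dtype=bool)
    val_mask[perm[:num_val]] = True
    test_mask[perm[num_val:num_val + num_test]] = True
    # Every node not used for validation or testing is used for training
    train_mask = ~(val_mask | test_mask)
    return train_mask, val_mask, test_mask
```

With 114 nodes this sketch yields 23 validation, 46 test and 45 training nodes.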

Set device

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

Define the GCN model

```python
class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels, cached=True)
        self.conv2 = GCNConv(hidden_channels, out_channels, cached=True)

    def forward(self, x, edge_index, edge_weight=None):
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv1(x, edge_index, edge_weight).relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index, edge_weight)
        return x
```
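With its default settings, each `GCNConv` layer adds self-loops and applies the symmetric normalization of the GCN propagation rule, D^{-1/2} (A + I) D^{-1/2} X W, where D is the degree matrix of A + I. The dense NumPy sketch below spells out that rule for a single layer, ignoring bias and edge weights and assuming the input has no self-loops of its own. It is meant to show what the layer computes, not how PyG implements it. It also shows why an out-of-range node index fails immediately: writing to row `num_nodes` of the adjacency matrix raises an `IndexError`.

```python
import numpy as np


def gcn_layer_dense(x: np.ndarray, edge_index: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One dense GCN step: D^-1/2 (A + I) D^-1/2 X W (no bias, unit edge weights)."""
    num_nodes = x.shape[0]
    src, dst = edge_index
    adj = np.zeros((num_nodes, num_nodes))
    # Row i collects messages from its in-neighbours j (edge j -> i);
    # duplicate edges are summed. An index >= num_nodes raises IndexError here.
    np.add.at(adj, (dst, src), 1.0)
    adj_hat = adj + np.eye(num_nodes)      # add self-loops
    deg = adj_hat.sum(axis=1)              # degree including the self-loop, so >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = d_inv_sqrt[:, None] * adj_hat * d_inv_sqrt[None, :]
    return norm_adj @ x @ weight
```

For two nodes joined in both directions, with identity features and weights, every entry of the output is 0.5.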

Move model and data to the designated device

```python
model = GCN(data.x.size(-1), 64, data.y.max().item() + 1)
model, data = model.to(device), data.to(device)
```

Set up the optimizer

```python
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```

Define the training and evaluation functions

```python
def train():
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index, data.edge_weight)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return float(loss)
```

```python
@torch.no_grad()
def test():
    model.eval()
    pred = model(data.x, data.edge_index, data.edge_weight).argmax(dim=-1)

    accs = []
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        accs.append(int((pred[mask] == data.y[mask]).sum()) / int(mask.sum()))
    return accs
```

Train and evaluate the model

```python
for epoch in range(1, 101):
    loss = train()
    accs = test()
    print(f'Epoch {epoch}: loss = {loss:.4f}, acc = {accs[0]:.4f}, '
          f'val_acc = {accs[1]:.4f}, test_acc = {accs[2]:.4f}')
```

I used a dummy dataset to test my GCN model. `x` contains the node features, `edge_index` defines the edges between nodes, and `y` specifies whether each user is a human or a bot. The code above is mine, and this is the error I am not able to solve:

`RuntimeError: index 114 is out of bounds for dimension 0 with size 114`

Any suggestions?

AnotherAvenger commented 1 year ago

@rusty1s @EdisonLeeeee @LeoGori, any help with a solution?

AnotherAvenger commented 1 year ago

tempdataset_1.csv, tempdatasetsourceandtarget_1.csv and y.xlsx: this is the dummy dataset I have been using. Please help me with a solution.

rusty1s commented 1 year ago

I answered you in https://github.com/pyg-team/pytorch_geometric/discussions/5519#discussioncomment-4578868. Let's continue the discussion there, and let's close the issue here :)

AnotherAvenger commented 1 year ago

Sir @rusty1s, is there any problem with the dataset I am using?

rusty1s commented 1 year ago

Can we move the discussion to PyG? I answered you there.

AnotherAvenger commented 1 year ago

@rusty1s I am sorry, I forgot.

tkipf / gcn

PROBLEM WITH CUSTOM DATASET WHILE WORKING WITH GCN #216