AnotherAvenger closed this issue 1 year ago.
@rusty1s @EdisonLeeeee @LeoGori, any help with a solution?
tempdataset_1.csv, tempdatasetsourceandtarget_1.csv, y.xlsx: this is the dummy dataset I have been using. Please help me find a solution.
I answered you in https://github.com/pyg-team/pytorch_geometric/discussions/5519#discussioncomment-4578868. Let's continue the discussion there, and let's close the issue here :)
@rusty1s, sir, is there any problem with the dataset I am using?
Can we move the discussion to PyG? I answered you there.
@rusty1s, I am sorry, I forgot.
```python
import pandas as pd
import numpy as np
import torch
from torch_geometric.data import Data
from sklearn.preprocessing import LabelEncoder
import torch_geometric.transforms as T
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

# Load the data
x = pd.read_csv('tempdataset.csv', index_col=0)
edge_index = pd.read_csv('tempdatasetsourceandtarget.csv', index_col=0)
y = pd.read_excel('y.xlsx')

# Drop unnecessary columns from x
x = x.drop(columns=['ID', 'Name', 'Screen Name', 'Location'])

# Convert data to numpy arrays
x = x.to_numpy()
edge_index = edge_index.to_numpy()
y = y.to_numpy().squeeze()

# Encode edge_index columns as integers
le = LabelEncoder()
row, col = edge_index.T
le.fit(np.concatenate((row, col)))
row = le.transform(row)
col = le.transform(col)
```
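A likely source of the error is this encoding step. `LabelEncoder` numbers the distinct IDs found in the edge list by their sorted order (the same mapping `np.unique(..., return_inverse=True)` produces). Nothing ties those numbers to the row order of `x`, and if the edge list mentions more distinct users than `x` has rows, the largest index reaches `x.shape[0]`. A small sketch with made-up IDs (not taken from the attached files):

```python
import numpy as np

# Hypothetical data: x has three rows, for users 300, 100 and 200 (in that order).
x_ids = np.array([300, 100, 200])
# Hypothetical edge list; user 400 appears here but has no row in x.
src = np.array([300, 100, 400])
dst = np.array([100, 200, 300])

# Same result as LabelEncoder().fit(concat) followed by .transform(src / dst):
# each distinct ID is replaced by its rank among the sorted distinct IDs.
classes, codes = np.unique(np.concatenate((src, dst)), return_inverse=True)
row, col = codes[:len(src)], codes[len(src):]

print(classes)                  # [100 200 300 400]
print(row, col)                 # [2 0 3] [0 1 2]
print(codes.max(), len(x_ids))  # 3 3 -> index 3 does not exist in a 3-row x
```

Here user 300 sits in row 0 of `x` but is encoded as 2, and user 400 gets index 3 although `x` has only 3 rows, the same pattern as `index 114 is out of bounds for dimension 0 with size 114`.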
```python
# Convert data to tensors
x = torch.tensor(x, dtype=torch.float)
edge_index = torch.tensor(np.stack([row, col], axis=0))
y = torch.tensor(y)

# Create a PyTorch Geometric Data object
data = Data(x=x, edge_index=edge_index, y=y)
print(data)  # Output: Data(x=[114, 4], edge_index=[2, 114], y=[114])

# Split the data into train, val, and test sets
data = T.RandomNodeSplit(num_val=0.2, num_test=0.4)(data)

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define the GCN model
class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels, cached=True)
        self.conv2 = GCNConv(hidden_channels, out_channels, cached=True)

    # forward() was not in the posted snippet; this version follows the
    # pattern of PyG's examples/gcn.py (assumption).
    def forward(self, x, edge_index, edge_weight=None):
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv1(x, edge_index, edge_weight).relu()
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index, edge_weight)

# Move model and data to the designated device
model = GCN(data.x.size(-1), 64, data.y.max().item() + 1)
model, data = model.to(device), data.to(device)

# Set up the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Define the training and evaluation functions
def train():
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index, data.edge_weight)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return float(loss)

@torch.no_grad()
def test():
    model.eval()
    pred = model(data.x, data.edge_index, data.edge_weight).argmax(dim=-1)
    # The snippet stopped at `pred`; returning train/val/test accuracy is
    # assumed here because the loop below reads accs[0], accs[1], accs[2].
    accs = []
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        accs.append(int((pred[mask] == data.y[mask]).sum()) / int(mask.sum()))
    return accs

# Train and evaluate the model
for epoch in range(1, 101):
    loss = train()
    accs = test()
    print(f'Epoch {epoch}: loss = {loss:.4f}, acc = {accs[0]:.4f}, '
          f'val_acc = {accs[1]:.4f}, test_acc = {accs[2]:.4f}')
```
I used a dummy dataset to test my GCN model. `x` contains the node features, `edge_index` defines the edges between nodes, and `y` specifies whether each user is a human or a bot. The code above raises the following error, which I have not been able to solve:
`RuntimeError: index 114 is out of bounds for dimension 0 with size 114`
Any suggestions?
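One way to keep edge indices aligned with the rows of `x`, sketched under assumptions: the `ID` column of `tempdataset.csv` (which the script drops) holds the same identifiers that appear in the source/target columns, and row *i* of `x` is the user with the *i*-th ID. `build_edge_index` is a hypothetical helper, not a PyG or scikit-learn function; it maps every edge endpoint to that row position and drops edges that point to users without a feature row.

```python
import numpy as np


def build_edge_index(node_ids, src_ids, dst_ids):
    """Hypothetical helper: map raw user IDs in an edge list to row positions of x.

    Assumes node_ids[i] is the ID of the user stored in row i of the feature
    matrix. Edges whose source or target has no row in x are dropped.
    Returns a (2, num_kept_edges) int64 array and the number of dropped edges.
    """
    # Raw ID -> row index in x, so the numbering follows x's row order.
    id_to_row = {nid: i for i, nid in enumerate(np.asarray(node_ids).tolist())}
    src_ids = np.asarray(src_ids).tolist()
    dst_ids = np.asarray(dst_ids).tolist()

    rows, cols = [], []
    for s, d in zip(src_ids, dst_ids):
        # Skip edges that point to users without a feature row.
        if s in id_to_row and d in id_to_row:
            rows.append(id_to_row[s])
            cols.append(id_to_row[d])

    edge_index = np.array([rows, cols], dtype=np.int64)
    num_dropped = len(src_ids) - len(rows)
    return edge_index, num_dropped


# Usage with made-up IDs: x holds users 101, 205, 307 in that row order.
edge_index, dropped = build_edge_index(
    node_ids=[101, 205, 307],
    src_ids=[205, 101, 999],  # 999 has no row in x
    dst_ids=[307, 205, 101],
)
print(edge_index.tolist())  # [[1, 0], [2, 1]]
print(dropped)              # 1
```

In the script, `node_ids` would be `x['ID']` read before the `drop(...)` call, and `src_ids`/`dst_ids` the two columns that `edge_index.T` unpacks; the result can be converted with `torch.from_numpy(...)`, after which `edge_index.max() < x.shape[0]` should hold. Dropping edges is one choice; the alternative is to add feature rows for the missing users. If the IDs are parsed as numbers in one file and as strings in the other, no edge will match, so checking the dtypes of both ID columns first is worthwhile.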