Open RostyslavUA opened 3 years ago
My guess is that this comes from re-initializing the optimizer in every training run. Can you try to move that call outside the first for loop?
Thank you for the answer!
Even after I move the optimizer outside the first for loop, I get the following results
The loss still jumps back to a large value...
My training function is
def train(data, model):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index, data.edge_attr)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss
and my model is
from torch_geometric.nn import TransformerConv

class Transf(torch.nn.Module):
    def __init__(self, data, hidden_channels):
        super(Transf, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = TransformerConv(data.num_features, hidden_channels, edge_dim=2)
        self.conv2 = TransformerConv(hidden_channels, num_classes, edge_dim=2)

    def forward(self, x, edge_index, edge_attr):
        x = self.conv1(x, edge_index, edge_attr)
        x = x.relu()
        x = self.conv2(x, edge_index, edge_attr)
        return x
I see. To me, this indicates that your network indeed just heavily overfits on one single graph, and cannot transfer the knowledge to other graphs. What happens when you train your network with randomly sampled graphs from your training set, instead of one-after-one?
I randomly take 5 Data objects from the list
from random import sample
randomly_sampled_data = sample(data_list, 5)
And the result of the training is unfortunately the same
I thought that the overfitting was due to the fact that I use all of the nodes in the graph for training. So now I use only 25 % of the nodes in the graph for training. I also reduced the number of epochs to 50 and set the learning rate 5 to 10 times smaller than the previous one. Surprisingly, my curves look different
However, there is still something wrong, since it looks like the untrained model performs better than the trained one...
I'm not sure I understand. What I mean is that currently you iterate over each graph, and train each graph in isolation. Instead, it's more reasonable to do something like this:
loader = DataLoader(data_list, batch_size=1, shuffle=True)
for epoch in range(1, train_epoch + 1):
    for data in loader:
        loss = train(data, modelGraphConv)
Ah, I see! So let's say that data_list contains 10 graphs. What I did is the following:
from torch_geometric.data import DataLoader

loader = DataLoader(data_list, batch_size=1, shuffle=True)
modelGraphConv = GraphConvClass(data, hidden_channels=16)
train_epoch = 200
loss_arr = np.zeros((len(data_list), train_epoch), dtype=float)
optimizer = torch.optim.Adam(modelGraphConv.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(1, train_epoch + 1):
    k = 0
    for data in loader:
        loss = train(data, modelGraphConv)
        loss_arr[k, epoch-1] = loss
        k += 1
for i in range(len(data_list)):
    plt.plot(np.arange(train_epoch), loss_arr[i, :])
plt.show()
So essentially the loss of each graph in the loader is saved along the rows of the matrix loss_arr. The operator in this particular case is GraphConv. Note that here I use a 1-dimensional edge attribute. The result is as follows
In parallel, I also train the TransformerConv model, where I use the same number of epochs and the same optimizer settings (learning rate, weight decay), but a 2-dimensional edge attribute. The result is
The loss goes down much better, but how could we interpret this?
Yes, this looks more reasonable. What do you mean with how one can interpret this?
Sorry, I forgot to mention this.
So what we did now is we have avoided overfitting on one particular graph, so that the model can make better predictions over the entire dataset. My problem is that even when I manage to bring the loss down, the accuracy remains very low. And this is reasonable, since my embedding space does not look well-classified.
Let me clarify this further with the example of TransformerConv, which is defined as follows
class Transf(torch.nn.Module):
    def __init__(self, data, hidden_channels):
        super(Transf, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = TransformerConv(data.num_features, hidden_channels, edge_dim=2)
        self.conv2 = TransformerConv(hidden_channels, hidden_channels, edge_dim=2)
        self.conv3 = TransformerConv(hidden_channels, num_classes, edge_dim=2)

    def forward(self, x, edge_index, edge_attr):
        x = self.conv1(x, edge_index, edge_attr)
        x = x.relu()
        x = self.conv2(x, edge_index, edge_attr)
        x = x.relu()
        x = self.conv3(x, edge_index, edge_attr)
        return x
Note that I have added one additional layer in comparison to the models that I mentioned earlier. Then I instantiate the class and depict the embeddings
modelTransf = Transf(data, hidden_channels=64)
outTransf = modelTransf(data.x, data.edge_index, data.edge_attr)
visualize(outTransf, color=data.y)
Before training, my embedding looks like this
then with the following parameters, through the training I manage to bring the loss down
modelTransf = Transf(data, hidden_channels=64)
train_epoch = 200
loss_arr = np.zeros((len(data_list), train_epoch), dtype=float)
optimizer = torch.optim.Adam(modelTransf.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(1, train_epoch + 1):
    k = 0
    for data in loader:
        loss = train(data, modelTransf)
        loss_arr[k, epoch-1] = loss
        k += 1
for i in range(len(data_list)):
    plt.plot(np.arange(train_epoch), loss_arr[i, :])
plt.show()
which looks like
However, when I test the accuracy
accuracy, outTransf = test(data_test, modelTransf)[0:2]
print(f'Accuracy is {accuracy:.3f} ')
visualize(outTransf, color = data_test.y)
it remains in the range of 20 %, which is very low. The reason can be seen in the embeddings after training, which are depicted below
From the figure above we can see that the nodes belonging to the same class are not well separated.
Note that the accuracy always remains in the vicinity of 20 % and is not impacted by the value of the loss: for a loss of 0.4 and for a loss of 1.0, the accuracy remains similar.
This also means to me that even when we avoid overfitting on one particular graph, the model cannot generalize to the whole dataset.
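For context, it can help to compare the 20 % figure against trivial baselines. The sketch below assumes five classes (labels 0 to 4, as the predictions printed later in this thread suggest) and uses a made-up label list; chance_accuracy and majority_accuracy are hypothetical helper names, not part of any library.

```python
from collections import Counter

def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_classes

def majority_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Hypothetical labels for illustration only (not taken from the dataset).
labels = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
print(chance_accuracy(5))         # 0.2
print(majority_accuracy(labels))  # 0.2
```

If the real labels are roughly balanced over five classes, an accuracy near 20 % is what random guessing would give, which would be consistent with the model not having learned the label assignment at all.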
I have tried to solve this problem by:
GCNConv, GraphConv
Thank you!
I'm not sure whether this is a problem in your model (at least your code looks correct to me). How does the training accuracy look like? I'm not sure I can give you any advice, since I do not know about your data. This might be a pre-processing problem as well, in which nodes have false connections.
I have modified my code a little bit, so that now I can check the accuracy of the training and testing:
def train(loader, model):
    model.train()
    for data in loader:
        optimizer.zero_grad()
        out = model(data.x, data.edge_index, data.edge_attr)
        loss = criterion(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        optimizer.step()

def test(loader, model):
    model.eval()
    total_nodes = 0
    correct = 0
    for data in loader:
        out = model(data.x, data.edge_index, data.edge_attr)
        pred = out.argmax(dim=1)
        correct += int((pred == data.y).sum())
        total_nodes += data.num_nodes
    return correct / total_nodes
and now when I train and test
modelTransf = Transf(data, hidden_channels=64)
train_epoch = 200
train_acc = []
test_acc = []
optimizer = torch.optim.Adam(modelTransf.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(1, train_epoch + 1):
    train(loader, modelTransf)
    train_acc.append(test(loader, modelTransf))
    test_acc.append(test(loader_test, modelTransf))
plt.plot(np.arange(train_epoch), np.array(train_acc), label='Training')
plt.plot(np.arange(train_epoch), np.array(test_acc), label='Test')
plt.legend()
plt.show()
my result for 15 graphs for training and 15 graphs for testing is
when I set 75 graphs for training and 23 graphs for testing, the resulting curves look like in the following picture
Regarding the false connections of the nodes, this does not seem to be the problem. I verify it in the following way
data = data_list[0] # select one graph from the list
data_net = to_networkx(data)
[n for n in data_net[0]]
which returns the neighbors of the particular node. I then compare it with my original dataset and it matches. Therefore I think that the nodes have the correct connections. Or did you mean something different?
Thank you!
This is interesting, the model does not seem to be able to generalize at all :( I sadly don't have any good advice on this one. What happens when you drop edge_index completely, e.g., replacing all GNN layers with PyTorch Linear layers? Does that increase test accuracy?
I did it in the following way: Modify training and testing functions
def train(loader, model):
    model.train()
    for data in loader:
        optimizer.zero_grad()
        out = model(data.x)  # drop edges
        loss = criterion(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        optimizer.step()

def test(loader, model):
    model.eval()
    total_nodes = 0
    correct = 0
    for data in loader:
        out = model(data.x)  # drop edges
        pred = out.argmax(dim=1)
        correct += int((pred == data.y).sum())
        total_nodes += data.num_nodes
    return correct / total_nodes
Create the model
from torch.nn import Linear

class Lin(torch.nn.Module):
    def __init__(self, data, hidden_channels):
        super().__init__()
        torch.manual_seed(12345)
        self.lin1 = Linear(data.num_features, hidden_channels)
        self.lin2 = Linear(hidden_channels, num_classes)

    def forward(self, x):
        x = self.lin1(x)
        x = x.relu()
        x = self.lin2(x)
        return x
then train
modelLin = Lin(data, hidden_channels=64)
train_epoch = 200
train_acc = []
test_acc = []
optimizer = torch.optim.Adam(modelLin.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(1, train_epoch + 1):
    train(loader, modelLin)
    train_acc.append(test(loader, modelLin))
    print(f'epoch {epoch}')
    print('Training is done!')
    test_acc.append(test(loader_test, modelLin))
    print('Testing is done!')
    print(f'Train accuracy is {train_acc[epoch-1]:.2f} and Test accuracy is {test_acc[epoch-1]:.2f}')
    print('=====')
plt.plot(np.arange(train_epoch), np.array(train_acc), label='Training')
plt.plot(np.arange(train_epoch), np.array(test_acc), label='Test')
plt.legend()
plt.show()
and the model still cannot generalize ;(
What could be a reason? Maybe I need a richer feature vector for each node?
By the way, my Data object looks like this:
Data(edge_attr=[822, 2], edge_index=[2, 822], test_mask=[563], train_mask=[563], val_mask=[563], x=[563, 70], y=[563])
and the feature of the node is one-hot encoded: only two entries take the value 1 and all others are set to 0.
It looks like test set features are always out-of-distribution. Mh, you can plot your node features via T-SNE to confirm. Maybe this gives you some more intuition on what's going wrong. Otherwise, I'm out of ideas :(
All right, so I have collapsed my 40 dimensional feature vector to 2D via T-SNE for one training graph and one test graph
data_emb = TSNE(n_components=2).fit_transform(data.x)
data_test_emb = TSNE(n_components=2).fit_transform(data_test.x)
and this is the result
If I can interpret the result correctly, it looks like the test set is not out-of-distribution, since both test and train data are distributed in a very similar manner.
Essentially, my node feature vector represents the coordinates in 2D, so before one-hot encoding (before bringing it up to 40 dimensions), it looks like this
In other words, picture above depicts the collected data.
At this moment, I cannot think of anything else I could try with the current dataset. If you have any other ideas, from looking at my datasets, on how to improve the accuracy of the model or how to process/modify/analyze the data, I would greatly appreciate it if you shared them. Otherwise, I thank you very much for helping me and we can close the thread.
Any reason why you convert your coordinates to a one-hot-encoding? If your graph is "spatial", then you can treat it as such in the GNN layer as well, e.g., by using SplineConv.
From my observations, with one-hot encoding of the node features, I am able to bring the loss down. Usually, the higher the dimensionality of one-hot encoded feature, the steeper the loss decrease. The accuracy does not grow, though.
Thanks for the advice, SplineConv seems more reasonable in this case. But the problem still remains: even though the training accuracy grows by a few percent, the test accuracy does not.
Here is my code:
from torch_geometric.nn import SplineConv

class Spline(torch.nn.Module):
    def __init__(self, data, hidden_channels):
        super().__init__()
        torch.manual_seed(12345)
        self.spl1 = SplineConv(data.num_features, hidden_channels, dim=1, kernel_size=2)
        self.spl2 = SplineConv(hidden_channels, num_classes, dim=1, kernel_size=2)

    def forward(self, x, edge_index, edge_attr):
        x = self.spl1(x, edge_index, edge_attr[:, 1][np.newaxis].T)
        x = x.relu()
        x = self.spl2(x, edge_index, edge_attr[:, 1][np.newaxis].T)
        return x
For the pseudo-coordinates, I use the one dimensional edge attribute (normalized to the value between 0 and 1) that in the collected data represents the distance between the nodes.
train_epoch = 200
train_acc = []
test_acc = []
modelSpl = Spline(data, hidden_channels=16)
optimizer = torch.optim.Adam(modelSpl.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(1, train_epoch + 1):
    train(loader, modelSpl)
    train_acc.append(test(loader, modelSpl))
    test_acc.append(test(loader_test, modelSpl))
plt.plot(np.arange(train_epoch), np.array(train_acc), label='Training')
plt.plot(np.arange(train_epoch), np.array(test_acc), label='Test')
plt.legend()
plt.show()
and the resulting accuracy plot is very similar
Meanwhile, with kernel_size = 3 the loss remains at approx. 1.6 throughout the entire training.
I have also noticed that if I set the kernel_size to some large value, e.g. 100, then my loss goes down to approx. 0.9 and the training accuracy grows to 40 %. Test accuracy remains at 20 %.
In addition, I have also tried to use a 2-dimensional edge_attr, to change the number of graphs in the loader (from 15 to 75 in the training set and 5 to 15 in the test set), and to use different numbers of channels, and the result does not change much.
Do you know anything else I could try? Thank you!
For edge_attr, you can directly make use of the 2D coordinates, and for features, you could try to go with just a single feature holding a 1 (similar to what we do in FAUST). This will drop absolute coordinate information, which might help the model to generalize better.
For some strange reason, my kernel dies when I set kernel_size = 2 after I put 2D coordinates in edge_attr. For kernel_size = 1 the kernel does not die, however, the accuracy is not improved and the loss is approximately 1.6. Meanwhile, with TransformerConv the kernel does not die (but again no improvement in accuracy). Let me explain it in more detail.
Now my edge_attr looks like
tensor([[ 0.5662, 0.3955, 0.5494, 0.4265],
[-0.8367, -0.4174, -0.8214, -0.4449],
[-0.1971, 0.6470, -0.2018, 0.6349],
...,
[ 0.7983, 0.7754, 0.7710, 0.7770],
[-0.0177, 0.9949, -0.0338, 1.0000],
[-0.0150, 0.9933, -0.0338, 1.0000]])
where first 2 columns represent the coordinates of the source node, and last 2 columns - the coordinates of the destination node. The node feature is a 1 for all the nodes. My model is the same
from torch_geometric.nn import SplineConv

class Spline(torch.nn.Module):
    def __init__(self, data, hidden_channels):
        super().__init__()
        torch.manual_seed(12345)
        self.spl1 = SplineConv(data.num_features, hidden_channels, dim=4, kernel_size=2)
        self.spl2 = SplineConv(hidden_channels, num_classes, dim=4, kernel_size=2)

    def forward(self, x, edge_index, edge_attr):
        x = self.spl1(x, edge_index, edge_attr)
        x = x.relu()
        x = self.spl2(x, edge_index, edge_attr)
        return x
When I start training
train_epoch = 200
train_acc = []
test_acc = []
modelSpl = Spline(data, hidden_channels=16)
optimizer = torch.optim.Adam(modelSpl.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(1, train_epoch + 1):
    train(loader, modelSpl)
    train_acc.append(test(loader, modelSpl))
    test_acc.append(test(loader_test, modelSpl))
plt.plot(np.arange(train_epoch), np.array(train_acc), label='Training')
plt.plot(np.arange(train_epoch), np.array(test_acc), label='Test')
plt.legend()
plt.show()
my loss either jumps to inf or becomes nan and the kernel dies (again, for kernel_size = 1, the loss is 1.6 and it keeps going).
Note that I receive the warning when I create the model
UserWarning: We do not recommend using the non-optimized CPU version of `SplineConv`. If possible, please move your data to GPU.
warnings.warn(
but my gpu has no cuda, so I have to go with cpu anyway.
Such an unexpected problem :) Do you know what can cause this? Thank you very much!
The SplineConv expects edge features to be in the interval [0, 1] and I think this may cause this issue. Instead of inputting absolute coordinates as edge features, the idea in SplineConv is that you input relative coordinates instead. You can do this via the T.Cartesian() transform, e.g.:
import torch_geometric.transforms as T
data.pos = ...  # Node positions
data = T.Cartesian()(data)
conv = SplineConv(1, hidden_channels, dim=2, kernel_size=5)
It's a bummer that you do not have access to a GPU, since the SplineConv is actually quite slow on CPU :(
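To make the idea of relative, normalized edge features concrete, here is a minimal dependency-free sketch. It assumes each edge is described by the difference of its endpoint positions, divided by twice the largest absolute component and shifted by 0.5 so that everything lands in [0, 1]. The function name relative_pseudo_coordinates and the toy data are hypothetical, and the library transform may differ in details such as the sign convention.

```python
def relative_pseudo_coordinates(pos, edges):
    """Map each edge (src, dst) to pos[src] - pos[dst], rescaled into [0, 1].

    Assumption: all differences are divided by twice the largest absolute
    component and shifted by 0.5, so a zero offset maps to 0.5.
    """
    diffs = [
        [ps - pd for ps, pd in zip(pos[src], pos[dst])]
        for src, dst in edges
    ]
    max_abs = max((abs(c) for d in diffs for c in d), default=0.0)
    if max_abs == 0.0:
        # All connected nodes coincide: every offset is the neutral value.
        return [[0.5 for _ in d] for d in diffs]
    return [[c / (2 * max_abs) + 0.5 for c in d] for d in diffs]

# Hypothetical 2D node positions and a small edge list (both directions).
pos = [[0.0, 0.0], [1.0, 0.0], [0.0, -2.0]]
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]
pseudo = relative_pseudo_coordinates(pos, edges)
print(pseudo)  # [[0.25, 0.5], [0.75, 0.5], [0.5, 1.0], [0.5, 0.0]]
```

Unlike absolute coordinates with negative entries, these values never leave [0, 1], and shifting the whole graph in space leaves them unchanged.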
All right, even though I did not increase my test accuracy yet, there is an interesting observation that I made. Generally speaking, I want to reduce the number of conflicting nodes (nodes that have the same class and are connected by an edge).
Once again, my Data object is as follows
Data(edge_attr=[840, 2], edge_index=[2, 840], pos=[596, 2], test_mask=[596], train_mask=[596], val_mask=[596], x=[596, 1], y=[596])
where edge_attr contains the relative coordinates of the nodes, calculated as you mentioned earlier, and x contains 1 for each node.
Model
class Spline(torch.nn.Module):
    def __init__(self, data, hidden_channels):
        super().__init__()
        torch.manual_seed(12345)
        self.spl1 = SplineConv(data.num_features, hidden_channels, dim=2, kernel_size=5)
        self.spl2 = SplineConv(hidden_channels, num_classes, dim=2, kernel_size=5)

    def forward(self, x, edge_index, edge_attr):
        x = self.spl1(x, edge_index, edge_attr)
        x = x.relu()
        x = self.spl2(x, edge_index, edge_attr)
        return x
Training itself
train_epoch = 200
train_acc = []
test_acc = []
test_conf_acc = []
modelSpl = Spline(data, hidden_channels=16)
optimizer = torch.optim.Adam(modelSpl.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(1, train_epoch + 1):
    train(loader, modelSpl)
    train_acc.append(test(loader, modelSpl))
    test_acc.append(test(loader_test, modelSpl))
    test_conf_acc.append(test_only_conf(loader_test, modelSpl))  # accuracy w.r.t. conflicting nodes
The test accuracy of the model with the SplineConv operator does not grow (maybe there is indeed some error when I preprocess the data, I am not sure), so I have decided to check whether the model does what I want: reduce the number of conflicting nodes.
I do it in the following way
def test_only_conf(loader, model):
    model.eval()
    total_edges = 0
    correct = 0
    clash = 0
    for data in loader:
        total_edges += data.num_edges
        out = model(data.x, data.edge_index, data.edge_attr)
        pred = out.argmax(dim=1)
        data_net = to_networkx(data)
        for j in range(len(data.x)):
            neighb = [n for n in data_net[j]]  # get the neighbors
            node_pred = pred[j]  # get the prediction of the node of interest
            for m in range(len(neighb)):
                if node_pred == pred[neighb[m]]:  # check if the classes of the selected node and its neighbors' are the same
                    clash += 1
    return 1 - clash / (2 * total_edges)  # returns accuracy. 2 is due to visiting each node twice
and the result I obtain is
where the accuracy w.r.t. conflicts is marked Test-conflict.
From the figure above we can observe that at the beginning of the training the model performs with a Test-conflict accuracy of 65 %, then it grows to 90 %.
So this is essentially what I need! However, the predictions of the model do not match the labels. I believe that this is still a problem, because if we look at the embeddings of one of the graphs that I used for training, it looks like this
where I cannot see any similarity (e.g. distance) between the nodes that belong to the same class ;( This also indicates that the increase of the Test-conflict accuracy is rather something inherent to the SplineConv operator than something induced by my dataset.
What do you think about it?
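The conflict metric described above can also be computed straight from an edge list, without a networkx conversion. The following is a minimal sketch on a toy graph; conflict_free_fraction is a hypothetical helper, and it normalizes by the number of edges actually listed rather than by twice that number.

```python
def conflict_free_fraction(edges, pred):
    """Fraction of edges whose two endpoints received different classes."""
    if not edges:
        return 1.0
    clashes = sum(1 for src, dst in edges if pred[src] == pred[dst])
    return 1.0 - clashes / len(edges)

# Hypothetical 4-cycle 0-1-2-3-0, with each undirected edge stored once.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(conflict_free_fraction(edges, [0, 1, 0, 1]))  # 1.0
print(conflict_free_fraction(edges, [0, 0, 1, 1]))  # 0.5
```

If every undirected edge is stored in both directions, clashes and edge count both double, so the ratio stays the same. That makes it easy to check whether an extra factor of 2 in the denominator is actually wanted.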
Not sure I fully understand it yet. So equally labeled nodes have the same label, but their label does not match with the ground-truth label? I wonder whether there is any indication in the dataset, which nodes belongs to which label?
So equally labeled nodes have the same label, but their label does not match with the ground-truth label?
No, equally labeled nodes do not necessarily match the ground-truth label, however, during the training, the number of conflicting nodes reduces (a node pair that has the same class).
This comes from the fact, that the predictions at the beginning of the training are less diverse than towards the end of the training. Let me give you an example: at the 1st epoch, my predictions are
Predictions tensor([4, 3, 3, 3, 4, 4, 4, 4, 4, 4, 3, 3, 4, 3, 3, 4, 3, 3, 4, 3, 3, 3, 3, 3,
3, 3, 3, 4, 4, 3, 4, 3, 4, 3, 4, 3, 3, 4, 3, 3, 4, 3, 4, 3, 3, 3, 4, 4,
3, 4, 3, 4, 3, 4, 4, 3, 4, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3,
3, 3, 4, 4, 4, 3, 4, 3, 3, 3, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 3, 4,
4, 4, 4, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 4, 4, 4, 4, 4, 4, 3,
3, 4, 3, 3, 3, 3, 3, 3, 4, 3, 4, 3, 3, 3, 3, 3, 4, 3, 4, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3,
4, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 4, 3, 3, 3, 3, 3, 4, 4, 4,
3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3,
4, 3, 4, 3, 4, 4, 4, 4, 3, 3, 3, 3, 4, 4, 3, 3, 3, 4, 4, 4, 4, 3, 3, 4,
3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 4, 4, 3, 4, 3, 4, 3, 4, 3, 3, 4, 4, 3, 4, 4, 4, 3, 4, 3, 4, 4,
4, 4, 4, 3, 4, 3, 4, 3, 4, 3, 3, 4, 3, 3, 4, 3, 3, 3, 4, 4, 4, 3, 3, 3,
4, 3, 3, 3, 4, 3, 4, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3,
3, 4, 3, 3, 3, 3, 4, 3, 4, 4, 3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 3, 3, 4, 3,
3, 3, 3, 3, 4, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 4, 4, 4,
4, 3, 3, 4, 3, 3, 4, 4, 3, 3, 3, 3, 3, 4, 3, 3, 4, 4, 3, 3, 3, 4, 3, 4,
3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3,
4, 3, 3, 4, 3, 3, 4, 4, 3, 3, 4, 4, 4, 3, 4, 3, 4, 3, 3, 4, 3, 4, 3, 4,
4, 4, 4, 3, 4, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 3, 4, 4, 4, 3, 4,
4, 3, 4, 3, 4, 4, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 4, 4,
4])
at the 10th epoch
Predictions tensor([4, 4, 0, 0, 4, 4, 4, 2, 4, 4, 4, 3, 4, 4, 1, 1, 0, 3, 4, 1, 3, 4, 0, 0,
0, 3, 1, 1, 4, 1, 4, 0, 4, 1, 0, 0, 0, 0, 0, 0, 1, 3, 4, 1, 2, 0, 4, 4,
0, 4, 4, 4, 0, 4, 4, 0, 4, 2, 4, 3, 4, 2, 1, 2, 1, 0, 4, 4, 4, 0, 3, 1,
1, 1, 2, 4, 4, 3, 1, 2, 2, 4, 0, 0, 0, 4, 2, 1, 3, 3, 1, 2, 0, 4, 0, 1,
4, 4, 4, 3, 1, 0, 2, 0, 0, 1, 3, 1, 4, 4, 3, 4, 3, 0, 0, 2, 4, 4, 2, 4,
3, 1, 0, 1, 0, 1, 3, 1, 1, 2, 4, 1, 0, 4, 1, 1, 4, 1, 4, 1, 3, 2, 1, 4,
3, 4, 4, 1, 1, 4, 3, 0, 3, 3, 4, 0, 3, 0, 2, 0, 3, 2, 3, 4, 1, 4, 3, 0,
4, 3, 1, 0, 4, 3, 2, 1, 3, 4, 4, 3, 4, 1, 1, 4, 0, 3, 0, 1, 3, 4, 4, 4,
3, 3, 3, 4, 4, 0, 1, 0, 0, 3, 1, 4, 1, 4, 4, 1, 3, 3, 4, 0, 1, 0, 1, 1,
4, 0, 1, 0, 4, 4, 4, 4, 0, 1, 3, 1, 4, 4, 1, 0, 0, 4, 4, 3, 0, 2, 4, 1,
1, 4, 1, 4, 1, 0, 3, 1, 0, 2, 3, 4, 0, 2, 4, 4, 2, 0, 0, 2, 1, 1, 0, 1,
4, 1, 2, 4, 4, 1, 4, 2, 4, 1, 4, 0, 1, 1, 4, 1, 4, 1, 4, 3, 4, 4, 1, 4,
4, 4, 1, 3, 4, 2, 4, 3, 0, 2, 0, 4, 3, 0, 4, 1, 3, 0, 4, 4, 1, 4, 0, 3,
1, 0, 1, 0, 4, 4, 1, 2, 1, 4, 2, 1, 3, 4, 1, 2, 0, 0, 2, 3, 1, 4, 3, 2,
0, 1, 3, 3, 1, 4, 4, 3, 4, 1, 3, 3, 4, 4, 1, 1, 0, 1, 1, 1, 0, 1, 4, 0,
2, 4, 1, 3, 4, 4, 0, 3, 4, 3, 1, 4, 0, 0, 1, 4, 4, 4, 1, 0, 3, 4, 1, 4,
4, 0, 2, 1, 1, 0, 4, 1, 0, 3, 4, 1, 0, 4, 4, 0, 4, 4, 3, 0, 4, 1, 4, 4,
4, 4, 0, 3, 3, 0, 0, 0, 1, 4, 1, 4, 0, 4, 0, 4, 4, 3, 3, 4, 3, 4, 1, 3,
2, 3, 4, 3, 0, 3, 1, 0, 1, 1, 4, 4, 0, 3, 3, 3, 1, 3, 3, 0, 4, 3, 0, 4,
4, 1, 3, 4, 1, 0, 4, 1, 3, 2, 4, 4, 4, 0, 4, 0, 4, 1, 4, 4, 3, 4, 3, 1,
4, 3, 1, 3, 1, 1, 1, 4, 1, 1, 1, 0, 3, 1, 0, 4, 4, 0, 2, 3, 4, 4, 0, 4,
4, 1, 2, 1, 4, 1, 3, 3, 4, 0, 4, 4, 4, 2, 3, 1, 2, 1, 3, 3, 4, 0, 2, 4,
4])
at the 100th epoch
Predictions tensor([4, 1, 1, 4, 3, 4, 4, 2, 4, 1, 4, 0, 1, 4, 1, 1, 0, 3, 4, 4, 3, 4, 0, 0,
0, 0, 3, 1, 3, 1, 2, 0, 4, 3, 4, 0, 0, 0, 1, 0, 1, 3, 4, 1, 2, 0, 2, 4,
0, 4, 4, 3, 0, 0, 4, 2, 1, 4, 3, 3, 4, 2, 1, 2, 1, 0, 3, 4, 4, 0, 3, 3,
1, 0, 2, 3, 4, 4, 1, 2, 2, 1, 0, 2, 0, 3, 2, 1, 3, 3, 1, 2, 0, 3, 2, 1,
4, 4, 4, 3, 1, 0, 2, 0, 1, 1, 3, 1, 0, 4, 0, 4, 4, 1, 0, 2, 4, 1, 2, 4,
3, 1, 0, 1, 3, 1, 3, 1, 1, 2, 4, 1, 0, 0, 3, 1, 0, 1, 1, 1, 3, 2, 1, 4,
3, 4, 0, 1, 0, 1, 3, 0, 0, 0, 4, 0, 3, 0, 2, 0, 3, 2, 3, 4, 1, 3, 0, 0,
4, 3, 1, 0, 4, 3, 2, 1, 3, 4, 4, 2, 4, 1, 1, 1, 0, 1, 0, 1, 3, 4, 4, 4,
3, 3, 1, 3, 3, 0, 1, 0, 0, 0, 3, 4, 1, 1, 4, 3, 1, 1, 2, 0, 2, 0, 1, 1,
4, 2, 1, 1, 4, 3, 1, 3, 0, 1, 0, 1, 4, 4, 1, 0, 0, 4, 4, 3, 0, 3, 4, 1,
1, 4, 2, 0, 1, 0, 3, 1, 0, 2, 3, 1, 0, 2, 4, 0, 2, 1, 0, 2, 1, 1, 0, 2,
4, 1, 2, 1, 3, 1, 4, 2, 2, 1, 4, 2, 1, 1, 4, 2, 4, 1, 3, 0, 1, 4, 1, 4,
4, 3, 1, 3, 4, 2, 3, 3, 0, 2, 0, 1, 2, 0, 4, 1, 3, 3, 2, 0, 1, 4, 0, 3,
1, 0, 1, 0, 4, 4, 1, 2, 4, 2, 2, 1, 3, 4, 1, 2, 3, 0, 2, 4, 1, 4, 3, 2,
3, 1, 4, 3, 0, 2, 4, 3, 1, 1, 2, 3, 4, 3, 1, 2, 2, 1, 1, 1, 0, 2, 4, 0,
2, 4, 3, 3, 3, 4, 0, 3, 1, 3, 3, 0, 0, 0, 1, 1, 0, 4, 0, 3, 0, 3, 4, 4,
4, 0, 2, 1, 1, 0, 4, 1, 2, 1, 4, 0, 0, 4, 0, 0, 4, 3, 1, 0, 4, 1, 4, 4,
4, 4, 0, 3, 3, 0, 0, 1, 1, 4, 1, 4, 0, 2, 2, 4, 4, 3, 3, 4, 3, 4, 3, 3,
1, 3, 0, 3, 0, 4, 1, 0, 1, 1, 4, 1, 0, 3, 3, 3, 1, 2, 3, 1, 4, 3, 0, 3,
1, 1, 0, 2, 1, 0, 4, 1, 3, 3, 3, 4, 4, 2, 4, 1, 0, 1, 2, 4, 3, 4, 3, 1,
4, 3, 1, 3, 1, 1, 1, 4, 1, 1, 1, 1, 3, 3, 0, 4, 0, 0, 2, 4, 1, 4, 0, 4,
3, 1, 2, 0, 1, 4, 3, 3, 2, 0, 1, 3, 4, 4, 0, 1, 2, 1, 3, 3, 4, 0, 2, 4,
4])
As we see, at the beginning, the predictions are essentially 3 and 4, and the more we train, the more diverse the predictions become. This explains why we have more conflicting nodes at the beginning and fewer conflicting nodes at the end of the training.
Meanwhile, the ground-truth label has some value between 0 and 4 for each node.
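A rough way to quantify this effect: if one assumes (purely for illustration) that the predictions at the two ends of an edge are independent draws from the overall prediction distribution, the chance that they coincide is the sum of squared class frequencies. The helper expected_clash_rate below is hypothetical, and the independence assumption generally does not hold for GNN outputs.

```python
from collections import Counter

def expected_clash_rate(pred):
    """Probability that two independently drawn predictions coincide."""
    n = len(pred)
    return sum((count / n) ** 2 for count in Counter(pred).values())

two_classes = [3, 4, 3, 4]                      # only two classes in use
five_classes = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]   # five classes, equally used
print(1 - expected_clash_rate(two_classes))     # about 0.5
print(1 - expected_clash_rate(five_classes))    # about 0.8
```

Under that assumption, merely spreading predictions over more classes raises the conflict-free fraction, even when the model has learned nothing about the graph structure.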
I wonder whether there is any indication in the dataset, which nodes belongs to which label?
Yes, there are ground-truth labels in my dataset. The loss and the accuracy is calculated w.r.t. them.
Let me repeat, the objective is to obtain the smallest number of conflicting nodes (that is what the Test-conflict accuracy is about). At the same time, there is a crucial point that I have been overlooking until now: there are multiple solutions possible and my ground-truth labels indicate one of the (sub)optimal solutions. So maybe the predictions of the model are not wrong - they just don't match the ground-truth labels! That is maybe the reason why the usual accuracy is always at 20 %.
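The point about multiple valid solutions can be illustrated with a toy case: a prediction that induces exactly the same grouping of nodes as the ground truth, but with the colors renamed, scores zero under plain accuracy. The sketch below uses a hypothetical helper best_permuted_accuracy that tries every renaming of the predicted classes; the brute-force search is only practical for a handful of classes (5 classes means 120 permutations).

```python
from itertools import permutations

def best_permuted_accuracy(pred, target, num_classes):
    """Highest accuracy over all relabelings of the predicted classes."""
    best = 0.0
    for perm in permutations(range(num_classes)):
        correct = sum(1 for p, t in zip(pred, target) if perm[p] == t)
        best = max(best, correct / len(target))
    return best

target = [0, 1, 2, 3, 4, 0, 1]
pred = [1, 2, 3, 4, 0, 1, 2]  # same partition, colors shifted by one
plain = sum(p == t for p, t in zip(pred, target)) / len(target)
print(plain)                                    # 0.0
print(best_permuted_accuracy(pred, target, 5))  # 1.0
```

Note that this only removes the ambiguity of color names. A coloring that is valid but partitions the nodes differently from the ground truth would still score below 1.0.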
The abovementioned idea is contradictory to the following one: I believe that the loss and the usual accuracy must still improve, since I indicate by the ground-truth labels what is the solution. This indeed is the solution the model must approach as close as possible.
From the last two paragraphs, can you say where I am right? And in general what would be the operators that would help me to find the optimal solution?
Thanks a lot!
I see. I don't think a classic node classification criterion is ideal in this scenario, as you correctly pointed out that your labels are only one valid solution. I would frame this problem as a contrastive learning/link prediction, that is, you want nodes with the same label to have high probability, and nodes with different labels to have low probability.
you want nodes with the same label to have high probability, and nodes with different labels to have low probability.
I guess it is the other way around :) nodes with the same label must have low probability and nodes with different labels - high probability. I may be wrong about it, but I still think it is a different problem from edge prediction: in the literature my problem is usually referred to as the graph-coloring problem (reduce the number of connected nodes that have the same color).
Thus, node classification still makes more sense to me. Also, at the input I am already given a graph. In other words, the edge between the nodes is already predefined.
There are multiple correct ways to "color" the graph (to classify the nodes) and my dataset indicates one out of many solutions. My solution (labels) are likely optimal and I want to train model against them.
Eventually, I want to feed into the trained model a non-colored graph and at the output get the colored graph.
Or I just didn't get your idea of edge-predictions correctly?
You are right that you can cast this problem as a node classification problem, but as you correctly identified the ground-truth information is only a single solution to your problem. In the end, the machine learning model can not really learn the correct "color" of nodes as it is arbitrary. As such, you need to train your model independent of its specific ground-truth color. The best way in doing so is via contrastive loss, i.e., separate the nodes with the same label from all other nodes. This is equivalent in doing link prediction, i.e., find the nodes that are (dis-)connected.
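One way to see why a pairwise objective sidesteps the arbitrary color names: the "same label or not" target of a node pair does not change when colors are renamed. The sketch below is a dependency-free illustration under assumptions: same_label_targets and pairwise_bce are hypothetical helpers, the embeddings are made up, and a binary cross-entropy on sigmoid(dot product) is just one simple pairwise loss, not necessarily the contrastive loss meant in the reply above.

```python
import math

def same_label_targets(pairs, labels):
    """1.0 if both nodes of a pair share a label, else 0.0."""
    return [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]

def pairwise_bce(embeddings, pairs, targets):
    """Mean binary cross-entropy of sigmoid(dot product) against targets."""
    total = 0.0
    for (i, j), t in zip(pairs, targets):
        score = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
        prob = 1.0 / (1.0 + math.exp(-score))
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= t * math.log(prob) + (1.0 - t) * math.log(1.0 - prob)
    return total / len(pairs)

pairs = [(0, 1), (0, 2), (1, 2)]
labels = [0, 1, 0]
renamed = [4, 2, 4]  # same partition, different color names
print(same_label_targets(pairs, labels))   # [0.0, 1.0, 0.0]
print(same_label_targets(pairs, renamed))  # [0.0, 1.0, 0.0]

# Hypothetical embeddings: nodes 0 and 2 aligned, node 1 opposite.
good = [[2.0, 0.0], [-2.0, 0.0], [2.0, 0.0]]
bad = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
targets = same_label_targets(pairs, labels)
print(pairwise_bce(good, pairs, targets) < pairwise_bce(bad, pairs, targets))  # True
```

For the graph-coloring reading of the problem, the same construction can be applied with the target inverted on existing edges, so that connected nodes are pushed toward different groups.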
Okay! I am already working on this. We can close the thread now.
Thank you very much for your help!
As you suggested, I have reformulated the problem as link prediction and the testing accuracy seems to be higher. However, I am not sure how to correctly extract the predicted links. It confuses me that the model predicts many more edges than there actually are in the Data object and the accuracy is still high.
In particular, the data object at the beginning has 10556 edges: Data(edge_index=[2, 10556], test_mask=[2708], train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])
then we split the edges
data = train_test_split_edges(data)
and add negative edges by means of negative_sampling.
Now, the training/testing/validation performance is shown below
Since the test accuracy is very high, I expect that the number and connectivity of predicted links is very similar to the Data object given as an input.
To extract those edges, I do
z = model.encode(data.x, data.train_pos_edge_index)
final_edge_index = model.decode_all(z)
and when I check the number of predicted edges
print(final_edge_index.size(1))
I get 3465922, which is much larger than the number of edges of the input graph. I do not understand how, with such a large difference in the number of edges, the accuracy remains at 90 %.
So my question is: how do I correctly extract the predicted edges? Or if these predictions are correct, how do I make the model predict only missing edges?
Thank you!
The final_edge_index will use a threshold of 0.5 to decide whether to include an edge or not, which might be too low in your experiment. To only keep edges with higher probability, run:
prob_adj = (z @ z.t()).sigmoid()
return (prob_adj > threshold).nonzero(as_tuple=False).t()
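The thresholding step can be illustrated without PyTorch. A dense decoder of this kind scores every ordered node pair, including self-pairs, so for 2708 nodes there are 2708 * 2708 = 7333264 candidates, and 3465922 predicted edges is roughly 47 % of them. The sketch below uses a hypothetical helper predicted_edges and made-up embeddings to show how the edge count shrinks as the threshold rises.

```python
import math

def predicted_edges(z, threshold=0.5):
    """Return (i, j) pairs whose sigmoid(z_i . z_j) exceeds the threshold."""
    edges = []
    for i, zi in enumerate(z):
        for j, zj in enumerate(z):
            score = sum(a * b for a, b in zip(zi, zj))
            if 1.0 / (1.0 + math.exp(-score)) > threshold:
                edges.append((i, j))
    return edges

# Hypothetical 2D node embeddings, for illustration only.
z = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.2]]
print(len(predicted_edges(z, threshold=0.5)))  # 5
print(len(predicted_edges(z, threshold=0.7)))  # 2
```

At a threshold of 0.5 any pair with a positive inner product counts as an edge, which is why the extracted edge set can be far denser than the input graph.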
From now on, we recommend using our discussion forum (https://github.com/rusty1s/pytorch_geometric/discussions) for general questions.
❓ Questions & Help
I am developing a model for the node classification task. I batch multiple graphs into the training and testing batches. After I train the model against one batch, I obtain some results that seem suspicious to me.
Let us say, my batch that contains the nodes for the training is given as follows:
Batch(batch=[5811], edge_attr=[8340, 1], edge_index=[2, 8340], ptr=[11], test_mask=[5811], train_mask=[5811], val_mask=[5811], x=[5811, 40], y=[5811])
It contains 10 graphs, as can be seen in ptr. Next I train the model:
and the result of the first three trainings is depicted below
Let us ignore the high loss for now. The thing that confuses me the most is that at the beginning of each training, the loss jumps back to a value of approximately 2. I would expect it to keep going down (or at least remain at the same level), since the multiple graphs that I use for training come from the same simulation.
So the question is: do I make a mistake in the programming, or is it my misunderstanding of the neural network performance?
Thank you!