Closed: matteoTaiana closed this issue 3 years ago
That's actually hard for me to track down. I guess this is caused by in-place modification of updated_edge_attr. To test, you can replace this call with torch.index_put.
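To illustrate the suggestion, here is a minimal sketch of replacing slice assignment with the out-of-place Tensor.index_put (the non-underscore sibling of index_put_), which returns a new tensor instead of writing into the existing one. The tensor shapes are made up and stand in for updated_edge_attr and the scatter_mean output from the issue:

```python
import torch

# Hypothetical stand-ins for updated_edge_attr and the scatter_mean output.
updated_edge_attr = torch.zeros(6, 3)
segment = torch.randn(2, 3, requires_grad=True)

# In-place version from the issue (recorded by autograd as CopySlices):
#   updated_edge_attr[2:4, :] = segment

# Out-of-place alternative: index_put returns a new tensor, leaving
# updated_edge_attr untouched, so autograd never records an in-place write.
rows = torch.arange(2, 4)
result = updated_edge_attr.index_put((rows,), segment)

result.sum().backward()
print(segment.grad)  # all ones: gradients flow back through the copy
```

Because result is a fresh tensor, writing several graph segments this way avoids repeated in-place mutation of the same buffer.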
This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity.
Hi everyone,
I am using the scatter_mean() function to update the embeddings of edges in a Graph Neural Network (the GNN is implemented with PyTorch Geometric). Things work fine for a variable number of epochs, and then I get an error. To make the error reporting more informative, I instrumented the code with the following instruction: torch.autograd.set_detect_anomaly(True)
In summary, this is the error I get:
RuntimeError: Function 'torch::autograd::CopySlices' returned nan values in its 1th output.
And this is the instruction (executed during the forward pass) that leads to the error during backward():
updated_edge_attr[cum_edges[g_id]:cum_edges[g_id+1], :] = scatter_mean(single_updates, dim=0, index=b)
I don't understand the error message. The error happens while running the backward function, i.e. while computing gradients. The function CopySlices seems to simply select which part of the output array the output of scatter_mean() gets copied to, so the local gradient should be 1. Could this error be due to the scatter_mean() function? Could it be due to me writing several times to updated_edge_attr? Is that an in-place operation?
Thank you in advance for your help!
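The assignment in the issue can be reduced to a minimal sketch confirming that slice assignment is indeed an in-place operation, and that autograd records it as CopySlices (the shapes here are invented for illustration):

```python
import torch

x = torch.zeros(4, 2)
src = torch.randn(2, 2, requires_grad=True)

# Slice assignment is an in-place write into x; autograd records the
# operation as CopySlices on the modified tensor.
x[1:3, :] = src * 2
print(x.grad_fn)  # <CopySlices object at ...>

x.sum().backward()
print(src.grad)  # all 2s: the copy itself has local gradient 1
```

So the local gradient through the copy really is 1; the NaN reported by anomaly detection must come from the values flowing through the slice, not from the copy operation itself.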