Closed HughBlayney closed 2 years ago
This seems to be an issue with isolated nodes. In particular, you may want to pass the num_nodes_dict argument to the MetaPath2Vec model.
I had a similar issue, and I agree it is likely caused by isolated nodes. However, setting the num_nodes_dict argument does not solve the problem. Does anyone have a better idea?
Do you have a small example to reproduce?
Hi @rusty1s, I have created a HeteroData object with the following statistics:
HeteroData(
(node_type_A, relation_A, node_type_A)={ edge_index=[2, 9000000] },
(node_type_A, relation_B, node_type_A)={ edge_index=[2, 18000000] }
)
After passing this graph into a metapath2vec model, it correctly identifies the number of nodes: model.num_nodes_dict={'node_type_A': 5000000}. However, training crashes and reports the same IndexError.
I am sure that some nodes included in relation_B do not have any relation_A edges. Is this what causes the IndexError? I ask because metapath2vec works fine if only one metapath is passed into the model.
I think MetaPath2Vec should well be able to handle nodes with zero outgoing edges. Any chance you have a small example to reproduce?
Hi @rusty1s, a similar problem appeared when I tested with my dataset. I have built a toy project to help you reproduce and understand the problem: https://github.com/Amayama/pyg_error_toy Thanks for your help!
Thank you. This helps a lot. The issue is that your graph contains isolated nodes, so that random walk generation fails. I'm not yet sure how to fix this without introducing a lot of computational overhead, but I'm looking into it. In particular, in your example, most nodes are isolated, and as a result, random-walk based learning methods cannot give you meaningful embeddings in the first place.
Currently, torch_geometric.transforms.remove_isolated_nodes cannot properly handle heterogeneous graphs, right?
Sadly not yet, and it does not really resolve this issue, as there might be nodes that are only isolated for a few edge types, while they are connected to some nodes for other edge types. I'm trying to fix this directly in MetaPath2Vec.
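For anyone wanting to check whether their own graph is affected, here is a minimal sketch for listing, per edge type, the nodes that have no outgoing edges. It is plain Python on toy data; the helper name nodes_without_out_edges and the example edge lists are made up for illustration and are not part of PyG.

```python
def nodes_without_out_edges(edge_index, num_src_nodes):
    """Return ids of source nodes that never appear in row 0 of `edge_index`."""
    has_out_edge = [False] * num_src_nodes
    for src in edge_index[0]:
        has_out_edge[src] = True
    return [node for node, flag in enumerate(has_out_edge) if not flag]


# Hypothetical toy data: one node type with 4 nodes and two relations.
# Each value is [source_ids, target_ids], like a 2 x num_edges edge_index.
num_nodes = 4
edges = {
    ("node_type_A", "relation_A", "node_type_A"): [[0, 1], [1, 0]],
    ("node_type_A", "relation_B", "node_type_A"): [[0, 2, 3], [2, 3, 0]],
}

# Nodes listed here are dead ends for a walk that must follow that edge type.
dead_ends = {
    edge_type: nodes_without_out_edges(edge_index, num_nodes)
    for edge_type, edge_index in edges.items()
}

for edge_type, nodes in dead_ends.items():
    print(edge_type, "->", nodes)
```

In this toy graph, nodes 2 and 3 are reachable via relation_B but have no relation_A edges, which mirrors the situation described above. With a real HeteroData object, something like data[edge_type].edge_index.tolist() should give the nested list this helper expects, although for millions of edges a tensor-based degree count would be much faster.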
Should be fixed when installing from master, see https://github.com/pyg-team/pytorch_geometric/pull/3353. Closing this issue now. Feel free to re-open it in case you meet any issues.
🐛 Bug
Hi,
I'm getting an IndexError when training MetaPath2Vec on my own dataset. The stack trace is
IndexError: Caught IndexError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch_geometric/nn/models/metapath2vec.py", line 157, in sample
    return self.pos_sample(batch), self.neg_sample(batch)
  File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch_geometric/nn/models/metapath2vec.py", line 123, in pos_sample
    batch = adj.sample(num_neighbors=1, subset=batch).squeeze()
  File "/home/ubuntu/anaconda3/envs/GNN2/lib/python3.7/site-packages/torch_sparse/sample.py", line 22, in sample
    return col[rand]
IndexError: index 1549811 is out of bounds for dimension 0 with size 1549811
From what I understand, it looks like the final entry in the rowptr tensor in sample is being referenced, which is an out-of-bounds index for the col tensor (as it is equal to the length of the col tensor). However, it looks like this doesn't happen on the default AMiner dataset, despite the fact that the subset tensor is a subset of a larger tensor in which the maximum value would index the final value in rowptr. Therefore I think I'm misunderstanding part of the code, so any help would be very much appreciated.

Reproducing the behaviour is complicated because I can't get the error to occur on the AMiner dataset, and I'm unable to share the dataset I'm working with. If it would be helpful for me to report back any metrics, or the results of any functions on my dataset, please let me know and I'll do what I can.
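The indexing concern described above can be illustrated with a small pure-Python sketch of one-neighbor sampling on a CSR adjacency. This is a simplified reading of the logic, not the actual torch_sparse code; the function name sample_one_neighbor and the toy graph are assumptions made for illustration.

```python
import random


def sample_one_neighbor(rowptr, col, subset, rng=random):
    """Pick one random neighbor for each node in `subset` from a CSR adjacency."""
    out = []
    for node in subset:
        start = rowptr[node]
        degree = rowptr[node + 1] - start
        # The offset lies in [0, degree). For degree == 0 it is always 0,
        # so the lookup falls on `start`, which belongs to a later row
        # or points one past the end of `col`.
        offset = int(rng.random() * degree)
        out.append(col[start + offset])
    return out


# Toy graph with 4 nodes and edges 0 -> 1 and 2 -> 0.
# Node 1 (middle) and node 3 (last) have no outgoing edges.
rowptr = [0, 1, 1, 2, 2]
col = [1, 0]

print(sample_one_neighbor(rowptr, col, [0, 2]))  # both nodes have a neighbor
print(sample_one_neighbor(rowptr, col, [1]))     # silently reads node 2's neighbor

try:
    sample_one_neighbor(rowptr, col, [3])        # rowptr[3] == len(col)
except IndexError as err:
    print("IndexError:", err)
```

Under this reading, the out-of-bounds index only appears when a node with no outgoing edges sits after the last edge in col, so rowptr for that node already equals len(col). That might be why a dataset without such trailing nodes never raises the error, although this is a guess rather than something verified against AMiner.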
Thank you very much for your time, and for putting together such a fantastic library!
Environment