pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

torch_sparse.SparseTensor.size causing problems in Data and graphSAINT #1589

Closed: pavlin-policar closed this issue 4 years ago

pavlin-policar commented 4 years ago

🐛 Bug

It appears that torch_sparse.SparseTensor causes problems when calling torch_geometric.data.Data.num_nodes. I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-7db7380be762> in <module>
----> 1 data.num_nodes

~/local/miniconda3/envs/gnn/lib/python3.8/site-packages/torch_geometric/data/data.py in num_nodes(self)
    201             return self.__num_nodes__
    202         for key, item in self('x', 'pos', 'norm', 'batch'):
--> 203             return item.size(self.__cat_dim__(key, item))
    204         if hasattr(self, 'adj'):
    205             return self.adj.size(0)

~/local/miniconda3/envs/gnn/lib/python3.8/site-packages/torch_sparse/tensor.py in size(self, dim)
    212 
    213     def size(self, dim: int) -> int:
--> 214         return self.sizes()[dim]
    215 
    216     def dim(self) -> int:

TypeError: list indices must be integers or slices, not tuple

I would like to use this because my graph nodes do not have features, so I did the standard thing and used an identity matrix. The graphs are pretty big, so I want to use sparse matrices here; otherwise, I'll run out of GPU memory pretty quickly. The same error occurs in graphSAINT, because it tries to access data.num_nodes.

I'm pretty new to this field and torch_geometric in general, so I was surprised this wasn't working and that this issue doesn't seem to have been reported before. Am I using this incorrectly, or is this something that just isn't supported yet?

To Reproduce

import torch
import torch_sparse
import torch_geometric as pyg

edge_index = torch.tensor([
    [1, 0, 3, 1, 2, 0],
    [0, 1, 1, 3, 0, 2],
])

num_nodes = len(edge_index.unique())

x = torch_sparse.SparseTensor.eye(num_nodes)

data = pyg.data.Data(edge_index=edge_index, x=x)

data.num_nodes  # raises TypeError: list indices must be integers or slices, not tuple
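
A possible stopgap (my own note, not part of the original report, and assuming Data's num_nodes setter behaves as in current PyG versions) is to tell Data the node count explicitly, so the num_nodes property returns the stored value instead of probing the SparseTensor:

# Hypothetical workaround: store the node count on the Data object explicitly,
# so data.num_nodes never has to call x.size(...) on the SparseTensor.
data = pyg.data.Data(edge_index=edge_index, x=x)
data.num_nodes = num_nodes  # assumes the num_nodes setter stores the value as __num_nodes__
print(data.num_nodes)       # 4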

Environment

pavlin-policar commented 4 years ago

It is also worth mentioning that I had run graphSAINT successfully using torch.sparse.FloatTensor instead of torch_sparse.SparseTensor. In that case getting num_nodes works, but graphSAINT then failed because torch.sparse.FloatTensor doesn't support proper indexing. I implemented a fairly simple workaround by monkey-patching GraphSAINTSampler.__collate__ to properly index sparse matrices.

However, I'm now playing around with RGCNs, where torch.sparse.FloatTensor raises errors, and there doesn't appear to be a straightforward way to fix this. Hopefully, it will work with torch_sparse.SparseTensor, if I can somehow deal with the bug above.

rusty1s commented 4 years ago

Hi and thanks for this issue. Using sparse node features is an interesting idea, but it currently isn't officially supported in PyG, and most GNN operators require dense feature representations.

Furthermore, I do not think that using sparse identity matrices as input features helps in reducing the memory complexity of your model, since the weight matrix of the first GNN layer will have a memory complexity of O(N) nonetheless.
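
To put rough numbers on this (an added illustration with hypothetical sizes, not part of the original reply): with identity features the first layer's input dimension equals the number of nodes, so its parameter count grows linearly in N no matter how sparsely the input itself is stored.

# Illustrative sketch: parameter count of the first layer when the input
# dimension equals the number of nodes (identity features).
import torch

num_nodes, hidden_dim = 100_000, 64  # hypothetical sizes
first_layer = torch.nn.Linear(num_nodes, hidden_dim)
print(sum(p.numel() for p in first_layer.parameters()))  # 6,400,064 parameters, regardless of input sparsity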

An alternative solution is to simply use random node features of low dimensionality, e.g.:

data.x = torch.randn(data.num_nodes, num_features)

or to use a trainable embedding layer, e.g.:

data.n_id = torch.arange(data.num_nodes)  # store each node's original index on the Data object
loader = GraphSAINT(...)

embedding = torch.nn.Embedding(data.num_nodes, num_features)

for data in loader:
    x = embedding(data.n_id)  # data.n_id holds the original indices of the sampled nodes

pavlin-policar commented 4 years ago

I see. I was under the impression that using an identity matrix was standard practice when nodes don't have features.

Furthermore, I do not think that using sparse identity matrices as input features helps in reducing the memory complexity of your model, since the weight matrix of the first GNN layer will have a memory complexity of O(N) nonetheless.

I don't know how it would affect the model, but it would certainly help in general, right? If I have a graph of e.g. 100k nodes, forming a dense 100k by 100k matrix is really expensive, so I can run out of memory (regular RAM, not even GPU memory) before we even get to modeling. On the other hand, if I formed a sparse 100k by 100k identity matrix, that's just ~300k numbers (the nonzero values plus their row and column indices), which is easy to handle.

I suppose a very easy workaround is to just convert the sparse chunks to dense matrices before passing them to the model, after graphSAINT generates smaller graphs. But that still requires graphSAINT to support sparse matrices.
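
A rough sketch of that workaround (my own illustration, with hypothetical names such as sub_x, sub_edge_index, and model): keep the identity features sparse on the full graph, and densify only the small per-subgraph feature block right before the forward pass.

import torch
import torch_sparse

num_nodes = 100_000
# Sparse identity: ~100k nonzeros, versus ~40 GB for a dense float32 100k x 100k matrix.
x = torch_sparse.SparseTensor.eye(num_nodes)

# Inside the training loop, assuming `sub_x` is the (sparse) feature block of a
# sampled subgraph with only a few thousand nodes:
# x_dense = sub_x.to_dense()            # cheap at subgraph scale
# out = model(x_dense, sub_edge_index)  # hypothetical model call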

An alternative solution is to simply use random node features of low dimensionality, e.g.:

That's interesting. Is this more standard than using an identity matrix? I am fairly new to the field, so I am still trying to figure out what is OK and what isn't.

or to use a trainable embedding layer

I am doing link prediction in the hopes of getting good node embeddings, and it seems to work well. But I take these embeddings from the last layer of the auto-encoder, before they are passed into the decoder. Do you think getting embeddings through a trainable embedding layer would work better?

Thanks for the quick response!

rusty1s commented 4 years ago

Yes, it is standard practice when nodes do not have features (at least for transductive learning), but it's actually equivalent to learning an embedding matrix. Here, you would still use the last layer to perform link prediction - it's just that an embedding matrix is equivalent to performing I @ weight.
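
A minimal illustration of that equivalence (my own sketch, not from the original reply): pushing an identity matrix through a bias-free linear layer just selects the rows of its transposed weight, which is exactly what an embedding lookup does.

import torch

num_nodes, dim = 5, 3
lin = torch.nn.Linear(num_nodes, dim, bias=False)

out_linear = lin(torch.eye(num_nodes))    # I @ W^T: node i receives row i of W^T

emb = torch.nn.Embedding(num_nodes, dim)
emb.weight.data.copy_(lin.weight.t())     # share the same parameters
out_embed = emb(torch.arange(num_nodes))  # plain table lookup

assert torch.allclose(out_linear, out_embed)  # identical node representations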

Using random node features isn't so common, but it is sufficient to learn structural features.

I'll see what I can do to support sparse matrices for Data, though :)

pavlin-policar commented 4 years ago

Yeah, I don't know how not having node features would work in an inductive learning scenario.

it's just that an embedding matrix is equivalent to performing I @ weight

That makes a lot of sense. So technically, we're adding a linear transformation to the identity matrix before passing it through any convolutions. This does increase the number of parameters I guess, but should also increase model capacity?

Using random node features isn't so common, but it is sufficient to learn structural features.

I tried this, and it seems to be working somewhat well.

Thanks a bunch, you've been very helpful!