Closed — rusty1s closed this issue 2 years ago
Sure @rusty1s, I'll check this out and come up with a short plan.
Some thoughts:

- Add FeaturePropagation to torch_geometric.nn.models.
- Instead of resetting known features every iteration, we could remove incoming edges to nodes with features from nodes without features before starting the propagation. This would be more efficient than resetting.
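For what it's worth, the edge-removal idea could be sketched roughly like this (plain numpy, with a hypothetical `edge_index` layout of `[2, num_edges]` and a per-node feature indicator — names are illustrative, not from the codebase):

```python
import numpy as np

# toy graph: edges stored as [2, num_edges], row 0 = source, row 1 = target
edge_index = np.array([[0, 1, 2, 2],
                       [1, 0, 0, 1]])
# which nodes come with observed features (node 2 has none)
has_features = np.array([True, True, False])

# drop edges that point INTO a node with features FROM a node without them,
# so propagation cannot overwrite observed features
src, dst = edge_index[0], edge_index[1]
keep = ~(has_features[dst] & ~has_features[src])
filtered = edge_index[:, keep]
print(filtered.tolist())  # the two edges from node 2 are removed
```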
Since this "model" has no parameters, I'm not sure we want to put it into models. How about providing it as a transform? Would that work as well? Also note that the code just got released: https://github.com/twitter-research/feature-propagation
Hi guys,
Thanks a lot for the interest in my paper!
I also think the method would be more suited as a transform.
I think resetting the features is very efficient, but if you want to try removing edges, you would need to remove all incoming edges to nodes with features (both the ones from nodes with and without features) and leave only self-loops. It would actually be interesting to see how the two implementations differ in terms of efficiency.
Thanks for chiming in @emalgorithm.
I think resetting features is easier to implement. It should also be more efficient if we utilize SparseTensor matrix multiplication rather than scatter/gather (but this is just a guess). The edge-removal proposal also doesn't work if features of a node are only partially missing, right?
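As a rough illustration of the sparse-matmul route (scipy here for brevity; the actual implementation would presumably use torch_sparse.SparseTensor), one diffusion step is just a sparse matrix product with a normalized adjacency:

```python
import numpy as np
from scipy.sparse import csr_matrix

# undirected 3-node path graph 0-1-2
row = np.array([0, 1, 1, 2])
col = np.array([1, 0, 2, 1])
adj = csr_matrix((np.ones(4), (row, col)), shape=(3, 3))

# symmetric normalization D^{-1/2} A D^{-1/2}
deg = np.asarray(adj.sum(axis=1)).ravel()
d_inv_sqrt = 1.0 / np.sqrt(deg)
adj_norm = adj.multiply(d_inv_sqrt[:, None]).multiply(d_inv_sqrt[None, :]).tocsr()

x = np.array([[1.0], [0.0], [0.0]])  # only node 0 has a feature
out = adj_norm @ x  # one diffusion step, a single sparse matmul
```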
Yes actually, that's true, the edge removal would work only when a node either has all features or none, but not in the general case of some features missing. Standard diffusion and then resetting is definitely the cleanest solution then.
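Concretely, the diffuse-then-reset loop could look something like this minimal dense numpy sketch (a real version would use sparse ops and a properly normalized adjacency):

```python
import numpy as np

def feature_propagation(x, adj_norm, known_mask, num_iters=40):
    """Iteratively diffuse features, resetting observed entries each step.

    x:          [num_nodes, num_features], missing entries initialized to 0
    adj_norm:   [num_nodes, num_nodes] normalized adjacency (dense for clarity)
    known_mask: boolean [num_nodes, num_features] marking observed entries
    """
    known = x.copy()
    out = x.copy()
    for _ in range(num_iters):
        out = adj_norm @ out                 # one diffusion step
        out[known_mask] = known[known_mask]  # reset known features
    return out

# two-node toy graph where each node averages itself and its neighbor
adj = np.array([[0.5, 0.5],
                [0.5, 0.5]])
x = np.array([[1.0], [0.0]])             # node 1's feature is missing
mask = np.array([[True], [False]])
out = feature_propagation(x, adj, mask)  # node 1 converges to node 0's value
```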
Yes, it does fit into transform better.
For the general case where node features are partially missing, the user would input a mask of shape [num_nodes, num_features], which we would use while resetting the features, right? But at the same time we might need to let the user pass a mask of shape [num_nodes] when a node has either all features or none.
That is correct. I think the second case is automatically supported via broadcasting in case your mask is stored as [num_nodes, 1].
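The broadcasting point can be checked with a short numpy sketch (toy shapes, arbitrary values):

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(4, 3)  # observed features
propagated = np.zeros_like(x)                 # stand-in for diffusion output

# a node-level mask stored as [num_nodes, 1] broadcasts against
# [num_nodes, num_features], so the same reset code handles both mask shapes
node_mask = np.array([[True], [False], [True], [False]])
out = np.where(node_mask, x, propagated)  # rows 0 and 2 keep their features
```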
Is anyone working on this? If not I can take it.
This would be awesome :)
I was looking at the source code; it doesn't seem to be implemented for HeteroData, right?
I don't think this feature is applicable to heterogeneous graphs, since it relies on aggregating nearby features of the same node type. This would only be possible if you have an edge type specified that points to itself (same source and destination node type).
🚀 The feature, motivation and pitch
Paper: On the Unreasonable Effectiveness of Feature Propagation in Learning on Graphs with Missing Node Features
Code: zip
The paper presents a simple, fast, and scalable approach for handling missing features in graph machine-learning applications. It might be very interesting to add feature propagation and an example of it to PyG.
@wsad1 Let me know if you have interest in looking into this.
Alternatives
No response
Additional context
No response