williamleif / GraphSAGE

Representation learning on large graphs using stochastic graph convolutions.

Unseen Node classification #189

Open riyaj8888 opened 2 years ago

riyaj8888 commented 2 years ago

Hi, is my understanding correct?

During inference on a new node, do we have to build a new graph that includes this new node?

Or, how do we run testing on a node that was added to the graph afterwards?

sam-lev commented 2 years ago

Will you elaborate on your problem setting? If (a) you have a pre-existing graph which you trained on, and a new node is added to that graph, then no new graph is needed, and inference can be 'extended' onto the unseen (new) node. One way to do this would be to have the new node already part of the graph before training but labeled 'test'. Then during the test phase inference will be performed on the new node. If (b) you want a new node within a new graph, then a new adjacency matrix will need to be built to perform inference after training on the initial graph.
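To make option (a) concrete, here is a minimal sketch of flagging a new node as test-only before training. It assumes a node-link style dictionary with per-node 'test' and 'val' flags, similar in spirit to the graph JSON the repository reads; the helper name `add_test_node` and the tiny example graph are made up for illustration, and the exact link layout may differ with your networkx version.

```python
def add_test_node(graph, node_id, neighbor_ids):
    """Add a node flagged as test-only and connect it to existing nodes.

    `graph` is assumed to be a node-link style dict with "nodes" and "links".
    """
    known = {n["id"] for n in graph["nodes"]}
    if node_id in known:
        raise ValueError(f"node {node_id!r} already exists")
    missing = [v for v in neighbor_ids if v not in known]
    if missing:
        raise KeyError(f"unknown neighbors: {missing}")
    # test=True keeps the node out of the training and validation sets.
    graph["nodes"].append({"id": node_id, "test": True, "val": False})
    for v in neighbor_ids:
        graph["links"].append({"source": node_id, "target": v})
    return graph


# Hypothetical toy graph: two training nodes joined by one edge.
graph = {
    "nodes": [
        {"id": 0, "test": False, "val": False},
        {"id": 1, "test": False, "val": False},
    ],
    "links": [{"source": 0, "target": 1}],
}
add_test_node(graph, 2, [0, 1])
test_ids = [n["id"] for n in graph["nodes"] if n["test"]]
```

After this, `test_ids` holds only the new node, so training would skip it while the test phase can still reach it through its edges.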

riyaj8888 commented 2 years ago


I have a pre-existing graph, and the new node is added to that same graph. So do you mean that I have to retrain on the graph just to get a prediction for this new node?

Can you elaborate a little on how I can do inductive inference on this new node without retraining? Do I need to rebuild the graph every time a new node is added, just for inference?

sam-lev commented 2 years ago

If the node is connected to the graph but labeled 'test' under GraphSAGE's 'train', 'val', 'test' scheme, then it will be inferred on after training. For this, look at the method 'incremental_evaluate', and at how the test nodes are served by the node minibatch class (NodeMinibatchIterator) when the session run is called in 'incremental_evaluate'.

In the same vein, you could also save the session after training, load the trained state graph, and infer on unseen test nodes. Both the previous approach and this would be inductive.
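The reason both approaches are inductive is that the learned weights act on node features and sampled neighbor features, so their shapes do not depend on how many nodes the graph has. Below is a rough NumPy sketch of one mean-aggregator layer to illustrate this. It is not the repository's TensorFlow code: the function name `sage_mean_layer` is made up, the weights are random stand-ins for trained ones, and the bias term is left out.

```python
import numpy as np


def sage_mean_layer(features, adj, nodes, w_self, w_neigh):
    """One mean-aggregator layer: concat(self @ W_self, mean(neigh) @ W_neigh), then ReLU."""
    self_vecs = features[nodes]                      # (batch, d_in)
    neigh_means = features[adj[nodes]].mean(axis=1)  # (batch, d_in)
    out = np.concatenate([self_vecs @ w_self, neigh_means @ w_neigh], axis=1)
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)
d_in, d_hidden = 3, 4
# Stand-ins for trained weights; their shapes depend only on feature sizes.
w_self = rng.normal(size=(d_in, d_hidden))
w_neigh = rng.normal(size=(d_in, d_hidden))

# "Training-time" graph: 4 nodes, each row of adj lists 2 sampled neighbors.
features = rng.normal(size=(4, d_in))
adj = np.array([[1, 2], [0, 2], [0, 1], [2, 2]])
old_nodes = np.array([0, 1, 2, 3])
before = sage_mean_layer(features, adj, old_nodes, w_self, w_neigh)

# An unseen node 4 arrives with its own features and neighbors 0 and 3.
# Existing adjacency rows are left untouched here for simplicity.
features_new = np.vstack([features, rng.normal(size=(1, d_in))])
adj_new = np.vstack([adj, [[0, 3]]])
new_embedding = sage_mean_layer(features_new, adj_new, np.array([4]), w_self, w_neigh)
after = sage_mean_layer(features_new, adj_new, old_nodes, w_self, w_neigh)
```

The same `w_self` and `w_neigh` produce an embedding for node 4 without any retraining, and the embeddings of the old nodes are unchanged as long as their adjacency rows are.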

If you plan to add nodes online after training, meaning you don't know at training time which nodes you will infer on and so can't add them to the graph in advance as test nodes, then you may want to look into rebuilding the adjacency matrix at the point where you use the trained GNN. For this, look again at the NodeMinibatchIterator class and specifically its 'construct_adj' method. You can train your GNN over the graph (and the associated adjacency structure encoded in its minibatch.adj). Then, when you want to infer, use the trained GNN (or load its saved state) and update 'minibatch.adj' to take any new nodes into account. Once you have the updated minibatch.adj, you can perform inference (with logic similar to that shown in 'incremental_evaluate', or however you feel appropriate).
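As a rough sketch of what rebuilding the adjacency table could look like, the snippet below builds a fixed-width sampled neighbor table with a padding row, loosely in the spirit of 'construct_adj'. It is not the repository's code: `build_adj`, `add_node`, and the toy graph are made-up names for illustration, and the real iterator also tracks train/test edges, which is ignored here.

```python
import numpy as np


def build_adj(neighbors, id2idx, max_degree, rng):
    """Build an (n + 1, max_degree) table of sampled neighbor indices.

    Row n is a padding row; nodes without neighbors point at it.
    """
    n = len(id2idx)
    adj = np.full((n + 1, max_degree), n, dtype=np.int64)
    for node, idx in id2idx.items():
        nbr_idx = np.array([id2idx[v] for v in neighbors.get(node, ())], dtype=np.int64)
        if nbr_idx.size == 0:
            continue
        # Sample with replacement only when there are too few neighbors.
        adj[idx] = rng.choice(nbr_idx, size=max_degree, replace=nbr_idx.size < max_degree)
    return adj


def add_node(neighbors, id2idx, node, node_neighbors):
    """Register a new node and its undirected edges to existing nodes."""
    id2idx[node] = len(id2idx)
    neighbors[node] = list(node_neighbors)
    for v in node_neighbors:
        neighbors[v].append(node)


rng = np.random.default_rng(0)
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
id2idx = {"a": 0, "b": 1, "c": 2}
adj_train = build_adj(neighbors, id2idx, max_degree=3, rng=rng)

# Later, at inference time, node "d" shows up connected to "a" and "c".
add_node(neighbors, id2idx, "d", ["a", "c"])
adj_infer = build_adj(neighbors, id2idx, max_degree=3, rng=rng)
```

Note that the padding index moves from n to n + 1 when a node is added, so rebuilding the whole table is simpler than patching one row. If your feature matrix carries a matching dummy row for the padding index, it would need the same shift.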

Best, Sam