j-adamczyk / graph_representation_learning_workshops

"Graph Representation Learning and Graph Classification" workshops, ML in PL 2023 conference

Graph Feature Extraction on Unseen data #1

Open hrampadarath opened 5 months ago

hrampadarath commented 5 months ago

Thank you for sharing these very informative notebooks. 01_graph_feature_extraction.ipynb is of great interest to me, and it is very well written and presented. How would one go about applying these steps, especially the feature extraction, to unseen data? Would the unseen data be added to the training graph, and features then extracted for the unseen data only? Thanks for your consideration.

j-adamczyk commented 5 months ago

@hrampadarath thanks for your interest in these notebooks. Graph feature extraction is actually the simplest approach to run on unseen data, as it basically boils down to:

  1. Take a new graph
  2. Compute feature vector
  3. Put it in any tabular learning algorithm
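Step 2 can be sketched as a plain function. This is a minimal sketch, assuming each graph is given as an undirected edge list with nodes numbered 0..n-1. It computes a Local Degree Profile (LDP) style vector: per node, its degree plus the min/max/mean/std of its neighbors' degrees, aggregated into histograms with fixed bins. The function name `ldp_features`, the bin count and the histogram aggregation are illustrative assumptions, not the notebook's exact implementation.

```python
import numpy as np


def ldp_features(edges, n_nodes, n_bins=10, max_value=50.0):
    """Hypothetical LDP-style graph vectorizer (not the notebook's exact code).

    Stateless: the output depends only on the input graph's structure and on
    fixed hyperparameters, so the same function serves training and inference.
    """
    # Adjacency lists of an undirected graph with nodes numbered 0..n_nodes-1
    neighbors = [[] for _ in range(n_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    degrees = np.array([len(nbrs) for nbrs in neighbors], dtype=float)

    # Per-node profile: own degree plus min/max/mean/std of neighbor degrees
    profile = np.zeros((n_nodes, 5))
    profile[:, 0] = degrees
    for node, nbrs in enumerate(neighbors):
        if nbrs:  # isolated nodes keep zeros for the neighbor statistics
            nbr_deg = degrees[nbrs]
            profile[node, 1:] = [nbr_deg.min(), nbr_deg.max(), nbr_deg.mean(), nbr_deg.std()]

    # Fixed bin edges (never fitted on data) give every graph a vector of length
    # 5 * n_bins; values above max_value are clipped into the last bin
    bin_edges = np.linspace(0.0, max_value, n_bins + 1)
    clipped = np.clip(profile, 0.0, max_value)
    histograms = [np.histogram(clipped[:, j], bins=bin_edges)[0] for j in range(5)]
    return np.concatenate(histograms).astype(float)


# Usage on a new, unseen graph: a 4-node path 0-1-2-3
x_new = ldp_features([(0, 1), (1, 2), (2, 3)], n_nodes=4)
print(x_new.shape)  # (50,)
```

Because the bins are fixed rather than learned, a graph seen only at inference time lands in exactly the same feature space as the training graphs.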

So compared to typical scikit-learn inference, there is just one extra step: computing features for the new graph. Note that LDP and LTP features are completely stateless, i.e. they require only the new graph's structure and are not "trained" in any way. Once you have a feature extraction function, just pass a new graph to it, get the resulting feature vector, and run classification, regression, or any other algorithm for tabular data.

Also remember that these methods are for graph classification, i.e. the training data consists of a list of independent graphs, each of which is vectorized, resulting in a matrix of shape (n_graphs, n_features).
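The train-then-infer workflow described above can be sketched end to end, assuming NumPy and scikit-learn are installed. To keep the block self-contained, `toy_graph_features` is a hypothetical, deliberately simple stand-in for LDP/LTP (node count, edge count, mean and max degree); the toy graphs, labels and the choice of `RandomForestClassifier` are likewise illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def toy_graph_features(edges, n_nodes):
    """Hypothetical stand-in for LDP/LTP: stateless, structure-only graph vector."""
    degrees = np.zeros(n_nodes)
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return np.array([n_nodes, len(edges), degrees.mean(), degrees.max()])


# Training data: a list of independent graphs (edge list, node count), one label each
train_graphs = [
    ([(0, 1), (1, 2)], 3),                                  # path
    ([(0, 1), (1, 2), (2, 3)], 4),                          # path
    ([(0, 1), (1, 2), (2, 0)], 3),                          # triangle
    ([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)], 4),  # 4-clique
]
train_labels = [0, 0, 1, 1]

# Vectorize every graph independently: matrix of shape (n_graphs, n_features)
X_train = np.stack([toy_graph_features(e, n) for e, n in train_graphs])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, train_labels)

# Inference: the unseen graph is not merged into any training graph; it is
# vectorized with the same function and passed to the fitted tabular model
x_new = toy_graph_features([(0, 1), (1, 2), (2, 3), (3, 4)], 5).reshape(1, -1)
prediction = clf.predict(x_new)
print(X_train.shape, prediction.shape)  # (4, 4) (1,)
```

Swapping in a real extractor only changes the feature function; the classifier side is ordinary scikit-learn `fit` / `predict`.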

I hope this helps.

hrampadarath commented 5 months ago

@j-adamczyk thanks. I'm trying to understand the dataset used in the notebook (IMDB-BINARY), so that I can represent my own data in a similar format. Does the dataset contain 1000 independent graphs, each representing a movie and the actors in it, as in the example below? I also assume the labels are per graph, indicating whether the movie that each graph represents is Action or Romance?

[Image: example graph of a movie and its actors]

So in tabular format the above simple example would be:

[Image: the same example in tabular format]

Many thanks.

j-adamczyk commented 4 months ago

@hrampadarath yes, exactly. For the datasets, I simply used PyTorch Geometric; I highly recommend its documentation and tutorials for creating your own datasets.
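One way to put your own graphs into a format similar to IMDB-BINARY is the TU Dortmund benchmark text format that it is distributed in: `<name>_A.txt` lists every undirected edge in both directions using global, 1-based node ids; line i of `<name>_graph_indicator.txt` gives the graph id of node i; and line i of `<name>_graph_labels.txt` gives the label of graph i. The sketch below writes such files using only the standard library. The helper `write_tu_dataset`, the dataset name `MyGraphs` and the toy graphs are hypothetical.

```python
import tempfile
from pathlib import Path


def write_tu_dataset(graphs, labels, raw_dir, name):
    """Hypothetical helper: save graphs in the TU Dortmund text format.

    graphs: list of (edge_list, n_nodes) pairs, edges using local node ids 0..n_nodes-1
    labels: one integer class label per graph (e.g. one id per genre)
    """
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    adjacency, indicator = [], []
    offset = 0  # TU files number nodes globally across all graphs, starting at 1
    for graph_id, (edges, n_nodes) in enumerate(graphs, start=1):
        for u, v in edges:
            # Undirected edges are listed once in each direction
            adjacency.append(f"{u + offset + 1}, {v + offset + 1}")
            adjacency.append(f"{v + offset + 1}, {u + offset + 1}")
        indicator.extend([str(graph_id)] * n_nodes)  # one line per node
        offset += n_nodes
    (raw_dir / f"{name}_A.txt").write_text("\n".join(adjacency) + "\n")
    (raw_dir / f"{name}_graph_indicator.txt").write_text("\n".join(indicator) + "\n")
    (raw_dir / f"{name}_graph_labels.txt").write_text("\n".join(map(str, labels)) + "\n")


# Usage: a triangle labelled 0 and a single edge labelled 1
root = Path(tempfile.mkdtemp())
out_dir = root / "MyGraphs" / "raw"
write_tu_dataset([([(0, 1), (1, 2), (2, 0)], 3), ([(0, 1)], 2)], [0, 1], out_dir, "MyGraphs")
```

With the files in `<root>/<name>/raw/`, PyTorch Geometric's `TUDataset(root, name)` may read them like the built-in benchmarks instead of downloading anything; treat that as an assumption to verify against the PyG documentation, which also covers subclassing `InMemoryDataset` as the general route for custom datasets.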