How can a GNN be trained?

iamgiddyaboutgit commented 1 year ago

To recap:

Each individual has their own graph (with the same nodes, edges, weights, etc.).
The data for an individual for a node is a (m x p) matrix.
The collection of all data for all n individuals for a node is a tensor with dimensions m x p x n.

iamgiddyaboutgit commented 1 year ago

Dr. Luo: Yes, PyG expects each node's features to be a one-dimensional vector, so you need to reduce your m x p matrix to a vector.

The most straightforward way is to reshape the (m, p) matrix for each node to a vector with mp values. If mp is not too large, you can do this.
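A minimal sketch of the reshape option, assuming the per-node matrices for one individual are stacked in a tensor of shape (num_nodes, m, p). The sizes and the names `node_matrices` and `x` are made up for illustration and are not from the project code.

```python
import torch

# Hypothetical sizes: 5 nodes, each carrying an (m, p) = (4, 3) matrix.
num_nodes, m, p = 5, 4, 3
node_matrices = torch.randn(num_nodes, m, p)

# Flatten each node's (m, p) matrix into a single vector of m * p values.
# The result has the (num_nodes, num_node_features) layout PyG uses for data.x.
x = node_matrices.reshape(num_nodes, m * p)

print(x.shape)
```

Flattening loses no information, but the feature count grows as m * p, which is why this is only practical when mp is not too large.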

The second option is to apply attention pooling to the (m, p) matrix and reduce it to a vector with m values or a vector with p values. Whether to reduce along the m dimension or the p dimension depends on what the matrix actually represents. You can use the torch.nn.MultiheadAttention layer to do this attention pooling.
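One possible way to build attention pooling from torch.nn.MultiheadAttention is sketched here. It assumes we want to collapse the m dimension and keep a vector of p values per node, using a single learnable query that attends over the m rows. The `AttentionPool` class, the learnable query, and all sizes are assumptions for illustration, not something Dr. Luo specified.

```python
import torch
from torch import nn


class AttentionPool(nn.Module):
    """Pools each node's (m, p) matrix into one vector of p values."""

    def __init__(self, p, num_heads=1):
        super().__init__()
        # One learnable query, shared by all nodes; p must be divisible by num_heads.
        self.query = nn.Parameter(torch.randn(1, 1, p))
        self.attn = nn.MultiheadAttention(
            embed_dim=p, num_heads=num_heads, batch_first=True
        )

    def forward(self, node_matrices):
        # node_matrices: (num_nodes, m, p); the m rows act as the sequence.
        query = self.query.expand(node_matrices.size(0), -1, -1)
        pooled, weights = self.attn(query, node_matrices, node_matrices)
        # pooled: (num_nodes, 1, p) -> (num_nodes, p)
        # weights: (num_nodes, 1, m) -> (num_nodes, m), one weight per row
        return pooled.squeeze(1), weights.squeeze(1)


# Hypothetical sizes: 5 nodes, m = 4 rows, p = 3 columns.
pool = AttentionPool(p=3)
x, attn_weights = pool(torch.randn(5, 4, 3))
print(x.shape, attn_weights.shape)
```

To pool along the p dimension instead, the same module could be applied to the transposed matrices with `embed_dim=m`. The pooling layer would be trained jointly with the GNN, so the attention weights are learned from the prediction target.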

Those are just two general options for doing this. By considering the particular biological meaning of the m x p matrix, there might be other options. For example, if the matrix is an observation-by-variable matrix, I think it also makes sense to do an average pooling, i.e., averaging along the observation dimension to get an averaged vector. But again, this really depends on the biological meaning of the matrix.

Also think about whether average pooling makes sense for your question, e.g., is the average related to the target you want to predict? Would sum pooling be more relevant and informative?
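For comparison, average and sum pooling along the observation dimension can be sketched as follows, assuming the m rows are observations and the p columns are variables. The sizes and variable names are hypothetical.

```python
import torch

# Hypothetical sizes: 5 nodes, m = 4 observations, p = 3 variables.
num_nodes, m, p = 5, 4, 3
node_matrices = torch.randn(num_nodes, m, p)

# Average over the m observations: one vector of p values per node.
x_mean = node_matrices.mean(dim=1)

# Sum over the m observations: same shape, but scaled by m.
x_sum = node_matrices.sum(dim=1)

print(x_mean.shape, x_sum.shape)
```

Note that when every node has the same m, the sum is just m times the average, so the two differ only by a constant scale. They carry different information only if the number of observations varies between nodes or individuals.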

The concatenation and attention-pooling approaches are also worth a try.

iamgiddyaboutgit commented 1 year ago

My vote is for attention or sum-pooling.

jcoffsky3 commented 1 year ago

I did some research on the different methods, and it looks like attention pooling could be a good option. Even though it would mean a lot of columns, I think we could also consider Dr. Luo's first idea of giving each node a flattened vector of length mp.

iamgiddyaboutgit commented 1 year ago

How to work with our graph labels: https://stackoverflow.com/a/70760862

iamgiddyaboutgit / gnn_for_diabetes

How can a GNN be trained? #16