iamgiddyaboutgit / gnn_for_diabetes

How can a GNN be trained? #16

Open iamgiddyaboutgit opened 1 year ago

iamgiddyaboutgit commented 1 year ago

To recap:

iamgiddyaboutgit commented 1 year ago

Dr. Luo: Yes, PyG expects each node's features to be a one-dimensional vector, so you need to reduce your (m, p) matrix to a vector.

The most straightforward way is to reshape the (m, p) matrix for each node to a vector with mp values. If mp is not too large, you can do this.
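A minimal sketch of the reshape option, assuming the per-node matrices are stacked into one tensor of shape (num_nodes, m, p). The sizes and the name `node_matrices` are made up for illustration and are not from our data.

```python
import torch

# Hypothetical sizes: 4 nodes, each with an (m, p) = (3, 5) matrix.
num_nodes, m, p = 4, 3, 5

# One (m, p) matrix per node, stacked along the first dimension.
node_matrices = torch.arange(num_nodes * m * p, dtype=torch.float32).reshape(num_nodes, m, p)

# Flatten each node's matrix row by row into a vector of m * p values.
x = node_matrices.reshape(num_nodes, m * p)

print(x.shape)  # torch.Size([4, 15])
```

The resulting `x` has one row per node, which is the layout a node feature matrix needs. The feature count grows as m * p, so this is only practical when that product stays small.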

The second option is to apply attention pooling to the (m, p) matrix and reduce it to a vector with m values or a vector with p values. Whether to reduce along the m dimension or the p dimension depends on what the matrix actually represents. You can use the torch.nn.MultiheadAttention layer to do this attention pooling.
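One possible way to set up the attention pooling, as a sketch only. It assumes we reduce along the m dimension (so each node ends up with p values) and uses a single learnable query vector that attends over the m rows. The class name `AttentionPool`, the learnable-query design, and the sizes are assumptions for illustration, not something Dr. Luo specified.

```python
import torch
from torch import nn


class AttentionPool(nn.Module):
    """Pools each node's (m, p) matrix down to one vector with p values."""

    def __init__(self, p: int, num_heads: int = 1):
        super().__init__()
        # Learnable query shared by all nodes (an assumed design choice).
        self.query = nn.Parameter(torch.randn(1, 1, p))
        # embed_dim (p) must be divisible by num_heads.
        self.attn = nn.MultiheadAttention(embed_dim=p, num_heads=num_heads, batch_first=True)

    def forward(self, node_matrices: torch.Tensor):
        # node_matrices: (num_nodes, m, p); the m rows act as the sequence.
        q = self.query.expand(node_matrices.size(0), -1, -1)  # (num_nodes, 1, p)
        pooled, weights = self.attn(q, node_matrices, node_matrices)
        # pooled: (num_nodes, 1, p), weights: (num_nodes, 1, m)
        return pooled.squeeze(1), weights.squeeze(1)


torch.manual_seed(0)
num_nodes, m, p = 4, 3, 8  # hypothetical sizes
node_matrices = torch.randn(num_nodes, m, p)

pool = AttentionPool(p, num_heads=2)
x, weights = pool(node_matrices)
print(x.shape, weights.shape)  # torch.Size([4, 8]) torch.Size([4, 3])
```

To get a vector with m values instead, the same module could be applied to `node_matrices.transpose(1, 2)` with `p` replaced by `m`. The returned `weights` show how much each row contributed for each node, which may help when interpreting the result.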

Those are just two general options for doing this. By considering the particular biological meaning of the m x p matrix, there might be other options. For example, if the matrix is an observation-by-variable matrix, I think it also makes sense to do an average pooling, i.e., averaging along the observation dimension to get an averaged vector. But again, this really depends on the biological meaning of the matrix.

Also think about whether average pooling makes sense for your question, e.g., is the average related to the target you want to predict? Would sum pooling be more relevant and informative?
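To make the average versus sum comparison concrete, here is a small sketch that assumes the m dimension is the observation dimension. The toy values are invented for illustration.

```python
import torch

# Two hypothetical nodes, each with m = 3 observations of p = 2 variables.
node_matrices = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],
])  # shape (2, 3, 2)

# Average pooling: mean over the observation dimension.
x_mean = node_matrices.mean(dim=1)
print(x_mean)  # tensor([[3., 4.], [0., 1.]])

# Sum pooling: total over the observation dimension.
x_sum = node_matrices.sum(dim=1)
print(x_sum)  # tensor([[ 9., 12.], [ 0.,  3.]])
```

When every node has the same m, the sum is just m times the mean, so the two carry the same information up to scale. They differ in a meaningful way only if the number of observations varies between nodes, in which case the sum also encodes how many observations there were.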

The concatenation and attention-pooling approaches are also worth a try.

iamgiddyaboutgit commented 1 year ago

My vote is for attention or sum-pooling.

jcoffsky3 commented 1 year ago

I did some research on the different methods, and it looks like attention pooling could be a good option. Even though it would mean a lot of columns, I think we could also consider his first idea of giving each node a vector of length mp.

iamgiddyaboutgit commented 1 year ago

How to work with our graph labels: https://stackoverflow.com/a/70760862