Closed bentaculum closed 5 years ago
Yes, you are right. Those are two different things. Sorry for the inconvenience. Will fix it ASAP.
Hi, I just pushed a new GMMConv
to master which fixes this bug. In addition, I added a root_weight
argument to the operator to align better with other operators.
Hi again, thanks for getting to this so quickly :) I checked your update and have a few comments: I evaluated `gaussian.sum(dim=0)` at a breakpoint on line 100 in gmm_conv.py, which returns a tensor where all entries (edit: almost all; there are apparently some numerical issues) are the same, namely the number of nodes in the batch.

Hi, why shouldn't this make sense? In the end, we want to handle variable-sized neighborhoods, and this can only be achieved by normalizing per node.
As far as I understand it, I would choose `mean` as the aggregation function per node if I wanted to treat variable-sized neighborhoods the same way. As of now, I think you normalize all the weights per node in the message step, but then sum up (edit) the dot products afterwards with `add` aggregation in `propagate`.
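To make the difference concrete, here is a framework-free sketch (plain Python with hypothetical names, not PyG internals) of the two schemes being discussed: pre-normalizing the weights per target node followed by `add` aggregation, versus raw weights with `mean` aggregation:

```python
# Directed edges (source, target): node 0 has two neighbors, node 1 has one,
# node 2 has none. Names (`edges`, `w`) are illustrative only.
edges = [(1, 0), (2, 0), (0, 1)]
x = {0: 1.0, 1: 1.0, 2: 1.0}                   # scalar node features
w = {(1, 0): 0.5, (2, 0): 1.5, (0, 1): 2.0}    # per-edge Gaussian weights

def aggregate(normalize_weights, mean_aggr):
    out = {}
    for v in x:
        nbrs = [(s, t) for (s, t) in edges if t == v]
        if not nbrs:
            out[v] = 0.0
            continue
        total_w = sum(w[e] for e in nbrs)
        msgs = [(w[e] / total_w if normalize_weights else w[e]) * x[e[0]]
                for e in nbrs]
        out[v] = sum(msgs) / (len(msgs) if mean_aggr else 1)
    return out

# Scheme A: pre-normalized weights + "add" aggregation.
out_a = aggregate(normalize_weights=True, mean_aggr=False)
# Scheme B: raw weights + "mean" aggregation.
out_b = aggregate(normalize_weights=False, mean_aggr=True)
# out_a[1] == 1.0 while out_b[1] == 2.0: the two schemes are not equivalent.
```

With uniform features, scheme A always outputs 1 per node (the normalized weights sum to 1), while scheme B outputs the mean of the raw weights, so the two ways of handling variable-sized neighborhoods genuinely differ.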
I don't see this normalization in the MoNet paper. Again, I see the point of handling variable-sized neighborhoods the same way, but for some other applications you might want the exact opposite, e.g. counting direct neighbors (or nodes n hops away) within a certain Euclidean distance.
So basically my question is: why do we need this "pre-normalization of the weights" per edge? I would say that the normalization by the number of edges can happen in the aggregation if desired, or not at all in case I want to use sum aggregation instead.
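The counting use case above can be sketched as follows (hypothetical names; an idealized kernel that thresholds the Euclidean distance, combined with sum aggregation, counts the neighbors within a radius, which mean aggregation would destroy):

```python
import math

def count_within(pos, edges, radius):
    """Count, per target node, the source neighbors within `radius`."""
    counts = {v: 0.0 for v in pos}
    for u, v in edges:
        d = math.dist(pos[u], pos[v])
        kernel = 1.0 if d <= radius else 0.0  # idealized learned kernel
        counts[v] += kernel                   # "add" aggregation
    return counts

pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (3.0, 0.0)}
edges = [(1, 0), (2, 0), (0, 1), (2, 1)]
counts = count_within(pos, edges, radius=2.0)  # node 0 counts only node 1
```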
Ok, I agree. I changed `GMMConv` once again. You can now pass the desired `aggr` as an argument (`"add"` by default). This should now align better with its formulation in the paper. WDYT?
Looks good to me :) I took a quick look at the implementation you reference and didn't really get why they are doing the normalization there. Do you understand it?
This implementation is the reason I was doing it in the first place. But after some thought, I must agree with you: this does not make any sense at all. However, it ensures that features will get activated, which might help training. In the end, this operator does not train well at all. I am not sure why. I guess it might be due to numerical instabilities or wrong initialization of the Gaussians. There are many operators which are just way easier to train.
That's what I've found as well; especially the means of the Gaussians have to be initialized in the right spots, otherwise gradient descent will quickly "push" them out of the feasible region of the pseudo-coordinates. I tried switching to your SplineConv, which I considered much more stable, but encountered unstable training as well. I don't understand what could go wrong with the splines; did you also encounter problems due to e.g. initialization at some point?
What do you mean by unstable training? High fluctuations, or a single random drop in accuracy? I have noticed the latter from time to time, although I am absolutely unsure what causes this issue. For training SplineCNN, it's all about using the right normalized pseudo-coordinates, due to the uniform distribution of knot positions. That is exactly what KPConv is trying to fix.
I have built a simple toy task, different from most GNN benchmarks, for testing different graph neural net models. When using only a single graph convolution, I expect the learned kernel to be some sort of thresholding function on the pseudo-coordinates; the sum over the neighborhood can then do the counting.
I was able to achieve accuracy of 85% with MoNet (on ~6-10 classes), but this was very sensitive to the initialization of the Gaussians and certain hyperparams. I replaced MoNet with SplineConv, hoping it would be less prone to these problems, but could not even replicate the results I obtained with MoNet, being stuck at roughly 40% of the accuracy. I don't understand how the SplineConv could be less powerful than the MoNet.
This sounds strange. In practice, accuracy should be close to 100% on this simple toy task. Which pseudo-coordinates do you use? Distance? What is the accuracy on the training set?
Yes, I completely agree, both train and test accuracy should be close to 100%. Concerning your questions from above:
Have you been able to successfully tackle tasks with GNN that heavily rely on the pseudo-coordinates?
You should check out transforms like `Distance` or `Polar`, because `Cartesian` might not be the best choice for this task, given that it is only able to threshold a square, not a circle (due to uniform knot positions). Our SplineCNN experiments mostly rely solely on pseudo-coordinates; e.g., in the FAUST experiments, there is no other information than the spatial relation of neighboring points.
🐛 Bug
The recently updated version of GMMConv inherits from its parent class MessagePassing and is initialized with mean aggregation.
Expected behavior
To my understanding, the aggregation function should be `add`, as the aggregation sums over the neighboring vertices; it does not normalize the Gaussian mixture components as stated in a comment in the code.