AbrahamRabinowitz opened 1 year ago
I have a couple of clarifying questions about cellular attention networks (CANs) and the attention-mechanism code in the conv and message-passing files.
- I may be mistaken, but it seems the current implementation does not normalize the attention coefficients (the reference paper uses a softmax for this purpose). Should we leave the attention mechanism as is, or is it worth rewriting the code to implement the normalization?
- Regarding the tensor diagram for CANs: in the neighborhood-aggregation step, the diagram applies a non-linearity to each within-neighborhood aggregation and then performs the inter-neighborhood aggregation, whereas the referenced CAN paper performs the inter-neighborhood aggregation first and then applies the non-linearity. Would it be OK to go with the formula given by the paper? This would also let our implementation reduce to the Hodge Laplacian layer of the referenced Roddenberry et al. paper when the option to use attention is set to false.
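For concreteness on the first point, the softmax normalization the paper calls for could look roughly like this. This is a minimal NumPy sketch, not the repo's actual API; the raw scores and the neighborhood are hypothetical stand-ins for whatever the attention mechanism currently computes per target cell:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw attention scores e_ij for one target cell i
# over its neighborhood N(i) = {j1, j2, j3}.
raw_scores = np.array([2.0, 0.5, -1.0])

# Without normalization the coefficients are just the raw scores;
# with softmax they sum to 1 over the neighborhood, as in the paper.
alpha = softmax(raw_scores)
assert np.isclose(alpha.sum(), 1.0)
```

In the actual layer this would be a per-target-cell softmax over each cell's neighbors (a scatter-softmax when neighborhoods are stored as flat index arrays), rather than a softmax over one dense vector.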
Hi @AbrahamRabinowitz,
(1) I would suggest you normalize as done in the original paper. (2) Same here: please go with what the original paper suggests. We will update the tensor diagram accordingly later. @mathildepapillon
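On the second point, the two orderings really do differ in general, which is why the choice matters. A minimal sketch with hypothetical per-neighborhood messages (e.g. from the up- and down-neighborhoods of a cell):

```python
import numpy as np

def phi(x):
    # Some non-linearity; ReLU is used here purely for illustration.
    return np.maximum(x, 0.0)

# Hypothetical within-neighborhood aggregation results for one cell.
m_up = np.array([1.0, -2.0])
m_down = np.array([-0.5, 3.0])

# Tensor-diagram order: non-linearity per neighborhood, then sum.
out_diagram = phi(m_up) + phi(m_down)

# CAN-paper order: sum across neighborhoods, then one non-linearity.
out_paper = phi(m_up + m_down)

# The outputs generally differ between the two orderings.
assert not np.allclose(out_diagram, out_paper)
```

With the paper's ordering, setting the attention coefficients to fixed Laplacian weights leaves a single non-linearity applied to the summed Hodge Laplacian terms, which is what allows the reduction to the Roddenberry et al. layer.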