Closed: AlexDuvalinho closed this issue 6 months ago
Hi Alex, thank you for your interest in the work.
embedding layer
smooth cutoff
normalization
intermediate node embedding
equivariant message
This is a good point; we could have made this clearer in the paper. In writing s_ij^2 * r_ij we implicitly match up their dimensions. These are the implicit steps before multiplying (shapes illustrative):
- s_ij^2 has shape (F,) and gains a spatial dimension, giving shape (1, F)
- r_ij has shape (3,) and gains a feature dimension, giving shape (3, 1)
- the element-wise product then broadcasts both tensors to the common shape (3, F)
The added dimensions are expanded by simply repeating the tensor along the new dimension (i.e. a broadcasting operation as used in NumPy and PyTorch); see this part of the code for further details. Note that the shapes I listed above are meant for illustrative purposes; the tensors in the code are shaped differently, as we follow PyTorch Geometric's way of representing graphs. Regarding the best way of combining 3D vectors with 1D features, I don't think our current architecture nails this part; there is definitely room for improvement. This paper provides more details about equivariance and how to incorporate directional information using spherical harmonics, which is computationally more expensive.
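If it helps, here is a minimal sketch of that broadcasting step in PyTorch. The variable names and shapes are illustrative only and do not match the actual per-edge tensor layout in the repository:

```python
import torch

F = 64                       # number of feature channels (illustrative)
s_ij_2 = torch.randn(F)      # scalar filter values for one edge, shape (F,)
r_ij = torch.randn(3)        # direction vector for the same edge, shape (3,)
r_ij = r_ij / r_ij.norm()    # normalize the direction

# Insert the missing dimensions so the shapes line up:
#   s_ij_2: (F,) -> (1, F)
#   r_ij:   (3,) -> (3, 1)
# The element-wise product then broadcasts (repeats) both tensors
# to the common shape (3, F), i.e. one 3D vector per feature channel.
directional_message = r_ij.unsqueeze(1) * s_ij_2.unsqueeze(0)  # shape (3, F)
```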
scalar product
Hope this helps, feel free to ask if you have further questions.
Hello, after reading the paper I have several questions regarding your approach. Thanks a lot in advance for taking the time to answer them.
Your embedding layer is more complex than usual: the initial node representation already seems to depend on its neighbours' representations.
Graph construction: you use a smooth cutoff function and describe some of its benefits. You describe a Transformer, but still use a cutoff value.
You say the feature vectors are passed through a normalization layer.
An intermediate node embedding (y_i) utilising attention scores is created and impacts the final x_i and v_i embeddings. This step weights a projection of each neighbor's representation, ~ $a_{ij} (W \cdot \mathrm{RBF}(d_{ij}) \cdot \vec{V}_j)$, by the attention score.
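Just to check my understanding, in PyTorch-style pseudocode I read this step as something like the following (names and shapes are my own assumptions, for a single node i and its neighbours j):

```python
import torch

F, num_neighbours = 64, 8                   # illustrative sizes
a_ij = torch.softmax(torch.randn(num_neighbours), dim=0)  # attention scores over neighbours j
V_j = torch.randn(num_neighbours, F)        # projected neighbour representations
w_rbf_ij = torch.randn(num_neighbours, F)   # W * RBF(d_ij), projected to F channels

# y_i = sum_j a_ij * (W * RBF(d_ij) * V_j), i.e. an attention-weighted
# sum over neighbours of the filtered neighbour representations.
y_i = (a_ij.unsqueeze(1) * w_rbf_ij * V_j).sum(dim=0)      # shape (F,)
```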
The equivariant message m_ij (a component of the sum that yields w_i) is obtained by multiplying s_ij^2 (i.e. v_j scaled by RBF(d_ij)) by the directional information r_ij, and then adding s_ij^1 (i.e. v_j scaled by RBF(d_ij)) multiplied again element-wise by v_j. (I sketch my reading of this step in pseudocode after these questions.)
Do you think that multiplying the message sequentially by distance information and directional information is the best choice to embed both types of information? Why not, for instance, concatenate the r_ij (= r_i - r_j) and d_ij (= ||r_ij||, the distance) information and use a single operation?
Is multiplying s_ij^1 by v_j (again) necessary? (v_j already enters s_ij, and s_ij^1 is then multiplied element-wise with v_j.)
IMPORTANT. r_ij has dimension 3 while s_ij^2 has dimension F. In Eq. (11), how can you apply an element-wise multiplication? Is it a typo? How exactly do you combine these two quantities? What is your take on the best way to combine 3D information (a directional vector) with an existing embedding? This is a genuine question I am interested in, if you have references or insights on this…
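For reference, here is how I currently read the vector message and its aggregation, in PyTorch-style pseudocode. The variable names and shapes are my own assumptions, and the broadcast of the direction over the feature channels is exactly the shape question I ask above:

```python
import torch

F = 64                                   # feature channels (my assumption)
s_ij_1 = torch.randn(F)                  # filter values for one edge (i, j)
s_ij_2 = torch.randn(F)
v_j = torch.randn(3, F)                  # equivariant features of neighbour j
r_ij = torch.randn(3)
r_ij = r_ij / r_ij.norm()                # unit vector along r_i - r_j

# First term rescales the neighbour's equivariant features per channel;
# second term injects the edge direction, broadcast over the F channels.
m_ij = s_ij_1 * v_j + s_ij_2 * r_ij.unsqueeze(1)   # shape (3, F)

# w_i would then be the sum of m_ij over all neighbours j of node i.
```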
The invariant representation involves the scalar product of the equivariant vector v_i projected with matrix U_1, with (U_2 v_i), i.e. $\langle U_1 \vec{v}_i, U_2 \vec{v}_i \rangle$.
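As I understand it, the scalar product is what makes this quantity invariant. A quick sketch of my reasoning, assuming $U_1$ and $U_2$ act on the feature dimension while a rotation $R$ acts on the three spatial components (so the two commute):

$$\langle U_1 R \vec{v}_i,\; U_2 R \vec{v}_i \rangle = \langle R\, U_1 \vec{v}_i,\; R\, U_2 \vec{v}_i \rangle = \langle U_1 \vec{v}_i,\; U_2 \vec{v}_i \rangle,$$

since $R^\top R = I$, so the resulting scalar does not change under rotations of the input.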