Hi, question about Eq.(9) in paper.

scaomath / galerkin-transformer

[NeurIPS 2021] Galerkin Transformer: a linear attention without softmax for Partial Differential Equations

MIT License

223 stars, 28 forks

Hi, question about Eq.(9) in paper. #11

Closed: jczhang02 closed this issue 8 months ago

jczhang02 commented 8 months ago

I have a question about the original paper: how is the equation below derived from scaled dot-product attention?

scaomath commented 8 months ago

@jczhang02 Hey JC, do you still need help? This formula is a simplified form of $(QK^T)V$ if one assumes the following:

- Each row of $Q$, $K$, $V$ corresponds to a position $x_i$ in a "discretization"; that is, the $i$-th row of $Q$ equals $\vec{q}(x_i)$, where $\vec{q}(\cdot)$ is a vector-valued feature map (and likewise $\vec{k}(\cdot)$ for $K$ and $\vec{v}(\cdot)$ for $V$).
- Denote $\kappa(x_i, x_j) := \vec{q}(x_i)\cdot \vec{k}(x_j)$. The matrix $QK^T$, whose $(i,j)$ entry is $\kappa(x_i, x_j)$, is then an evaluation of a Green's function, or kernel, on the grid (it characterizes how two "points" interact).
- For the discretization, think of each ViT patch location as a vertex of a two-dimensional uniform grid.

Once you make these assumptions, the rest is linear algebra plus some simplifications of the learnable projection matrices.
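A sketch of that linear-algebra step, row by row. It assumes $n$ grid points on a domain $\Omega$ of unit measure and inserts a $1/n$ quadrature weight; both are assumptions for illustration rather than a quote of the paper's exact scaling.

```math
\big((QK^T)V\big)_{i,:}
  = \sum_{j=1}^{n} \big(\vec{q}(x_i)\cdot\vec{k}(x_j)\big)\,\vec{v}(x_j)
  = \sum_{j=1}^{n} \kappa(x_i, x_j)\,\vec{v}(x_j),
\qquad
\frac{1}{n}\sum_{j=1}^{n} \kappa(x_i, x_j)\,\vec{v}(x_j)
  \approx \int_{\Omega} \kappa(x_i, \xi)\,\vec{v}(\xi)\,d\xi .
```

The scaled sum is a Riemann (quadrature) sum, so each row of the attention output approximates a kernel integral evaluated at $x_i$.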

jczhang02 commented 8 months ago

Hi, @scaomath. I couldn't figure out where the first term in Eq.(9) comes from until I understood the meaning of the skip connection. So, is Eq.(9) a simplified form of $(QK^T)V$ with a residual connection?

jczhang02 commented 8 months ago

Besides, I think this work is excellent! I haven't seen ideas like these in operator learning before, and your codebase is very well maintained.

As you know, AI4Science is still a niche research topic, limited by the fact that few people are strong in both partial differential equations and deep learning. I'm often unsure whether to continue my research on this topic, because I can't find peers to discuss it and learn with.

If possible, could I have your contact info (for example, email, Telegram, or WeChat)? You can send it by email (my address: jczhang@live.it).

scaomath commented 8 months ago

You can view this as a special, simplified case of $Z \gets V + (QK^T)V$ (plus other operations), where the $V$ term is the skip connection; Eq.(9) is simply the form in which an integral equation is normally written.
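To make the correspondence concrete, here is a minimal NumPy sketch (not code from the repository). It assumes a 1D uniform grid on $[0,1]$, hypothetical feature maps chosen so that $\kappa(x,\xi) = x + \xi$ and $v(\xi) = \xi$, and a $1/n$ quadrature weight. It checks that $Z = V + \tfrac{1}{n}(QK^T)V$ approaches $z(x) = v(x) + \int_0^1 \kappa(x,\xi)\,v(\xi)\,d\xi = \tfrac{3}{2}x + \tfrac{1}{3}$, and that reassociating the product as $Q(K^TV)$ (the ordering behind the Galerkin-type attention, here without its normalizations and projections) gives the same result at $O(nd^2)$ rather than $O(n^2 d)$ cost.

```python
import numpy as np


def grid(n: int) -> np.ndarray:
    """Midpoints of n uniform cells on [0, 1]; x_i stands in for a token position."""
    return (np.arange(n) + 0.5) / n


# Hypothetical feature maps (stand-ins for learned projections), chosen so that
# kappa(x, xi) = q(x) . k(xi) = x + xi and the integral has a closed form.
def q_map(x: np.ndarray) -> np.ndarray:
    return np.stack([np.ones_like(x), x], axis=1)  # row i is q(x_i), shape (n, 2)


def k_map(x: np.ndarray) -> np.ndarray:
    return np.stack([x, np.ones_like(x)], axis=1)  # row j is k(x_j), shape (n, 2)


def v_map(x: np.ndarray) -> np.ndarray:
    return x[:, None]  # row j is v(x_j) = x_j, shape (n, 1)


def discrete_update(n: int):
    """Skip connection V plus a 1/n-weighted quadrature of the kernel integral."""
    x = grid(n)
    Q, K, V = q_map(x), k_map(x), v_map(x)
    qk_first = (Q @ K.T) @ V / n  # O(n^2 d): forms the n-by-n kernel matrix
    kv_first = Q @ (K.T @ V) / n  # O(n d^2): same product, reassociated
    return x, V + qk_first, V + kv_first


def continuous_update(x: np.ndarray) -> np.ndarray:
    """z(x) = v(x) + int_0^1 (x + xi) * xi dxi = x + x/2 + 1/3."""
    return 1.5 * x + 1.0 / 3.0


for n in (16, 64, 256):
    x, z_qk, z_kv = discrete_update(n)
    err = np.max(np.abs(z_qk[:, 0] - continuous_update(x)))
    print(f"n={n:4d}  max error={err:.2e}")
```

For these particular maps the midpoint rule is exact on the linear part, so the remaining error works out to exactly $1/(12n^2)$, coming from the $\int_0^1 \xi^2\,d\xi$ term; the skip connection contributes $v(x_i)$ with no discretization error at all.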

scaomath commented 8 months ago

My email is scao@umkc.edu.

Below are some more recent developments in PDE operator learning using Transformers:

- https://openreview.net/forum?id=EPPqt3uERT
- https://arxiv.org/abs/2302.14376
- https://www.sciencedirect.com/science/article/abs/pii/S0021999124001931