microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License
356 stars 31 forks

Proof of Equation 2 #121

Closed: kiucho closed this issue 5 months ago

kiucho commented 5 months ago

Thank you for sharing your research.

While reading your paper, I had some questions and would like to clarify some points. Specifically, in Appendix A.1 (Proof of Equation 2), it is stated that $||Qx||$ is equivalent to $||x||$.

  1. In the paragraph, it is mentioned that

    "The RMSNorm operation divides each row of the input matrix $X$ by its norm."

So I think the text should state that $||xQ||$, not $||Qx||$, is equal to $||x||$ when $x$ is a row of $X$.

  2. "By the basic rules of linear algebra, if $x$ is a row of $X$, then $Q^⊤x$ is the corresponding row of $XQ$"

This is difficult to follow because it writes the row of $XQ$ as $Q^Tx$ instead of $xQ$. I understand that if we treat $x$ as a column vector, the row of $XQ$ can be written as $Q^Tx$. However, since $x$ is introduced as a row of $X$, it seems more appropriate to use $xQ$.

  3. "Applying RMSNorm to $XQ$, said row will now be equal to $\frac{1}{||x||}Q^Tx$. After RMSnorm, we can multiply by $Q^T$, our row is now equal to $\frac{1}{||x||}QQ^Tx$ = $\frac{1}{||x||}x$."

After applying RMSNorm, if $\frac{1}{||x||}Q^Tx$ is multiplied by $Q^T$, I think the row would be equal to $\frac{1}{||x||}Q^TQ^Tx$, not $\frac{1}{||x||}QQ^Tx$.
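The three points above can also be checked numerically. The sketch below is illustrative only. It assumes, as the quoted sentence says, that the normalization divides each row by its Euclidean norm (any scale factor is left out). The helper names, the example matrix `X`, and the choice of $Q$ as a product of two rotations are made up for this example and are not taken from the paper or the repository.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def normalize_rows(X):
    """Divide each row by its Euclidean norm (assumed form of the normalization)."""
    return [[t / norm(row) for t in row] for row in X]

def close(A, B, tol=1e-12):
    """Element-wise comparison of two matrices up to a small tolerance."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical orthogonal Q: a product of two rotations, which is not symmetric,
# so Q and Q^T are genuinely different matrices in this check.
c, s = math.cos(0.7), math.sin(0.7)
Rz = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
c2, s2 = math.cos(-1.1), math.sin(-1.1)
Rx = [[1.0, 0.0, 0.0], [0.0, c2, -s2], [0.0, s2, c2]]
Q = matmul(Rz, Rx)
Qt = transpose(Q)

X = [[1.0, 2.0, 3.0], [-4.0, 0.5, 2.0]]  # arbitrary example input
XQ = matmul(X, Q)

x = X[0]
row_form = matmul([x], Q)[0]                              # xQ, with x as a row
col_form = [r[0] for r in matmul(Qt, [[t] for t in x])]   # Q^T x, with x as a column

# Points 1 and 2: both forms give the first row of XQ, and the norm is unchanged.
same_row = close([row_form], [col_form]) and close([row_form], [XQ[0]])
same_norm = abs(norm(row_form) - norm(x)) < 1e-12

# Point 3: normalizing XQ and then right-multiplying by Q^T recovers normalized X.
undone = close(matmul(normalize_rows(XQ), Qt), normalize_rows(X))

print(same_row, same_norm, undone)
```

If all three flags come out true, then $xQ$ and $Q^Tx$ describe the same row in two conventions, and right-multiplying the normalized rows by $Q^T$ undoes the rotation.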

Thank you for your reply.

kiucho commented 5 months ago

I think I misunderstood it.
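
For anyone who arrives with the same question, the three points seem to come down to the row versus column convention. The derivation below is a sketch under one assumption that is not confirmed in this thread: the appendix writes each row of $X$ as a column vector $x$, so the row itself is $x^\top$. It also uses only the orthogonality of $Q$, that is, $Q^\top Q = QQ^\top = I$.

$$
\begin{aligned}
\left(x^\top Q\right)^\top &= Q^\top x, \\
\|Q^\top x\|^2 &= x^\top Q Q^\top x = x^\top x = \|x\|^2, \\
\left(\frac{1}{\|x\|}\, x^\top Q\, Q^\top\right)^\top &= \frac{1}{\|x\|}\, Q Q^\top x = \frac{1}{\|x\|}\, x .
\end{aligned}
$$

Under this reading, right-multiplying the row by $Q^\top$ corresponds to left-multiplying its column form by $(Q^\top)^\top = Q$, which gives $QQ^\top x$ rather than $Q^\top Q^\top x$.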