wujcan / SGL-TensorFlow

Is there a detailed derivation of z_u in Section 3.4 of the paper? #7

Closed: Yjh-Rking closed this issue 3 years ago.

Yjh-Rking commented 3 years ago

The derivation is a bit difficult for me. Could you post the step-by-step derivation? Thanks!

wujcan commented 3 years ago

We have updated the manuscript on arXiv (https://arxiv.org/abs/2010.10783). Please refer to the appendix of the latest version for the detailed derivation.

Yjh-Rking commented 3 years ago

Sorry to bother you. What I meant was: where can I find the detailed derivation of this step?

wujcan commented 3 years ago

I know what you mean. The replacement version on arXiv is scheduled to be announced on Mon, 21 Jun 2021. Please wait for the update.

hotchilipowder commented 3 years ago

Hello, I have a question about Equation 13 in the latest version (the one that adds the transpose T). Shouldn't the result be a column vector, since z_u is a column vector?

wujcan commented 3 years ago

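There are two layout conventions in matrix calculus, numerator layout and denominator layout; I use numerator layout here. In numerator layout, the derivative of a scalar with respect to a column vector is a row vector.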

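When computing the gradient of a scalar $z$ with respect to a vector $\mathbf{x}$ through an intermediate vector $\mathbf{y}$ ($\mathbf{x} \to \mathbf{y} \to z$), the chain rule in numerator layout has the same form as in the scalar-by-scalar case: $\frac{\partial z}{\partial \mathbf{x}} = \frac{\partial z}{\partial \mathbf{y}} \frac{\partial \mathbf{y}}{\partial \mathbf{x}}$. In denominator layout, the Jacobian $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ (one row per component of $\mathbf{y}$) appears transposed and the order of the factors is reversed: $\frac{\partial z}{\partial \mathbf{x}} = \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)^{\top} \frac{\partial z}{\partial \mathbf{y}}$.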

Hope this helps.
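To check that the two conventions describe the same gradient and differ only in shape, here is a small numerical sketch in pure Python (no NumPy, so it runs anywhere). The maps `f` and `g` and all helper names are hypothetical illustrations, not anything from the SGL paper or repository; `jacobian_f` stores the Jacobian with one row per component of y, and a central-difference estimate serves as an independent check.

```python
import math


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x m) @ (m x p) -> (n x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def f(x):
    """Hypothetical inner map x in R^2 -> y in R^3."""
    x0, x1 = x
    return [x0 * x1, math.sin(x0), x1 ** 2]


def g(y):
    """Hypothetical outer map y in R^3 -> scalar z."""
    return y[0] + 2.0 * y[1] + y[2] ** 2


def jacobian_f(x):
    """Jacobian J[i][j] = dy_i/dx_j, shape 3 x 2 (one row per component of y)."""
    x0, x1 = x
    return [[x1, x0],
            [math.cos(x0), 0.0],
            [0.0, 2.0 * x1]]


def partials_g(y):
    """Partial derivatives dz/dy_i as a plain list of length 3."""
    return [1.0, 2.0, 2.0 * y[2]]


def dz_dx_numerator(x):
    """Numerator layout: (1 x 3 row) @ (3 x 2 Jacobian) -> 1 x 2 row vector."""
    dz_dy_row = [partials_g(f(x))]
    return matmul(dz_dy_row, jacobian_f(x))


def dz_dx_denominator(x):
    """Denominator layout: (2 x 3 transposed Jacobian) @ (3 x 1 column) -> 2 x 1 column."""
    dz_dy_col = transpose([partials_g(f(x))])
    return matmul(transpose(jacobian_f(x)), dz_dy_col)


def dz_dx_finite_diff(x, h=1e-6):
    """Central-difference estimate of each dz/dx_j, used as an independent check."""
    grads = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grads.append((g(f(xp)) - g(f(xm))) / (2.0 * h))
    return grads


x = [0.5, 1.5]
row = dz_dx_numerator(x)    # 1 x 2: [[dz/dx0, dz/dx1]]
col = dz_dx_denominator(x)  # 2 x 1: [[dz/dx0], [dz/dx1]]
print("numerator layout (row):     ", row)
print("denominator layout (column):", col)
print("finite differences:         ", dz_dx_finite_diff(x))
```

With `x = [0.5, 1.5]` both layouts hold the same two numbers, dz/dx0 = 1.5 + 2cos(0.5) and dz/dx1 = 14.0; only the shape (1 x 2 row versus 2 x 1 column) differs, which is why a transpose shows up when moving between the conventions.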