jacobeisenstein / gt-nlp-class

Course materials for Georgia Tech CS 4650 and 7650, "Natural Language"

question #36

Closed: raynerr closed this issue 6 years ago

raynerr commented 6 years ago

Hi, thank you for your great book. I am confused about the subscript in formula 3.29: I think $z_k$ should be $z_j$ (sorry for the improper formatting). Thanks.

jacobeisenstein commented 6 years ago

I'm glad you're enjoying the book!

I could be missing something, but I think it is correct. The parameter $\theta_{k,j}$ is the weight from $z_k$ to $y = j$. Thus, the gradient depends on $z_k$, not $z_j$.
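To see why the gradient involves $z_k$, here is a minimal pure-Python sketch. It assumes a softmax output layer trained with the negative log-likelihood (cross-entropy) loss, and it illustrates the idea rather than transcribing equation 3.29. The code stores $\Theta^{(zy)}$ with $K_y$ rows and $K_z$ columns, so the weight from $z_k$ to $y = j$ (written $\theta_{k,j}$ above) sits at `theta[j][k]`. The function names and toy values are hypothetical. The analytic gradient $(p(y = j \mid z) - \delta(j = y))\, z_k$ is checked against central finite differences.

```python
import math


def softmax(scores):
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def output_scores(theta, z):
    # theta[j][k]: weight from hidden unit z_k to label y = j.
    # K_y rows by K_z columns, so this computes theta @ z.
    return [sum(theta[j][k] * z[k] for k in range(len(z)))
            for j in range(len(theta))]


def loss(theta, z, y):
    # Negative log-likelihood of the true label y.
    return -math.log(softmax(output_scores(theta, z))[y])


def analytic_grad(theta, z, y):
    # dLoss/dtheta[j][k] = (p(y = j | z) - delta(j == y)) * z_k
    p = softmax(output_scores(theta, z))
    return [[(p[j] - (1.0 if j == y else 0.0)) * z[k] for k in range(len(z))]
            for j in range(len(theta))]


def numeric_grad(theta, z, y, eps=1e-6):
    # Central finite differences, perturbing one weight at a time.
    grad = []
    for j in range(len(theta)):
        row = []
        for k in range(len(z)):
            plus = [r[:] for r in theta]
            plus[j][k] += eps
            minus = [r[:] for r in theta]
            minus[j][k] -= eps
            row.append((loss(plus, z, y) - loss(minus, z, y)) / (2 * eps))
        grad.append(row)
    return grad


z = [0.5, -1.0, 2.0]             # K_z = 3 hidden units (index k)
theta = [[0.1, 0.2, -0.3],       # K_y = 2 labels (index j)
         [0.0, -0.1, 0.4]]
y = 1                            # true label

grad = analytic_grad(theta, z, y)
check = numeric_grad(theta, z, y)
```

Within each row `j` of `grad`, every entry is the same factor $p(y = j \mid z) - \delta(j = y)$ multiplied by `z[k]`, which is the dependence on $z_k$ that the reply describes.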

raynerr commented 6 years ago

Thank you for your reply. As I read the preceding context, column $k$ should correspond to $y = k$, and $j$ might index the elements of $z$, as illustrated in the picture I uploaded. Alternatively, if $j$ denotes $y = j$, then $k$ should index the elements of $z$, which would mean that the text "$\theta_k^{(z \to y)}$ is column $k$ of ..." in the left margin [1487] may not be right. Sorry to bother you again.

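[image: illustration] https://user-images.githubusercontent.com/37948688/43813644-2442686c-9af9-11e8-8b9a-4e793bd5fc0d.jpg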

jacobeisenstein commented 6 years ago

Sorry, I still don't see it. The matrix $\Theta^{(zy)}$ has $K_y$ rows and $K_z$ columns, as required by the matrix-vector product $\Theta^{(zy)} z$. One thing that may be adding confusion is that $k$ indexes an input feature in $\theta^{(xz)}$ [3.26] but a hidden-layer element in $\theta^{(zy)}$ [3.25]. I've changed the notation to be more consistent on this point: $n$ indexes input features, $k$ indexes elements of the hidden layer, and $j$ indexes labels.
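The shape argument and the revised indexing ($n$ for input features, $k$ for hidden units, $j$ for labels) can be checked with a small forward pass. This is a minimal pure-Python sketch, not code from the book: the sigmoid activation, the layout of $\Theta^{(xz)}$ with $K_z$ rows, and the names `hidden_layer`, `output_scores`, `column`, and `scores_by_columns` are illustrative assumptions. It shows that when $\Theta^{(zy)}$ has $K_y$ rows and $K_z$ columns, column $k$ holds the weights from $z_k$ to every label, and $\Theta^{(zy)} z$ equals the sum of those columns, each scaled by $z_k$.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def hidden_layer(theta_xz, x):
    # Assumed layout: theta_xz has K_z rows and K_x columns.
    # z_k = sigmoid(sum over input features n of theta_xz[k][n] * x[n])
    return [sigmoid(sum(theta_xz[k][n] * x[n] for n in range(len(x))))
            for k in range(len(theta_xz))]


def output_scores(theta_zy, z):
    # theta_zy has K_y rows and K_z columns, as required by theta_zy @ z.
    # score_j = sum over hidden units k of theta_zy[j][k] * z[k]
    return [sum(theta_zy[j][k] * z[k] for k in range(len(z)))
            for j in range(len(theta_zy))]


def column(matrix, k):
    # Column k: the weights from hidden unit z_k to every label j.
    return [row[k] for row in matrix]


def scores_by_columns(theta_zy, z):
    # The same product written as a weighted sum of columns:
    # theta_zy @ z = sum over k of z_k * column(theta_zy, k)
    scores = [0.0] * len(theta_zy)
    for k, z_k in enumerate(z):
        for j, w in enumerate(column(theta_zy, k)):
            scores[j] += z_k * w
    return scores


x = [1.0, 0.0, 2.0, -1.0]              # K_x = 4 input features (index n)
theta_xz = [[0.2, -0.1, 0.0, 0.3],     # K_z = 3 hidden units (index k)
            [-0.4, 0.1, 0.2, 0.0],
            [0.1, 0.1, -0.2, 0.5]]
theta_zy = [[0.3, -0.2, 0.1],          # K_y = 2 labels (index j)
            [-0.1, 0.4, 0.2]]

z = hidden_layer(theta_xz, x)
direct = output_scores(theta_zy, z)
by_cols = scores_by_columns(theta_zy, z)
```

Both ways of computing the scores agree, and `column(theta_zy, k)` has one entry per label, which matches reading $\theta_k^{(z \to y)}$ as the weights leaving hidden unit $z_k$.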

raynerr commented 6 years ago

Oh, I got it. So sorry about my mistake. Thanks a lot, Mr Eisenstein.

jacobeisenstein commented 6 years ago

No problem! -J