Describe the bug
It seems that there is a typo in the Multi-Head Attention markdown cell:
We refer to this as Multi-Head Attention layer with the learnable parameters $W_{1...h}^{Q}\in\mathbb{R}^{D\times d_k}$, $W_{1...h}^{K}\in\mathbb{R}^{D\times d_k}$, $W_{1...h}^{V}\in\mathbb{R}^{D\times d_v}$, and $W^{O}\in\mathbb{R}^{h\cdot d_k\times d_{out}}$ ($D$ being the input dimensionality). Expressed in a computational graph, we can visualize it as below (figure credit - Vaswani et al., 2017).
Here, instead of $W^{O}\in\mathbb{R}^{h\cdot d_k\times d_{out}}$, it probably should say $W^{O}\in\mathbb{R}^{h\cdot d_v\times d_{out}}$,
since the output of the heads is a concatenation of $h$ value vectors, each of dimension $d_v$, i.e. $h\cdot d_v$ features in total (see the shape check below).
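A minimal shape check illustrating the dimension argument (the sizes below are made-up example values, not taken from the tutorial; $d_v\neq d_k$ is chosen on purpose so the mismatch is visible):

```python
import torch

# Hypothetical example sizes (not from the tutorial); d_v != d_k on purpose.
h, d_k, d_v, d_out, T = 8, 32, 48, 256, 10

# Each head produces a (T, d_v) output; concatenating h heads gives (T, h*d_v).
head_outputs = [torch.randn(T, d_v) for _ in range(h)]
concat = torch.cat(head_outputs, dim=-1)   # shape (T, h * d_v)

# The output projection therefore has to map h*d_v -> d_out, not h*d_k -> d_out.
W_O = torch.randn(h * d_v, d_out)
out = concat @ W_O                          # shape (T, d_out)
print(concat.shape, out.shape)
```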
Screenshots
A screenshot from the original paper:
Tutorial: 6