phlippe / uvadlc_notebooks

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
https://uvadlc-notebooks.readthedocs.io/en/latest/
MIT License

Possible typo in Tutorial 6 #65

Closed: insdout closed this issue 1 year ago

insdout commented 1 year ago

Tutorial: 6

Describe the bug: It seems that there is a typo in the Multi-Head Attention markdown cell:

We refer to this as Multi-Head Attention layer with the learnable parameters $W_{1...h}^{Q}\in\mathbb{R}^{D\times d_k}$, $W_{1...h}^{K}\in\mathbb{R}^{D\times d_k}$, $W_{1...h}^{V}\in\mathbb{R}^{D\times d_v}$, and $W^{O}\in\mathbb{R}^{h\cdot d_k\times d_{out}}$ ($D$ being the input dimensionality). Expressed in a computational graph, we can visualize it as below (figure credit - Vaswani et al., 2017).

Here, instead of $W^{O}\in\mathbb{R}^{h\cdot d_k\times d_{out}}$, it probably should say $W^{O}\in\mathbb{R}^{h\cdot d_v\times d_{out}}$.

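This is because each head outputs a weighted sum of value vectors of dimension $d_v$, and the $h$ head outputs are concatenated before $W^{O}$ is applied, so $W^{O}$ receives an $h\cdot d_v$-dimensional input.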
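To make the dimension argument concrete, below is a minimal NumPy sketch of a multi-head self-attention forward pass following $\text{MultiHead}(Q,K,V)=\text{Concat}(\text{head}_1,\dots,\text{head}_h)W^{O}$ from Vaswani et al. (2017). It is not the tutorial's own code: the function name `multi_head_attention`, the per-head weight layout, and the sizes are assumptions chosen for illustration (no masking, no biases). The sizes deliberately use $d_k \neq d_v$, so only a $W^{O}$ with $h\cdot d_v$ rows can be applied to the concatenated heads, while one with $h\cdot d_k$ rows raises a shape error.

```python
import numpy as np

def multi_head_attention(x, W_q, W_k, W_v, W_o):
    """Hypothetical minimal multi-head self-attention (no masking, no biases).

    x:   (seq_len, D)      input sequence
    W_q: (h, D, d_k)       per-head query projections
    W_k: (h, D, d_k)       per-head key projections
    W_v: (h, D, d_v)       per-head value projections
    W_o: (h * d_v, d_out)  output projection
    """
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(W_q, W_k, W_v):
        q, k, v = x @ Wq_i, x @ Wk_i, x @ Wv_i          # (seq_len, d_k), (seq_len, d_k), (seq_len, d_v)
        scores = q @ k.T / np.sqrt(q.shape[-1])        # scaled dot product, (seq_len, seq_len)
        scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for a stable softmax
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)       # row-wise softmax
        heads.append(attn @ v)                         # (seq_len, d_v): weighted sum of value vectors
    concat = np.concatenate(heads, axis=-1)            # (seq_len, h * d_v)
    return concat @ W_o                                # (seq_len, d_out)

# Hypothetical sizes with d_k != d_v so the two candidate shapes of W^O differ.
rng = np.random.default_rng(0)
seq_len, D, h, d_k, d_v, d_out = 5, 16, 4, 8, 6, 16
x = rng.normal(size=(seq_len, D))
W_q = rng.normal(size=(h, D, d_k))
W_k = rng.normal(size=(h, D, d_k))
W_v = rng.normal(size=(h, D, d_v))

# W^O with h * d_v rows matches the concatenated head outputs.
out = multi_head_attention(x, W_q, W_k, W_v, rng.normal(size=(h * d_v, d_out)))
print(out.shape)  # (5, 16)

# W^O with h * d_k rows (24 != 32 here) cannot be applied.
mismatch_detected = False
try:
    multi_head_attention(x, W_q, W_k, W_v, rng.normal(size=(h * d_k, d_out)))
except ValueError:
    mismatch_detected = True
print("h*d_k rows rejected:", mismatch_detected)
```

When $d_k = d_v$ (as in the paper's default configuration, where both equal $d_{model}/h$) the two shapes coincide, which is why the typo is easy to miss.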

Screenshots: The relevant screenshot from the original paper is attached (image: "Screenshot from 2022-12-25 01-31-38").

phlippe commented 1 year ago

Hi, thanks for pointing this typo out! This should be fixed with the newest commit ffea03ed82022579ed66e18e6748160d12ad1115.