jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".
MIT License
8.78k stars, 1.97k forks

What do n_head, d_model, d_k and d_v stand for? #162

Closed: seyeeet closed this issue 3 years ago

foreverlms commented 3 years ago

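n_head: the number of attention heads. Queries, keys and values are each linearly projected n_head times, and each head runs scaled dot-product attention in parallel on its own projection. When n_head * d_k = d_model, this amounts to splitting the query into n_head parts.
d_model: the size of the token embeddings, which is also the input and output size of every sub-layer.
d_k, d_v: the per-head size of the keys and of the values. Queries have size d_k as well, since they are dot-multiplied with the keys; d_v may differ. The n_head value outputs are concatenated (n_head * d_v) and projected back to d_model. The paper's base model uses n_head = 8, d_model = 512 and d_k = d_v = d_model / n_head = 64.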
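To make the roles of these four hyperparameters concrete, here is a minimal pure-Python sketch of one multi-head attention forward pass. It is not this repo's code: the helper names (matmul, rand_matrix, softmax, multi_head_attention) are hypothetical, random weights stand in for learned ones, and masking, dropout, residual connections and layer norm are all left out. It follows the paper's formulation: Q and K are projected to n_head * d_k, V to n_head * d_v, each head attends within its own slice, and the concatenated heads are projected back to d_model. d_v is deliberately set different from d_k to show that only queries and keys need to share a size.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols, rng):
    """Hypothetical random init standing in for learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(x, n_head, d_model, d_k, d_v, seed=0):
    """Self-attention over x, a (seq_len x d_model) matrix; returns (seq_len x d_model)."""
    rng = random.Random(seed)
    # Projection weights, no bias: queries/keys share d_k, values use d_v.
    w_q = rand_matrix(d_model, n_head * d_k, rng)
    w_k = rand_matrix(d_model, n_head * d_k, rng)
    w_v = rand_matrix(d_model, n_head * d_v, rng)
    w_o = rand_matrix(n_head * d_v, d_model, rng)

    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    heads = []
    for h in range(n_head):
        # Each head sees its own d_k (or d_v) wide slice of the projections.
        qh = [row[h * d_k:(h + 1) * d_k] for row in q]
        kh = [row[h * d_k:(h + 1) * d_k] for row in k]
        vh = [row[h * d_v:(h + 1) * d_v] for row in v]
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in kh]
                  for qi in qh]
        weights = [softmax(r) for r in scores]
        heads.append(matmul(weights, vh))  # (seq_len x d_v)

    # Concatenate heads per position: (seq_len x n_head * d_v)
    concat = [[val for head in heads for val in head[i]] for i in range(len(x))]
    return matmul(concat, w_o)  # project back to (seq_len x d_model)


seq_len, d_model, n_head, d_k, d_v = 3, 8, 2, 4, 3
x = rand_matrix(seq_len, d_model, random.Random(1))
out = multi_head_attention(x, n_head, d_model, d_k, d_v)
print(len(out), len(out[0]))  # 3 8
```

Whatever values are chosen for n_head, d_k and d_v, the output keeps the input's d_model width, which is what lets attention sub-layers be stacked with residual connections.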