Closed seyeeet closed 3 years ago
n_head: the number of heads; the query is split into n_head parts so that several self-attention operations run side by side.
d_model: embedding vector size.
d_k, d_v: key and value vector sizes, the same as the query size.
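To make the roles of these parameters concrete, here is a minimal pure-Python sketch, not the repository's actual code. It assumes d_k = d_v = d_model / n_head and skips the learned projection layers entirely, so each head's keys double as its values; the function names `split_heads`, `attention`, and `multi_head_self_attention` are made up for illustration.

```python
import math


def split_heads(vec, n_head):
    """Split one d_model-sized vector into n_head equal chunks."""
    d_model = len(vec)
    if d_model % n_head != 0:
        raise ValueError("d_model must be divisible by n_head")
    d_head = d_model // n_head  # plays the role of d_k and d_v (assumption)
    return [vec[i * d_head:(i + 1) * d_head] for i in range(n_head)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d_k = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]


def multi_head_self_attention(tokens, n_head):
    """tokens: list of d_model-sized vectors; returns one d_model-sized vector per token.

    No learned projections are applied (assumption for brevity), so the
    per-head slices of the input serve as queries, keys, and values.
    """
    per_token = [split_heads(t, n_head) for t in tokens]
    out = []
    for heads in per_token:
        merged = []
        for h, q in enumerate(heads):
            keys = [pt[h] for pt in per_token]  # head h of every token
            merged.extend(attention(q, keys, keys))
        out.append(merged)  # concatenating the heads restores size d_model
    return out
```

With d_model = 4 and n_head = 2, each head works on vectors of size 2, and concatenating the two head outputs gives back a vector of size 4 per token. A real implementation would normally apply learned linear layers before the split and after the concatenation.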