daviduarte opened this issue 3 years ago
In transformer.py, in the class MultiHeadedSelfAttention(), we have the variable declarations:
self.proj_q = nn.Linear(dim, dim)
self.proj_k = nn.Linear(dim, dim)
self.proj_v = nn.Linear(dim, dim)
But weren't Q, K and V supposed to be independent trainable matrices per head? E.g. if num_head = 12, wasn't it supposed to be something like:
projs = []
for i in range(12):
    projs.append([nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)])
Regards!
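
For reference, here is a minimal sketch of why a single shared projection is equivalent to independent per-head matrices in the standard multi-head formulation. It assumes the usual split head_dim = dim // num_heads and reuses the proj_q name from the issue; it is not the repository's actual forward code.

```python
import torch
import torch.nn as nn

dim, num_heads = 768, 12
head_dim = dim // num_heads                      # 64 in this example

x = torch.randn(2, 10, dim)                      # (batch, seq_len, dim)
proj_q = nn.Linear(dim, dim)                     # single shared projection, as in the issue

# Option A: one (dim, dim) projection, then reshape the output into heads.
q = proj_q(x)                                    # (batch, seq_len, dim)
q_heads = q.view(2, 10, num_heads, head_dim)     # (batch, seq_len, num_heads, head_dim)

# Option B: num_heads independent (dim, head_dim) projections whose weights
# are the corresponding row-slices of the shared weight matrix.
per_head = [nn.Linear(dim, head_dim) for _ in range(num_heads)]
with torch.no_grad():
    for i, layer in enumerate(per_head):
        layer.weight.copy_(proj_q.weight[i * head_dim:(i + 1) * head_dim])
        layer.bias.copy_(proj_q.bias[i * head_dim:(i + 1) * head_dim])

q_heads_alt = torch.stack([layer(x) for layer in per_head], dim=2)

# Both formulations produce the same per-head queries.
print(torch.allclose(q_heads, q_heads_alt, atol=1e-6))   # True
```

In other words, each head effectively owns its own (dim, head_dim) slice of the shared (dim, dim) weight matrix, so the per-head parameters are still trained independently; allocating a full (dim, dim) matrix per head per projection would multiply the parameter count by num_heads.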